Scientific report: "Structural and Topical Dimensions in Multi-Task Patent Translation"

Katharina Wäschle and Stefan Riezler
Department of Computational Linguistics
Heidelberg University, Germany
waeschle riezler @

Abstract

Patent translation is a complex problem due to the highly specialized technical vocabulary and the peculiar textual structure of patent documents. In this paper we analyze patents along the orthogonal dimensions of topic and textual structure. We view different patent classes and different patent text sections, such as title, abstract, and claims, as separate translation tasks, and investigate the influence of such tasks on machine translation performance. We study multi-task learning techniques that exploit commonalities between tasks by mixtures of translation models or by multi-task metaparameter tuning. We find small but significant gains over task-specific training by techniques that model commonalities through shared parameters. A by-product of our work is a parallel patent corpus of 23 million German-English sentence pairs.

1 Introduction

Patents are an important tool for the protection of intellectual property and also play a significant role in business strategies in modern economies.
Patent translation is an enabling technique for patent prior art search, which aims to detect a patent's novelty and thus needs to be cross-lingual for a multitude of languages. Patent translation is complicated by a highly specialized vocabulary consisting of technical terms specific to the field of invention the patent relates to. Patents are written in a sophisticated legal jargon ("patentese") that is not found in everyday language and exhibits a complex textual structure. Also, patents are often intentionally ambiguous or vague in order to maximize the coverage of the claims. In this paper we analyze patents with respect to the orthogonal dimensions of topic (the technical field covered by the patent) and structure (a patent's text sections) with respect to .
