Scientific report: "Empirical Methods for Compound Splitting"

Philipp Koehn
Information Sciences Institute, Department of Computer Science, University of Southern California
koehn@

Kevin Knight
Information Sciences Institute, Department of Computer Science, University of Southern California
knight@

Abstract

Compounded words are a challenge for NLP applications such as machine translation (MT). We introduce methods to learn splitting rules from monolingual and parallel corpora. We evaluate them against a gold standard and measure their impact on performance of statistical MT systems. Results show accuracy of and performance gains for MT of BLEU on a German-English noun phrase translation task.

Figure 1: Splitting options for the German word Aktionsplan

1 Introduction

Compounding of words is common in a number of languages (German, Dutch, Finnish, Greek, etc.). Since words may be joined freely, this vastly increases the vocabulary size, leading to sparse data problems. This poses challenges for a number of NLP applications such as machine translation, speech recognition, text classification, information extraction, or information retrieval.

For machine translation, the splitting of an unknown compound into its parts enables the translation of the compound by the translation of its parts. Take the word Aktionsplan in German (see Figure 1), which was created by joining the words Aktion and Plan. Breaking up this compound would assist the translation into English as action plan.

Compound splitting is a well defined computational linguistics task.
One way to define the goal of compound splitting is to break up foreign words so that a one-to-one correspondence to English can be established. Note that we are looking for a one-to-one correspondence to English content words: say the preferred translation of Aktionsplan is plan for action. The lack of correspondence for the English word "for" does not detract from the definition of the task; we would still like to break up the German compound.
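
To make the idea of splitting options concrete, the following minimal sketch enumerates every way to break a word into parts that all appear in a list of known words. It is an illustration under stated assumptions, not the method the paper goes on to describe. The toy vocabulary, the optional linking element "s" between parts, the minimum part length of three letters, and the function name splitting_options are all hypothetical choices made for this example.

```python
from typing import List, Set, Tuple

# Toy vocabulary of known German words (lowercased). Assumption: in practice
# this would be collected from a monolingual corpus.
KNOWN_WORDS: Set[str] = {"akt", "ion", "aktion", "plan", "aktionsplan"}

# Linking elements allowed between two parts; "" means the parts are joined
# directly. Assumption: only "s" is considered here.
FILLERS: Tuple[str, ...] = ("", "s")

# Assumption: parts shorter than this are never proposed.
MIN_PART_LEN = 3


def splitting_options(word: str, vocab: Set[str] = KNOWN_WORDS) -> List[List[str]]:
    """Return every split of `word` into known parts, including the unsplit word."""
    word = word.lower()
    results: List[List[str]] = []

    def extend(start: int, parts: List[str]) -> None:
        # Try every candidate part that begins at position `start`.
        for end in range(start + MIN_PART_LEN, len(word) + 1):
            part = word[start:end]
            if part not in vocab:
                continue
            if end == len(word):
                # The part reaches the end of the word: one complete option.
                results.append(parts + [part])
                continue
            # Otherwise skip an optional linking element and keep going.
            for filler in FILLERS:
                if word.startswith(filler, end):
                    next_start = end + len(filler)
                    if next_start < len(word):
                        extend(next_start, parts + [part])

    extend(0, [])
    return results


if __name__ == "__main__":
    for option in splitting_options("Aktionsplan"):
        print("-".join(option))
```

With this toy vocabulary the sketch proposes three options for Aktionsplan: the unsplit word, aktion-plan, and akt-ion-plan. Enumeration alone does not say which option is best; choosing among the candidates is a separate step.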
