Scientific report: "Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages"

Sara Stymne
Department of Computer and Information Science
Linköping University, Linköping, Sweden

Abstract

In this thesis proposal I present my thesis work about pre- and postprocessing for statistical machine translation, mainly into Germanic languages. I focus my work on four areas: compounding, definite noun phrases, reordering, and error correction. Initial results are positive within all four areas, and there are promising possibilities for extending these approaches. In addition, I also focus on methods for performing thorough error analysis of machine translation output, which can both motivate and evaluate the studies performed.

1 Introduction

Statistical machine translation (SMT) is based on training statistical models from large corpora of human translations. It has the advantage that it is very fast to train, if there are available corpora, compared to rule-based systems, and SMT systems are often relatively good at lexical disambiguation. A large drawback of SMT systems is that they use no or little grammatical knowledge, relying mainly on a target language model for producing correct target language texts, often resulting in ungrammatical output. Thus, methods to include some, possibly shallow, linguistic knowledge seem reasonable.

The main focus for SMT to date has been on translation into English, for which the models work relatively well, especially for source languages that are structurally similar to English. There has been less research on translation out of English, or between other language pairs. Methods that are useful for translation into English have problems in many cases, for instance for translation into morphologically rich languages. Word order differences and morphological complexity of a language have been shown to be explanatory variables for the performance of phrase-based SMT systems (Birch et al., 2008). German and the Scandinavian languages are a ...
