Scientific report: "Using POS Information for Statistical Machine Translation into Morphologically Rich Languages"

Nicola Ueffing and Hermann Ney
Lehrstuhl für Informatik VI - Computer Science Department
RWTH Aachen - University of Technology
ueffing ney @

Abstract

When translating from languages with hardly any inflectional morphology, like English, into morphologically rich languages, the English word forms often do not contain enough information for producing the correct full form in the target language. We investigate methods for improving the quality of such translations by making use of part-of-speech information and maximum entropy modeling. Results for translations from English into Spanish and Catalan are presented on the LC-STAR corpus, which consists of spontaneously spoken dialogues in the domain of appointment scheduling and travel planning.

1 Introduction

In this paper, we address the question of how part-of-speech (POS) information can help improve the quality of Statistical Machine Translation (SMT). One of the main problems when translating from a language with hardly any inflectional morphology (English in our experiments) into one with richer morphology (here Spanish and Catalan) is the production of the correct inflected form in the target language.
We introduce transformations to the English string that are based on part-of-speech information and show how this knowledge source can help SMT. Systematic evaluations show that the quality of the generated translations improves. The transformations we apply are the following:

Treatment of verbs: In Catalan and Spanish, the pronoun before a verb is often omitted, and the person is instead expressed via the ending of the verb. The same holds for the future tense and for the modes expressed through "would" and "should" in English. Since this makes it hard to generate the correct translation of a given English verb, we propose a method that produces English word forms containing sufficient information.
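The excerpt does not spell out how the enriched English word forms are built. One plausible reading, shown here purely as an illustration, is to join a subject pronoun, an optional modal such as "will", "would" or "should", and the following verb into a single token, so that the resulting English form carries the person and tense information that the Spanish or Catalan verb ending expresses. The function name `splice_verbs`, the underscore separator, the Penn Treebank style tags (`PRP`, `MD`, `VB*`) and the hand-tagged input are all assumptions made for this sketch, not details taken from the paper.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (word, POS tag); Penn Treebank style tags are assumed

# Assumption: only subject pronouns are merged. "you" and "it" can also be
# objects, which this simple sketch does not try to distinguish.
SUBJECT_PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}


def is_subject_pronoun(word: str, tag: str) -> bool:
    return tag == "PRP" and word.lower() in SUBJECT_PRONOUNS


def splice_verbs(tagged: List[Tagged], sep: str = "_") -> List[str]:
    """Join [subject pronoun] [modal] verb sequences into one token.

    At least one of pronoun or modal must precede the verb; everything
    else is copied through unchanged.
    """
    out: List[str] = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        parts: List[str] = []
        # Optional subject pronoun.
        if is_subject_pronoun(*tagged[j]):
            parts.append(tagged[j][0])
            j += 1
        # Optional modal (will, would, should, ...).
        if j < n and tagged[j][1] == "MD":
            parts.append(tagged[j][0])
            j += 1
        # Merge only if a verb follows the collected pronoun and/or modal.
        if parts and j < n and tagged[j][1].startswith("VB"):
            parts.append(tagged[j][0])
            out.append(sep.join(parts))
            i = j + 1
        else:
            out.append(tagged[i][0])
            i += 1
    return out


if __name__ == "__main__":
    sentence = [("you", "PRP"), ("will", "MD"), ("go", "VB"),
                ("to", "TO"), ("Madrid", "NNP")]
    print(splice_verbs(sentence))  # ['you_will_go', 'to', 'Madrid']
```

With a merged token such as `you_will_go`, a translation model could in principle align one English unit with one inflected target verb form instead of spreading that information over three source words. A real system would take its tags from an automatic POS tagger rather than from hand-annotated input.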
