Segmentation for English-to-Arabic Statistical Machine Translation

Ibrahim Badr, Rabih Zbib, James Glass
Computer Science and Artificial Intelligence Lab
Massachusetts Institute of Technology
Cambridge, MA 02139, USA
iab02, rabih, glass @

Abstract

In this paper, we report on a set of initial results for English-to-Arabic Statistical Machine Translation (SMT). We show that morphological decomposition of the Arabic target is beneficial, especially for smaller-size corpora, and investigate different recombination techniques. We also report on the use of Factored Translation Models for English-to-Arabic translation.

1 Introduction

Arabic has a complex morphology compared to English. Words are inflected for gender, number, and sometimes grammatical case, and various clitics can attach to word stems. An Arabic corpus will therefore have more surface forms than an English corpus of the same size, and will also be more sparsely populated. These factors adversely affect the performance of Arabic-English Statistical Machine Translation (SMT). Prior work (Lee, 2004; Habash and Sadat, 2006) has shown that morphological segmentation of the Arabic source improves the performance of Arabic-to-English SMT. Using similar techniques for English-to-Arabic SMT requires recombining the segmented target side into valid surface forms, which is not a trivial task.

In this paper, we present an initial set of experiments on English-to-Arabic SMT. We report results from two domains: news text, trained on a large corpus, and spoken travel conversation, trained on a significantly smaller corpus.
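Why recombination is not trivial can be illustrated with a toy sketch. It assumes decoder output in Buckwalter transliteration, with proclitics marked by a trailing `+` and enclitics by a leading `+` (a hypothetical marking convention chosen for this illustration). It applies only two standard Arabic orthographic adjustments: the article's alif is dropped after the preposition l+ (l+ Al+ ktAb becomes llktAb), and ta marbuta (`p`) is written as `t` before a pronoun enclitic (mdrsp +h becomes mdrsth). The function names `recombine`, `naive_recombine`, `join_prefixes` and `attach_suffix` are hypothetical. This is not the authors' recombination scheme, and a real system has to cover many more cases.

```python
"""Toy recombination of clitic-segmented Arabic (Buckwalter transliteration).

Hypothetical marking convention: proclitics end with '+' (e.g. "Al+"),
enclitics start with '+' (e.g. "+h"); every other token is a stem.
Only two spelling rules are modeled; real recombination needs many more.
"""


def join_prefixes(prefixes, stem):
    """Attach held proclitics to a stem, applying the l+Al rule."""
    out = ""
    for i, p in enumerate(prefixes):
        if p == "Al" and i > 0 and prefixes[i - 1] == "l":
            # After the preposition l+, the article's alif is dropped: l+Al -> ll
            out += "l"
        else:
            out += p
    return out + stem


def attach_suffix(word, suffix):
    """Attach an enclitic; ta marbuta (p) is written as t before it."""
    if word.endswith("p"):
        word = word[:-1] + "t"
    return word + suffix


def recombine(tokens):
    """Rebuild surface words from segmented decoder output."""
    words, prefixes = [], []
    for tok in tokens:
        if len(tok) > 1 and tok.endswith("+"):
            prefixes.append(tok[:-1])          # hold proclitic for the next stem
        elif len(tok) > 1 and tok.startswith("+"):
            if words:
                words[-1] = attach_suffix(words[-1], tok[1:])
            else:
                words.append(tok[1:])          # orphan enclitic: keep it as-is
        else:
            words.append(join_prefixes(prefixes, tok))
            prefixes = []
    if prefixes:                               # dangling proclitics at the end
        words.append(join_prefixes(prefixes, ""))
    return words


def naive_recombine(tokens):
    """Baseline: delete the markers and concatenate, with no spelling rules."""
    text = " ".join(tokens).replace("+ ", "").replace(" +", "")
    return text.split()


if __name__ == "__main__":
    # "and for the book", "his school", "in the house"
    decoded = ["w+", "l+", "Al+", "ktAb", "mdrsp", "+h", "b+", "Al+", "byt"]
    print(recombine(decoded))        # ['wllktAb', 'mdrsth', 'bAlbyt']
    print(naive_recombine(decoded))  # ['wlAlktAb', 'mdrsph', 'bAlbyt']
```

Plain concatenation produces wlAlktAb and mdrsph, neither of which is a valid surface form. That is the gap a rule-based or learned recombination step has to close.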
We show that segmenting the Arabic target in training and decoding improves performance. We propose various schemes for recombining the segmented Arabic and compare their effect on translation. We also report on applying Factored Translation Models (Koehn and Hoang, 2007) to English-to-Arabic translation.

2 Previous Work

The only previous work on English-to-Arabic SMT that we are aware of is by Sarikaya.
