TAILIEUCHUNG - Scientific report: "Improving Arabic-to-English Statistical Machine Translation by Reordering Post-verbal Subjects for Alignment"

Marine Carpuat, Yuval Marton, Nizar Habash
Columbia University, Center for Computational Learning Systems
475 Riverside Drive, New York, NY 10115
marine ymarton habash @

Abstract

We study the challenges raised by Arabic verb and subject detection and reordering in Statistical Machine Translation (SMT). We show that post-verbal subject (VS) constructions are hard to translate because they have highly ambiguous reordering patterns when translated to English. In addition, implementing reordering is difficult because the boundaries of VS constructions are hard to detect accurately, even with a state-of-the-art Arabic dependency parser. We therefore propose to reorder VS constructions into SV order for SMT word alignment only. This strategy significantly improves BLEU and TER scores, even on a strong large-scale baseline and despite noisy parses.

1 Introduction

Modern Standard Arabic (MSA) is a morpho-syntactically complex language with different phenomena from English, a fact that raises many interesting issues for natural language processing and Arabic-to-English statistical machine translation (SMT). While comprehensive Arabic preprocessing schemes have been widely adopted for handling Arabic morphology in SMT (Sadat and Habash, 2006; Zollmann et al., 2006; Lee, 2004), syntactic issues have not received as much attention by comparison (Green et al., 2009; Crego and Habash, 2008; Habash, 2007).

Arabic verbal constructions are particularly challenging, since subjects can occur in pre-verbal (SV), post-verbal (VS), or pro-dropped ("null subject") constructions. As a result, training data for learning verbal construction translations is split between the different constructions and their patterns, and complex reordering schemas are needed in order to translate them into primarily pre-verbal subject languages (SVO) such as English. These issues are particularly problematic in .
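
To make the proposal in the abstract concrete (reordering VS constructions into SV order for word alignment only), the following is a minimal Python sketch and not the authors' implementation. It assumes that a parser has already supplied the verb position and the subject span, and that alignment links computed on the reordered sentence are then mapped back to the original word order. That mapping step is an assumption about how "for alignment only" could be realized. The function names `reorder_vs_to_sv` and `map_alignment_back` and the placeholder tokens are hypothetical.

```python
from typing import List, Tuple


def reorder_vs_to_sv(
    tokens: List[str], verb_idx: int, subj_span: Tuple[int, int]
) -> Tuple[List[str], List[int]]:
    """Move a post-verbal subject span to just before its verb.

    subj_span is (start, end) with an exclusive end, and must lie after the verb.
    Returns the reordered tokens and `order`, where order[new_pos] is the
    original index of the token now at new_pos.
    """
    start, end = subj_span
    if not (0 <= verb_idx < start < end <= len(tokens)):
        raise ValueError("subject span must follow the verb and lie inside the sentence")
    order = (
        list(range(0, verb_idx))        # material before the verb, unchanged
        + list(range(start, end))       # the subject, moved in front of the verb
        + list(range(verb_idx, start))  # the verb and anything between verb and subject
        + list(range(end, len(tokens))) # the rest of the sentence, unchanged
    )
    return [tokens[i] for i in order], order


def map_alignment_back(
    alignment: List[Tuple[int, int]], order: List[int]
) -> List[Tuple[int, int]]:
    """Convert (reordered_source_pos, target_pos) links to original source positions."""
    return sorted((order[src], tgt) for src, tgt in alignment)


if __name__ == "__main__":
    # Placeholder tokens: a verb, a two-word subject, and an object (hypothetical data).
    source = ["V", "S1", "S2", "O"]
    reordered, order = reorder_vs_to_sv(source, verb_idx=0, subj_span=(1, 3))
    # Assume an aligner produced a monotone alignment on the SV-ordered sentence.
    links = [(0, 0), (1, 1), (2, 2), (3, 3)]
    print(reordered)
    print(map_alignment_back(links, order))
```

Keeping the permutation in `order` means the sentence itself never has to be changed at translation time: only the alignment step sees the SV order, and the resulting links still refer to the original VS sentence.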
