Scientific report: "Sub-Sentence Division for Tree-Based Machine Translation"

Hao Xiong, Wenwen Xu, Haitao Mi, Yang Liu and Qun Liu
Key Lab. of Intelligent Information Processing
Key Lab. of Computer System and Architecture
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190, China
{xionghao, xuwenwen, htmi, yliu, liuqun}@[...]

Abstract

Tree-based statistical machine translation models have made significant progress in recent years, especially when replacing 1-best trees with packed forests. However, because parsing accuracy usually drops sharply as sentence length increases, translating long sentences often takes a long time and produces only degenerate translations. We propose a new method, named sub-sentence division, that reduces decoding time and improves translation quality for tree-based translation. Our approach divides long sentences into several sub-sentences by exploiting tree structures. Large-scale experiments on the NIST 2008 Chinese-to-English test set show that our approach achieves an absolute improvement of [...] BLEU points over the baseline system in 50% less time.

1 Introduction

Tree-based statistical machine translation models have witnessed promising progress in recent years, such as tree-to-string models (Liu et al., 2006; Huang et al., 2006) and tree-to-tree models (Quirk et al., 2005; Zhang et al., 2008). In particular, when combined with packed forests, the corresponding forest-based tree-to-string models (Mi et al., 2008; Zhang et al., 2009) and tree-to-tree models (Liu et al., 2009) have achieved promising improvements over the corresponding tree-based systems.

However, we argue that translating long sentences raises two major issues. On the one hand, parsing accuracy drops as sentence length grows, which inevitably hurts translation quality (Quirk and Corston-Oliver, 2006; Mi and Huang, 2008). On the other hand, decoding long sentences is time-consuming, especially for forest-based approaches. So splitting long [...]
