Scientific report: "Machine Translation System Combination by Confusion Forest"

Taro Watanabe and Eiichiro Sumita
National Institute of Information and Communications Technology
3-5 Hikaridai, Keihanna Science City, 619-0289 JAPAN

Abstract

The state-of-the-art system combination method for machine translation (MT) is based on confusion networks constructed by aligning hypotheses with regard to word similarities. We introduce a novel system combination framework in which hypotheses are encoded as a confusion forest, a packed forest representing alternative trees. The forest is generated using syntactic consensus among parsed hypotheses: First, MT outputs are parsed. Second, a context-free grammar is learned by extracting a set of rules that constitute the parse trees. Third, a packed forest is generated starting from the root symbol of the extracted grammar through non-terminal rewriting. The new hypothesis is produced by searching for the best derivation in the forest. Experimental results on the WMT10 system combination shared task yield comparable performance to the conventional confusion-network-based method with smaller space.

1 Introduction

System combination techniques take advantage of consensus among multiple systems and have been widely used in fields such as speech recognition (Fiscus, 1997; Mangu et al., 2000) and parsing (Henderson and Brill, 1999). One of the state-of-the-art system combination methods for MT is based on confusion networks, which are compact graph-based structures representing multiple hypotheses (Bangalore et al., 2001).

Confusion networks are constructed based on string similarity information. First, one skeleton or backbone sentence is selected. Then, other hypotheses are aligned against the skeleton, forming a lattice with each arc representing alternative word candidates. The alignment method is either model-based (Matusov et al., 2006; He et al., 2008), in which a statistical word aligner is used to compute hypothesis alignment, or
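The three steps named in the abstract (parse the MT outputs, extract a context-free grammar from the parse trees, rewrite non-terminals from the root to obtain a packed forest) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration and not the authors' implementation: the toy parse trees are hand-written instead of produced by a parser, the function names `extract_rules`, `build_forest`, and `yields` are hypothetical, forest nodes are identified by `(symbol, depth)`, and a `max_depth` bound is used to stop recursive rules from expanding forever. The search for the best derivation is not shown; the sketch only enumerates the strings the forest encodes.

```python
from collections import defaultdict

# A tree is a tuple (label, child, child, ...); a leaf word is a plain string.
# Toy parses of two MT hypotheses (hand-written, assumed for illustration).
T1 = ("S", ("NP", "he"), ("VP", ("V", "saw"), ("NP", "it")))
T2 = ("S", ("NP", "he"), ("VP", ("V", "watched"), ("NP", "that")))


def extract_rules(tree, rules):
    """Collect the CFG rules (label -> child symbols) used in one parse tree."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules[label].add(rhs)
    for child in children:
        if not isinstance(child, str):
            extract_rules(child, rules)


def build_forest(rules, root, max_depth):
    """Rewrite non-terminals top-down from the root.

    Returns {node: [hyperedge, ...]} where a node is (symbol, depth) and a
    hyperedge is a tuple of tail nodes or terminal strings. Nodes are shared
    between hyperedges, which is what makes the forest "packed". The depth
    bound is an assumption of this sketch, not taken from the paper.
    """
    forest = {}

    def expand(symbol, depth):
        node = (symbol, depth)
        if node in forest:
            return node
        forest[node] = []
        if depth < max_depth:  # nodes at the bound stay as dead ends
            for rhs in sorted(rules[symbol]):
                tails = tuple(
                    expand(s, depth + 1) if s in rules else s for s in rhs
                )
                forest[node].append(tails)
        return node

    expand(root, 0)
    return forest


def yields(forest, node):
    """All word sequences derivable from a node (exponential; toy use only)."""
    if isinstance(node, str):
        return {(node,)}
    out = set()
    for tails in forest[node]:
        partial = {()}
        for tail in tails:
            partial = {p + y for p in partial for y in yields(forest, tail)}
        out |= partial
    return out


rules = defaultdict(set)
for t in (T1, T2):
    extract_rules(t, rules)
rules = dict(rules)

forest = build_forest(rules, "S", max_depth=3)
strings = yields(forest, ("S", 0))
print(len(strings), ("he", "watched", "it") in strings)
```

Because rules from different hypotheses are pooled, the forest also licenses recombinations such as "he watched it" that neither input contains. In a real system each hyperedge would carry feature scores so that a best derivation can be selected instead of enumerating all strings.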
