TAILIEUCHUNG - Scientific report: "Tree-to-String Alignment Template for Statistical Machine Translation"

Yang Liu, Qun Liu, and Shouxun Lin
Institute of Computing Technology, Chinese Academy of Sciences
Kexueyuan South Road, Haidian District, P. O. Box 2704, Beijing 100080, China
yliu liuqun sxlin @

Abstract

We present a novel translation model based on tree-to-string alignment template (TAT), which describes the alignment between a source parse tree and a target string. A TAT is capable of generating both terminals and non-terminals and performing reordering at both low and high levels. The model is linguistically syntax-based because TATs are extracted automatically from word-aligned, source-side parsed parallel texts. To translate a source sentence, we first employ a parser to produce a source parse tree and then apply TATs to transform the tree into a target string. Our experiments show that the TAT-based model significantly outperforms Pharaoh, a state-of-the-art decoder for phrase-based models.

1 Introduction

Phrase-based translation models (Marcu and Wong, 2002; Koehn et al., 2003; Och and Ney, 2004), which go beyond the original IBM translation models (Brown et al., 1993)¹ by modeling translations of phrases rather than individual words, have been suggested to be the state-of-the-art in statistical machine translation by empirical evaluations.
In phrase-based models, phrases are usually strings of adjacent words instead of syntactic constituents, excelling at capturing local reordering and performing translations that are localized to substrings that are common enough to be observed on training data. However, a key limitation of phrase-based models is that they fail to model reordering at the phrase level robustly. Typically, phrase reordering is modeled in terms ...

¹ The mathematical notation we use in this paper is taken from that paper: a source string $f_1^J = f_1, \ldots, f_j, \ldots, f_J$ is to be translated into a target string $e_1^I = e_1, \ldots, e_i, \ldots, e_I$. Here, $I$ is the length of the target string and $J$ is the length of the source string.
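
As a rough illustration of the idea stated in the abstract (parse the source sentence, then apply templates that turn the tree into a target string), the following is a minimal sketch and not the authors' decoder. Everything in it is an assumption made for illustration: the toy romanized source words, the `TEMPLATES` and `LEXICON` tables, and the `translate` function are hypothetical. The templates here match only one tree level by node labels, whereas the TATs described in the paper are extracted from word-aligned, parsed parallel text and are scored by a statistical model.

```python
# A parse-tree node is (label, children); a leaf is (label, source_word).
# Toy, hand-written data: NOT extracted from any corpus.
TREE = ("S", [
    ("NR", "bushi"),
    ("VP", [
        ("PP", [("P", "yu"), ("NR", "shalong")]),
        ("VV", "huitan"),
    ]),
])

# Hypothetical templates. Key: (root label, labels of its children).
# Value: target side, mixing target words (str) and child indices (int).
TEMPLATES = {
    ("S", ("NR", "VP")): [0, 1],        # keep the source order
    ("VP", ("PP", "VV")): [1, 0],       # high-level reordering: verb first
    ("PP", ("P", "NR")): ["with", 1],   # emits a terminal and a non-terminal
}

# Hypothetical word-level templates for leaves.
LEXICON = {
    ("NR", "bushi"): "Bush",
    ("NR", "shalong"): "Sharon",
    ("VV", "huitan"): "held talks",
}


def translate(tree):
    """Recursively rewrite a source tree into a list of target words."""
    label, body = tree
    if isinstance(body, str):
        # Leaf node: translate the source word directly.
        return LEXICON[(label, body)].split()
    # Internal node: pick the template matching this node and its children.
    target_side = TEMPLATES[(label, tuple(child[0] for child in body))]
    words = []
    for item in target_side:
        if isinstance(item, int):
            # Non-terminal: translate the referenced child subtree in place.
            words.extend(translate(body[item]))
        else:
            # Terminal: a target word produced by the template itself.
            words.append(item)
    return words


print(" ".join(translate(TREE)))
```

In this sketch the `VP` template moves the verb ahead of the prepositional phrase, and the `PP` template emits the target word "with" while delegating its noun child to a further template, mirroring the abstract's claim that a template can generate both terminals and non-terminals and reorder at more than one level.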
