Scientific report: "An Unsupervised Model for Joint Phrase Alignment and Extraction"

Graham Neubig (1,2), Taro Watanabe (2), Eiichiro Sumita (2), Shinsuke Mori (1), Tatsuya Kawahara (1)
(1) Graduate School of Informatics, Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto, Japan
(2) National Institute of Information and Communication Technology, 3-5 Hikari-dai, Seika-cho, Soraku-gun, Kyoto, Japan

Abstract

We present an unsupervised model for joint phrase alignment and extraction using nonparametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through the use of a novel formulation that memorizes phrases generated not only by terminal, but also non-terminal symbols. This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. Experiments on several language pairs demonstrate that the proposed model matches the accuracy of the traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size.

1 Introduction

The training of translation models for phrase-based statistical machine translation (SMT) systems (Koehn et al., 2003) takes unaligned bilingual training data as input and outputs a scored table of phrase pairs.
This phrase table is traditionally generated by going through a pipeline of two steps: first generating word or minimal phrase alignments, then extracting a phrase table that is consistent with these alignments. However, as DeNero and Klein (2010) note, this two-step approach results in word alignments that are not optimal for the final task of generating phrase tables that are used in translation. As a solution to this, they proposed a supervised discriminative model that performs joint word alignment and phrase extraction, and found that joint estimation of word alignments and extraction sets improves both word ...
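
To make the second step of the traditional pipeline concrete, the following is a minimal Python sketch of extracting phrase pairs that are consistent with a given word alignment. It is an illustration only, not the model proposed in this paper: the function name `extract_phrase_pairs`, the `max_len` parameter, and the toy French-English sentence pair are assumptions introduced here, and the sketch omits the usual extension over unaligned words as well as any scoring of the extracted pairs.

```python
from typing import List, Set, Tuple


def extract_phrase_pairs(
    src: List[str],
    tgt: List[str],
    alignment: Set[Tuple[int, int]],
    max_len: int = 3,
) -> List[Tuple[str, str]]:
    """Return phrase pairs consistent with a word alignment.

    `alignment` holds (source_index, target_index) links. A phrase pair is
    consistent if no word inside it is aligned to a word outside it.
    Simplification (assumption): spans are not extended over unaligned words.
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            tgt_pos = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_pos:
                continue  # skip source spans with no alignment links
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 + 1 > max_len:
                continue
            # Every link touching the target span must stay inside the source span.
            consistent = all(
                i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2
            )
            if consistent:
                pairs.append(
                    (" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1]))
                )
    return pairs


# Toy example (hypothetical data): "la maison bleue" / "the blue house".
src = ["la", "maison", "bleue"]
tgt = ["the", "blue", "house"]
alignment = {(0, 0), (1, 2), (2, 1)}
for pair in extract_phrase_pairs(src, tgt, alignment):
    print(pair)
```

In this toy example the span "la maison" is rejected, because its target span "the blue house" contains "blue", which is aligned to "bleue" outside the source span. Because extraction of this kind depends entirely on the fixed word alignment from the first step, errors or suboptimal choices in that alignment carry over directly into the phrase table.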
