Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules

Qin Gao and Stephan Vogel
Language Technologies Institute
Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213
qing @

Abstract

We present an approach to expanding parallel corpora for machine translation. By applying semantic role labeling (SRL) to one side of the language pair, we extract SRL substitution rules from an existing parallel corpus. The rules are then used to generate new sentence pairs. An SVM classifier is built to filter the generated sentence pairs. The filtered corpus is used to train phrase-based translation models, which can be used directly in translation tasks or combined with baseline models. Experimental results on Chinese-English machine translation tasks show average improvements in both BLEU and TER across five different NIST test sets.

1 Introduction

Statistical machine translation (SMT) relies on parallel corpora. Aside from collecting parallel corpora, we have seen interesting research on automatically generating corpora from existing resources. Typical examples are paraphrasing using bilingual (Callison-Burch et al., 2006) or monolingual (Quirk et al., 2004) data. In this paper, we propose a different methodology for generating additional parallel corpora. The basic idea of paraphrasing is to find alternative ways to convey the same information.
In contrast, we propose to build new parallel sentences that convey different information yet retain correct grammatical and semantic structures. The basic idea of the proposed method is to substitute source and target phrase pairs in a sentence pair with phrase pairs from other sentences. The problem is how to identify where a substitution should happen and which phrase pairs are valid candidates for the substitution. Syntactic constraints have been shown to help identify good paraphrases (Callison-Burch, 2008), but they are insufficient for our task.
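The following minimal Python sketch makes the substitution idea concrete. It rests on several simplifying assumptions. Each sentence pair already carries one word-aligned phrase pair whose source side has been tagged with a semantic role label. Two phrase pairs are treated as interchangeable whenever their labels match. The `SentencePair` type and the `substitute`/`expand` helpers are hypothetical and exist only for illustration. This is not the authors' rule extraction or application procedure, which derives substitution rules from SRL output and may impose further constraints.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation (an assumption, not the paper's data format):
# each sentence pair carries ONE aligned phrase pair whose source side was
# tagged with a semantic role label such as "ARG0" or "ARG1".
@dataclass(frozen=True)
class SentencePair:
    src: Tuple[str, ...]
    tgt: Tuple[str, ...]
    role: str                   # SRL label of the aligned phrase (assumed given)
    src_span: Tuple[int, int]   # [start, end) token span on the source side
    tgt_span: Tuple[int, int]   # [start, end) token span on the target side


def phrase_pair(p: SentencePair) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    """Return the (source phrase, target phrase) covered by the labeled spans."""
    return p.src[p.src_span[0]:p.src_span[1]], p.tgt[p.tgt_span[0]:p.tgt_span[1]]


def substitute(host: SentencePair, donor: SentencePair) -> SentencePair:
    """Replace host's labeled phrase pair with donor's, on both language sides."""
    if host.role != donor.role:
        # In this sketch, a matching role label is the only licensing condition.
        raise ValueError("role labels differ; substitution not licensed")
    new_src, new_tgt = phrase_pair(donor)
    s0, s1 = host.src_span
    t0, t1 = host.tgt_span
    src = host.src[:s0] + new_src + host.src[s1:]
    tgt = host.tgt[:t0] + new_tgt + host.tgt[t1:]
    # Recompute the spans so the new pair can itself serve as a host later.
    return SentencePair(src, tgt, host.role,
                        (s0, s0 + len(new_src)), (t0, t0 + len(new_tgt)))


def expand(corpus: List[SentencePair]) -> List[SentencePair]:
    """Generate every same-role substitution between distinct sentence pairs."""
    out = []
    for i, host in enumerate(corpus):
        for j, donor in enumerate(corpus):
            if (i != j and host.role == donor.role
                    and phrase_pair(host) != phrase_pair(donor)):
                out.append(substitute(host, donor))
    return out


if __name__ == "__main__":
    a = SentencePair(("我", "喜欢", "苹果"), ("I", "like", "apples"), "ARG1", (2, 3), (2, 3))
    b = SentencePair(("他", "喜欢", "音乐"), ("he", "likes", "music"), "ARG1", (2, 3), (2, 3))
    # prints "我 喜欢 音乐 ||| I like music", then "他 喜欢 苹果 ||| he likes apples"
    for pair in expand([a, b]):
        print(" ".join(pair.src), "|||", " ".join(pair.tgt))
```

A matching label alone does not guarantee well-formed output. In this sketch, swapping an ARG0 pair 他/he into "我 喜欢 苹果 / I like apples" would yield "he like apples". A downstream filter, such as the SVM classifier mentioned in the abstract, would need to reject output of this kind.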