Scientific report: "Confidence Measure for Word Alignment"

Fei Huang
IBM Research Center, Yorktown Heights, NY 10598, USA
huangfe@

Abstract

In this paper we present a confidence measure for word alignment based on the posterior probability of alignment links. We introduce a sentence alignment confidence measure and an alignment link confidence measure. Based on these measures, we improve the alignment quality by selecting high-confidence sentence alignments and alignment links from multiple word alignments of the same sentence pair. Additionally, we remove low-confidence alignment links from the word alignment of a bilingual training corpus, which increases the alignment F-score, improves Chinese-English and Arabic-English translation quality, and significantly reduces the phrase translation table size.

1 Introduction

Data-driven approaches have been quite active in recent machine translation (MT) research. Many MT systems, such as statistical phrase-based and syntax-based systems, learn phrase translation pairs or translation rules from large amounts of bilingual data with word alignment. The quality of the parallel data and the word alignment have significant impacts on the learned translation models and ultimately on the quality of the translation output.
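As a rough illustration of the link-filtering idea mentioned in the abstract, the following minimal sketch drops alignment links whose posterior probability falls below a threshold. It is not the paper's actual confidence measure, whose definition is not given in this excerpt: the function name filter_links, the dictionary of posteriors, and the threshold value of 0.5 are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple

# A link pairs a source word index with a target word index.
Link = Tuple[int, int]


def filter_links(
    links: Set[Link],
    posterior: Dict[Link, float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Set[Link]:
    """Keep only links whose posterior probability is at least `threshold`.

    Links with no known posterior are treated as having probability 0.0
    (an assumption of this sketch) and are therefore removed.
    """
    return {link for link in links if posterior.get(link, 0.0) >= threshold}


if __name__ == "__main__":
    # Toy alignment for one sentence pair; the probabilities are made up.
    links = {(0, 0), (1, 2), (2, 1)}
    posterior = {(0, 0): 0.92, (1, 2): 0.15, (2, 1): 0.71}
    kept = filter_links(links, posterior, threshold=0.5)
    print(sorted(kept))  # [(0, 0), (2, 1)]
```

In practice the threshold would presumably be tuned on held-out data, since a higher value trades alignment recall for precision.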
Due to the high cost of commissioned translation, many parallel sentences are automatically extracted from comparable corpora, which inevitably introduces considerable noise (inaccurate or non-literal translations). Given the huge amount of bilingual training data, word alignments are automatically generated using various algorithms (Brown et al., 1994; Vogel et al., 1996; Ittycheriah and Roukos, 2005), which also introduce many word alignment errors.

Figure 1: An example of inaccurate translation and word alignment.

The example in Figure 1 shows the word alignment of the given Chinese and English sentence pair, where the English words following each Chinese word are its literal translation. We find untranslated Chinese and English words marked with
