Scientific report: "Unsupervised Morphology Rivals Supervised Morphology for Arabic MT"

David Stallard, Jacob Devlin, Michael Kayser
BBN Technologies
stallard jdevlin rzbib @

Yoong Keok Lee, Regina Barzilay
CSAIL, Massachusetts Institute of Technology
yklee regina @

Abstract

If unsupervised morphological analyzers could approach the effectiveness of supervised ones, they would be a very attractive choice for improving MT performance on low-resource inflected languages. In this paper, we compare performance gains for state-of-the-art supervised vs. unsupervised morphological analyzers, using a state-of-the-art Arabic-to-English MT system. We apply maximum marginal decoding to the unsupervised analyzer and show that this yields the best published segmentation accuracy for Arabic, while also making segmentation output more stable. Our approach gives an 18% relative BLEU gain for Levantine dialectal Arabic. Furthermore, it gives higher gains for Modern Standard Arabic (MSA), as measured on NIST MT-08, than does MADA (Habash and Rambow, 2005), a leading supervised MSA segmenter.

1 Introduction

If unsupervised morphological segmenters could approach the effectiveness of supervised ones, they would be a very attractive choice for improving machine translation (MT) performance in low-resource inflected languages. An example of particular current interest is Arabic, whose various colloquial dialects are sufficiently different from Modern Standard Arabic (MSA) in lexicon, orthography, and morphology as to be low-resource languages themselves.

An additional advantage of Arabic for study is the availability of high-quality supervised segmenters for MSA, such as MADA (Habash and Rambow, 2005), for performance comparison. The MT gain for supervised MSA segmenters on dialect establishes a lower bound, which the unsupervised segmenter must exceed if it is to be useful for dialect. And comparing the gain for supervised and unsupervised segmenters on MSA tells us how useful the unsupervised segmenter is.
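The abstract mentions applying maximum marginal decoding to the unsupervised analyzer, but this excerpt does not describe the procedure. As a rough illustration only, and assuming the analyzer can be sampled several times to yield one candidate segmentation per word per run, the Python sketch below picks for each word the segmentation that occurs most often across runs (an empirical estimate of the per-word marginal mode). The function name `max_marginal_decode`, the toy transliterated words, and the voting approximation are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def max_marginal_decode(samples):
    """Pick, for each word, its most frequent segmentation across samples.

    `samples` is a list of dicts, one per (hypothetical) sampler run,
    each mapping a word to a tuple of morphemes.
    """
    votes = {}
    for sample in samples:
        for word, segmentation in sample.items():
            # Count how often each segmentation of this word was sampled.
            votes.setdefault(word, Counter())[tuple(segmentation)] += 1
    # The most frequent segmentation approximates the marginal mode.
    return {word: counts.most_common(1)[0][0] for word, counts in votes.items()}


# Toy data (hypothetical): three sampler runs over two transliterated words.
toy_samples = [
    {"wktb": ("w", "ktb"), "ktbhm": ("ktb", "hm")},
    {"wktb": ("w", "ktb"), "ktbhm": ("ktbhm",)},
    {"wktb": ("wktb",), "ktbhm": ("ktb", "hm")},
]

decoded = max_marginal_decode(toy_samples)
print(decoded)
```

Under this reading, each word's final segmentation depends on agreement across runs rather than on any single sample, which is one plausible reason the decoded output would vary less from run to run.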
