Unsupervised Morphology Rivals Supervised Morphology for Arabic MT

David Stallard, Jacob Devlin, Michael Kayser
BBN Technologies
{stallard,jdevlin,rzbib}@bbn.com

Yoong Keok Lee, Regina Barzilay
CSAIL, Massachusetts Institute of Technology
{yklee,regina}@csail.mit.edu

Abstract

If unsupervised morphological analyzers could approach the effectiveness of supervised ones, they would be a very attractive choice for improving MT performance on low-resource inflected languages. In this paper, we compare performance gains for state-of-the-art supervised vs. unsupervised morphological analyzers, using a state-of-the-art Arabic-to-English MT system. We apply maximum marginal decoding to the unsupervised analyzer, and show that this yields the best published segmentation accuracy for Arabic, while also making segmentation output more stable. Our approach gives an 18% relative BLEU gain for Levantine dialectal Arabic. Furthermore, it gives higher gains for Modern Standard Arabic (MSA), as measured on NIST MT-08, than does MADA (Habash and Rambow, 2005), a leading supervised MSA segmenter.

1 Introduction

If unsupervised morphological segmenters could approach the effectiveness of supervised ones, they would be a very attractive choice for improving machine translation (MT) performance in low-resource inflected languages. An example of particular current interest is Arabic, whose various colloquial dialects are sufficiently different from Modern Standard Arabic (MSA) in lexicon, orthography, and morphology as to be low-resource languages themselves.
An additional advantage of Arabic for study is the availability of high-quality supervised segmenters for MSA, such as MADA (Habash and Rambow, 2005), for performance comparison. The MT gain for supervised MSA segmenters on dialect establishes a lower bound, which the unsupervised segmenter must exceed if it is to be useful for dialect. And comparing the gain for supervised and unsupervised segmenters on MSA tells us how useful the unsupervised segmenter is.