Scientific report: "An Unsupervised Morpheme-Based HMM for Hebrew Morphological Disambiguation"

Meni Adler
Department of Computer Science, Ben Gurion University of the Negev, 84105 Beer Sheva, Israel
adlerm@

Michael Elhadad
Department of Computer Science, Ben Gurion University

Abstract

Morphological disambiguation is the process of assigning one set of morphological features to each individual word in a text. When the word is ambiguous (there are several possible analyses for the word), a disambiguation procedure based on the word context must be applied. This paper deals with morphological disambiguation of the Hebrew language, which combines morphemes into a word in both agglutinative and fusional ways. We present an unsupervised stochastic model (the only resource we use is a morphological analyzer) that deals with the data sparseness problem caused by the affixational morphology of the Hebrew language. We present a text encoding method for languages with affixational morphology, in which the knowledge of word formation rules, which are quite restricted in Hebrew, helps in the disambiguation. We adapt HMM algorithms for learning and searching this text representation in such a way that segmentation and tagging can be learned in parallel in one step. Results on a large-scale evaluation indicate that this learning improves disambiguation for complex tag sets. Our method is applicable to other languages with affix morphology.

1 Introduction

Morphological disambiguation is the process of assigning one set of morphological features to each individual word in a text, according to the word context.

In this work, we investigate morphological disambiguation in Modern Hebrew. We explore an unsupervised learning method, which is more challenging than the supervised case. The main motivation for this approach is that despite the development

This work is supported by the Lynn and William Frankel Center for Computer Sciences and by the Knowledge Center for Hebrew Processing, Israel Science Ministry.
