Scientific report: "POS Disambiguation and Unknown Word Guessing with Decision Trees"

Proceedings of EACL 99

POS Disambiguation and Unknown Word Guessing with Decision Trees

Giorgos S. Orphanos
Computer Engineering & Informatics Dept. and Computer Technology Institute
University of Patras
26500 Rion, Patras, Greece
georfan@

Dimitris N. Christodoulakis
Computer Engineering & Informatics Dept. and Computer Technology Institute
University of Patras
26500 Rion, Patras, Greece
dxri@

Abstract

This paper presents a decision-tree approach to the problems of part-of-speech disambiguation and unknown word guessing as they appear in Modern Greek, a highly inflectional language. The learning procedure is tag-set independent and reflects the linguistic reasoning on the specific problems. The induced decision trees are combined with a high-coverage lexicon to form a tagger that achieves 93.5% overall disambiguation accuracy.

1 Introduction

Part-of-speech (POS) taggers are software devices that aim to assign unambiguous morphosyntactic tags to the words of electronic texts. Although the hardest part of the tagging process is performed by a computational lexicon, a POS tagger cannot consist solely of a lexicon, because of (i) morphosyntactic ambiguity (e.g. "love" as verb or noun) and (ii) the existence of unknown words (e.g. proper nouns, place names, compounds). When the lexicon can assure high coverage, unknown word guessing can be viewed as a decision about the POS of open-class words (i.e. Noun, Verb, Adjective, Adverb or Participle).

Towards the disambiguation of POS tags, two main approaches have been followed. On the one hand, according to the linguistic approach, experts encode handcrafted rules or constraints based on abstractions derived from language paradigms, usually with the aid of corpora (Green and Rubin, 1971; Voutilainen, 1995). On the other hand, according to the data-driven approach, a frequency-based language model is acquired from corpora and takes the form of n-grams (Church, 1988; Cutting et al., 1992), rules (Hindle, 1989; Brill, 1995), decision trees (Cardie, 1994; Daelemans et al., 1996) …
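The introduction describes a tagger in which a lexicon does most of the work and learned decision trees handle the two residual problems: ambiguous words and unknown words. The following is a minimal sketch of that division of labour, not the authors' implementation. Several details are illustrative assumptions: the ID3 information-gain criterion, the use of one tree per set of candidate tags, the `prev` (previous tag) and `suffix` (last two letters) features, the helper names `id3`, `classify` and `tag_word`, and the English toy data. None of these is taken from the paper, which works on Modern Greek with its own tag set and features.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def id3(examples, features):
    """Induce a decision tree with ID3 (information gain) over categorical features.

    examples: list of (feature_dict, label) pairs; every dict has every feature.
    Returns a label (leaf) or a tuple (feature, {value: subtree}, majority_label).
    """
    labels = [label for _, label in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def gain(feature):
        # Group labels by this feature's value and measure the entropy reduction.
        groups = {}
        for feats, label in examples:
            groups.setdefault(feats[feature], []).append(label)
        remainder = sum(len(g) / len(examples) * entropy(g) for g in groups.values())
        return entropy(labels) - remainder

    best = max(features, key=gain)
    rest = [f for f in features if f != best]
    branches = {}
    for value in {feats[best] for feats, _ in examples}:
        subset = [(f, l) for f, l in examples if f[best] == value]
        branches[value] = id3(subset, rest)
    return (best, branches, majority)


def classify(tree, feats):
    """Walk the tree; an unseen feature value falls back to that node's majority label."""
    while isinstance(tree, tuple):
        feature, branches, majority = tree
        if feats.get(feature) not in branches:
            return majority
        tree = branches[feats[feature]]
    return tree


def tag_word(word, context, lexicon, ambiguity_trees, unknown_tree):
    """Lexicon lookup first; decision trees only for ambiguous or unknown words."""
    tags = lexicon.get(word)
    if tags is None:
        # Unknown word: guess an open-class POS from a hypothetical 2-letter suffix feature.
        return classify(unknown_tree, {"suffix": word[-2:], **context})
    if len(tags) == 1:
        return next(iter(tags))
    # Ambiguous word: use the (assumed) tree trained for this exact set of candidate tags.
    return classify(ambiguity_trees[frozenset(tags)], context)


# Toy, invented training data (English, for readability only).
LEXICON = {"the": {"Determiner"}, "i": {"Pronoun"}, "love": {"Noun", "Verb"}}
NOUN_VERB = [
    ({"prev": "Determiner"}, "Noun"),
    ({"prev": "Adjective"}, "Noun"),
    ({"prev": "Pronoun"}, "Verb"),
    ({"prev": "Pronoun"}, "Verb"),
]
UNKNOWN = [
    ({"suffix": "ly", "prev": "Verb"}, "Adverb"),
    ({"suffix": "ly", "prev": "Pronoun"}, "Adverb"),
    ({"suffix": "ed", "prev": "Pronoun"}, "Verb"),
    ({"suffix": "ed", "prev": "Determiner"}, "Participle"),
    ({"suffix": "on", "prev": "Determiner"}, "Noun"),
]
AMBIGUITY_TREES = {frozenset({"Noun", "Verb"}): id3(NOUN_VERB, ["prev"])}
UNKNOWN_TREE = id3(UNKNOWN, ["suffix", "prev"])

if __name__ == "__main__":
    print(tag_word("love", {"prev": "Pronoun"}, LEXICON, AMBIGUITY_TREES, UNKNOWN_TREE))
    print(tag_word("love", {"prev": "Determiner"}, LEXICON, AMBIGUITY_TREES, UNKNOWN_TREE))
    print(tag_word("quickly", {"prev": "Verb"}, LEXICON, AMBIGUITY_TREES, UNKNOWN_TREE))
```

Run as a script, this prints `Verb`, `Noun` and `Adverb`: the ambiguous "love" is resolved by its left context, and the unknown "quickly" is guessed from its suffix. Falling back to a node's majority label keeps the guesser total even for feature values never seen in training, which matters for unknown words in particular.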
