Scientific report: "Arabic Tokenization, Part-of-Speech Tagging and Morphological Disambiguation in One Fell Swoop"

Nizar Habash and Owen Rambow
Center for Computational Learning Systems, Columbia University, New York, NY 10115, USA
habash rambow @

Abstract

We present an approach to using a morphological analyzer for tokenizing and morphologically tagging (including part-of-speech tagging) Arabic words in one process. We learn classifiers for individual morphological features, as well as ways of using these classifiers to choose among entries from the output of the analyzer. We obtain accuracy rates on all tasks in the high nineties.

1 Introduction

Arabic is a morphologically complex language.[1] The morphological analysis of a word consists of determining the values of a large number of orthogonal features, such as basic part-of-speech (i.e., noun, verb, and so on), voice, gender, number, information about the clitics, and so on.[2] For Arabic, this gives us about 333,000 theoretically possible completely specified morphological analyses, i.e., morphological tags, of which about 2,200 are actually used in the first 280,000 words of the Penn Arabic Treebank (ATB). In contrast, English morphological tagsets usually have about 50 tags, which cover all morphological variation. As a consequence, morphological disambiguation of a word in context, i.e., choosing a complete morphological tag, cannot be done successfully using methods developed for English because of data sparseness. Hajič (2000) demonstrates convincingly that morphological disambiguation can be aided by a morphological analyzer, which, given a word without any context, gives us

[1] We would like to thank Mona Diab for helpful discussions. The work reported in this paper was supported by NSF Award 0329163. The authors are listed in alphabetical order.

[2] In this paper, we only discuss inflectional morphology. Thus, the fact that the stem is composed of a root, a pattern, and an infix vocalism is not relevant except as it affects broken plurals and verb aspect.
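
The abstract's idea of using per-feature classifiers to choose among the analyzer's entries can be illustrated with a minimal sketch. The selection rule here (pick the candidate that agrees with the most predicted feature values) is an assumption made for illustration, not necessarily the combination method the authors use. The feature names, the candidate analyses, and the `choose_analysis` helper are all hypothetical.

```python
from typing import Dict, List

# One morphological analysis: feature name -> feature value.
Analysis = Dict[str, str]


def choose_analysis(candidates: List[Analysis], predicted: Dict[str, str]) -> Analysis:
    """Return the candidate that agrees with the most per-feature predictions.

    `candidates` stands in for the analyzer's output for one word;
    `predicted` stands in for the outputs of the individual feature
    classifiers. Simple agreement counting is an illustrative assumption.
    """
    if not candidates:
        raise ValueError("analyzer returned no candidates")

    def agreement(analysis: Analysis) -> int:
        # Count features where the candidate matches the classifier output.
        return sum(1 for feat, val in predicted.items() if analysis.get(feat) == val)

    # On ties, max() keeps the first candidate in analyzer order.
    return max(candidates, key=agreement)


# Hypothetical analyzer output for a single ambiguous word.
candidates = [
    {"pos": "noun", "gender": "masc", "number": "sg", "clitic": "none"},
    {"pos": "verb", "gender": "masc", "number": "sg", "clitic": "none"},
    {"pos": "noun", "gender": "fem", "number": "pl", "clitic": "det"},
]

# Hypothetical per-feature classifier predictions for the word in context.
predicted = {"pos": "noun", "gender": "masc", "number": "sg", "clitic": "det"}

best = choose_analysis(candidates, predicted)
```

Because the choice is restricted to analyses the analyzer actually proposes, the result is always a complete, consistent tag even when an individual classifier is wrong, as the clitic prediction is in this example.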
