Scientific report: "Unsupervised Discovery of Persian Morphemes"

Mohsen Arabsorkhi
Computer Science and Engineering Dept., Shiraz University, Shiraz, Iran
marabsorkhi@

Mehrnoush Shamsfard
Electrical and Computer Engineering Dept., Shahid Beheshti University, Tehran, Iran
m-shams@

Abstract

This paper reports the current results of research on unsupervised Persian morpheme discovery. We present a method for discovering the morphemes of the Persian language through automatic analysis of corpora. We used a Minimum Description Length (MDL) based algorithm with some improvements and applied it to a Persian corpus. Our improvements include enhancing the cost function with some heuristics, preventing the split of high-frequency chunks, applying a penalty for first and last letters, and distinguishing pre-parts from post-parts. Our improved approach has raised the precision, recall, and f-measure of discovery by 32, 17, and 23, respectively.

1 Introduction

According to linguistic theory, morphemes are considered to be the smallest meaning-bearing elements of a language. However, no adequate language-independent definition of the word as a unit has been agreed upon. If effective methods can be devised for the unsupervised discovery of morphemes, they could aid the formulation of a linguistic theory of morphology for a new language. Using morphemes instead of words as the basic representational units in a statistical language model seems a promising course (Creutz 2004).

Many natural language processing tasks, including parsing, semantic modeling, information retrieval, and machine translation, frequently require a morphological analysis of the language at hand. The task of a morphological analyzer is to identify the lexeme, citation form, or inflection class of surface word forms in a language. Even approximate automated morphological analysis would likely benefit many NL applications that deal with large vocabularies (e.g. text retrieval applications). On the other hand, constructing a comprehensive morphological analyzer for a language based on linguistic theory requires a considerable amount of work by experts. This is both slow and expensive, and therefore not applicable to all languages. Consequently, it is important to develop methods that can discover and induce the morphology of a language through unsupervised analysis of large amounts of data.
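
The abstract names a Minimum Description Length (MDL) based algorithm but this excerpt does not give its cost function. As a rough illustration of the general MDL idea only, the sketch below scores a segmentation with a generic two-part code: the bits needed to spell out the morph lexicon plus the bits needed to encode the corpus as a sequence of morphs. The function name `description_length`, the flat `bits_per_letter` cost, and the transliterated toy words (`ketab`, `derakht`, plural suffix `ha`) are all assumptions made for this example. It is not the authors' cost function and implements none of the improvements listed in the abstract.

```python
import math
from collections import Counter


def description_length(segmented_corpus, bits_per_letter=5.0):
    """Return a generic two-part MDL cost (in bits) for a segmented corpus.

    segmented_corpus: list of words, each word given as a list of morph strings.
    bits_per_letter: assumed flat cost of spelling one letter in the lexicon.
    """
    # Count how often each morph is used across the whole corpus.
    morph_counts = Counter(m for word in segmented_corpus for m in word)
    total_tokens = sum(morph_counts.values())

    # Part 1: lexicon cost. Each distinct morph is spelled out once,
    # with one extra symbol assumed as an end-of-morph marker.
    lexicon_cost = sum((len(m) + 1) * bits_per_letter for m in morph_counts)

    # Part 2: corpus cost. Each morph token costs -log2 of its relative frequency.
    corpus_cost = -sum(
        count * math.log2(count / total_tokens)
        for count in morph_counts.values()
    )
    return lexicon_cost + corpus_cost


# Toy data (hypothetical, transliterated): two nouns and their plurals.
unsegmented = [["ketab"], ["ketabha"], ["derakht"], ["derakhtha"]]
segmented = [["ketab"], ["ketab", "ha"], ["derakht"], ["derakht", "ha"]]

if __name__ == "__main__":
    print("unsegmented:", description_length(unsegmented))
    print("segmented:  ", description_length(segmented))
```

On this toy data the segmented analysis gets the lower cost, because the shared suffix is stored once in the lexicon instead of being spelled out inside every plural form. An MDL-based search prefers segmentations that shrink the total in this way.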
