Scientific report: "A Syllable Based Word Recognition Model for Korean Noun Extraction"

Do-Gil Lee and Hae-Chang Rim
Dept. of Computer Science & Engineering, Korea University
1, 5-ka, Anam-dong, Seongbuk-ku, Seoul 136-701, Korea
{dglee, rim}@

Heui-Seok Lim
Dept. of Information & Communications, Chonan University
115 AnSeo-dong, CheonAn 330-704, Korea
limhs@

Abstract
Noun extraction is very important for many NLP applications such as information retrieval, automatic text classification, and information extraction. Most previous Korean noun extraction systems use a morphological analyzer or a part-of-speech (POS) tagger. Therefore, they require a great deal of linguistic knowledge, such as morpheme dictionaries and rules (e.g., morphosyntactic rules and morphological rules). This paper proposes a new noun extraction method that uses a syllable based word recognition model. It finds the most probable syllable-tag sequence of the input sentence using statistical information acquired automatically from a POS-tagged corpus, and extracts nouns by detecting word boundaries. Furthermore, it does not require any labor for constructing and maintaining linguistic knowledge. We have performed various experiments with a wide range of variables influencing performance. The experimental results show that, without morphological analysis or POS tagging, the proposed method achieves performance comparable to that of previous methods.

1 Introduction
Noun extraction is the process of finding every noun in a document (Lee et al., 2001). In Korean, nouns are used as the most important terms (features) for representing a document in NLP applications such as information retrieval, document categorization, text summarization, and information extraction. Korean is a highly agglutinative language, and nouns occur inside Eojeols. An Eojeol is a surface-level form consisting of more than one combined morpheme. Therefore, morphological analysis or POS tagging is required to extract Korean nouns. The …
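The abstract describes the method as finding the most probable syllable-tag sequence with statistics learned from a POS-tagged corpus, then reading nouns off the detected word boundaries. The following is a minimal sketch of that idea, assuming a plain first-order hidden Markov model decoded with the Viterbi algorithm; the paper's actual model and tag set may differ. The tag set (B-N, I-N, B-X, I-X), the toy probability tables START, TRANS and EMIT, the smoothing value FLOOR, and the function names viterbi and extract_nouns are all hypothetical. In a real system the tables would be estimated by counting over a tagged corpus. The example Eojeol 학교에서 ("at school") is the noun 학교 followed by the particle 에서.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical syllable tags: coarse class (N = noun, X = anything else)
# combined with a boundary marker (B = first syllable of a morpheme, I = inside).
TAGS = ["B-N", "I-N", "B-X", "I-X"]
FLOOR = 1e-6  # assumed probability for unseen events (crude smoothing)

# Hypothetical toy parameters; real values would come from a POS-tagged corpus.
START = {"B-N": 0.6, "B-X": 0.4}
TRANS = {("B-N", "I-N"): 0.7, ("B-N", "B-X"): 0.3,
         ("I-N", "I-N"): 0.2, ("I-N", "B-X"): 0.6, ("I-N", "B-N"): 0.2,
         ("B-X", "I-X"): 0.6, ("B-X", "B-N"): 0.4,
         ("I-X", "B-N"): 0.5, ("I-X", "B-X"): 0.5}
EMIT = {("B-N", "학"): 0.5, ("I-N", "교"): 0.5,
        ("B-X", "에"): 0.5, ("I-X", "서"): 0.5}


def viterbi(syllables: List[str], start: Dict[str, float],
            trans: Dict[Tuple[str, str], float],
            emit: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the most probable tag sequence under a first-order HMM."""
    if not syllables:
        return []

    def lp(table, key):
        # Log probabilities avoid underflow on long sentences.
        return math.log(table.get(key, FLOOR))

    # best[i][t]: best log score of any tag path ending in tag t at position i.
    best = [{t: lp(start, t) + lp(emit, (t, syllables[0])) for t in TAGS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(syllables)):
        scores, ptrs = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[i - 1][p] + lp(trans, (p, t)))
            scores[t] = (best[i - 1][prev] + lp(trans, (prev, t))
                         + lp(emit, (t, syllables[i])))
            ptrs[t] = prev
        best.append(scores)
        back.append(ptrs)

    # Trace back from the highest-scoring final tag.
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for i in range(len(syllables) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]


def extract_nouns(syllables: List[str], tags: List[str]) -> List[str]:
    """Join B-N / I-N syllable runs into nouns; B-N marks a word boundary."""
    nouns, current = [], ""
    for syl, tag in zip(syllables, tags):
        if tag == "B-N":
            if current:
                nouns.append(current)
            current = syl
        elif tag == "I-N" and current:
            current += syl
        else:
            if current:
                nouns.append(current)
            current = ""
    if current:
        nouns.append(current)
    return nouns


if __name__ == "__main__":
    sylls = list("학교에서")
    tags = viterbi(sylls, START, TRANS, EMIT)
    print(tags, extract_nouns(sylls, tags))
```

With these toy tables the decoder tags 학, 교, 에, 서 as B-N, I-N, B-X, I-X, so extract_nouns returns ["학교"]. No morpheme dictionary is consulted. Noun boundaries fall out of the tag sequence, which mirrors the abstract's claim that nouns are extracted by detecting word boundaries rather than by full morphological analysis.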
