TAILIEUCHUNG - Scientific report: "Text Chunking by Combining Hand-Crafted Rules and Memory-Based Learning"

Text Chunking by Combining Hand-Crafted Rules and Memory-Based Learning

Seong-Bae Park, Byoung-Tak Zhang
School of Computer Science and Engineering, Seoul National University, Seoul 151-744, Korea
sbpark btzhang @

Abstract

This paper proposes a hybrid of hand-crafted rules and a machine learning method for chunking Korean. In partially free word-order languages such as Korean and Japanese, a small number of rules dominate performance because these languages have well-developed postpositions and endings. Thus, the proposed method is primarily based on the rules, and the residual errors are then corrected by a memory-based machine learning method. Since memory-based learning handles exceptions in natural language processing efficiently, it is good at checking whether the estimates of the rules are exceptional cases and at revising them. An evaluation shows that the method improves F-score over either the rules or various machine learning methods alone.

1 Introduction

Text chunking has been one of the most interesting problems in the natural language learning community since the first work of Ramshaw and Marcus (1995), which used a machine learning method. The main purpose of the machine learning methods applied to this task is to capture the hypothesis that best determines the chunk type of a word, and such methods have shown relatively high performance in English (Kudo and Matsumoto, 2000; Zhang et al., 2001).
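As a rough illustration of the hybrid scheme outlined in the abstract (rules first, then a memory-based correction of the rules' errors), the following is a minimal sketch and not the authors' implementation. The tag names, the rule table, the feature window, the overlap metric, and the distance threshold are all assumptions made for illustration, and the example words are English placeholders rather than Korean data.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); the tag set is hypothetical

# Hypothetical hand-crafted rules: map a POS tag directly to a chunk label.
RULES: Dict[str, str] = {
    "noun": "B-NP",
    "postposition": "I-NP",
    "verb": "B-VP",
    "ending": "I-VP",
}


def rule_chunk(tokens: List[Token]) -> List[str]:
    """Label each token with the rules alone; unknown tags fall back to 'O'."""
    return [RULES.get(pos, "O") for _, pos in tokens]


def features(tokens: List[Token], i: int) -> Tuple[str, ...]:
    """Context features (assumed): the word plus POS tags in a +/-1 window."""
    prev_pos = tokens[i - 1][1] if i > 0 else "<s>"
    next_pos = tokens[i + 1][1] if i + 1 < len(tokens) else "</s>"
    word, pos = tokens[i]
    return (word, prev_pos, pos, next_pos)


def overlap_distance(a: Tuple[str, ...], b: Tuple[str, ...]) -> int:
    """Count mismatching feature positions (a simple overlap metric)."""
    return sum(x != y for x, y in zip(a, b))


class ExceptionMemory:
    """Stores the contexts where the rules were wrong, with the gold label."""

    def __init__(self, threshold: int = 0) -> None:
        # threshold is an assumed knob: maximum distance at which a stored
        # exception is allowed to override the rule output.
        self.threshold = threshold
        self.cases: List[Tuple[Tuple[str, ...], str]] = []

    def train(self, sentences: List[List[Token]], gold: List[List[str]]) -> None:
        for tokens, labels in zip(sentences, gold):
            predicted = rule_chunk(tokens)
            for i, (p, g) in enumerate(zip(predicted, labels)):
                if p != g:  # memorize only the rules' errors
                    self.cases.append((features(tokens, i), g))

    def chunk(self, tokens: List[Token]) -> List[str]:
        labels = rule_chunk(tokens)
        for i in range(len(tokens)):
            f = features(tokens, i)
            nearest = min(
                self.cases, key=lambda c: overlap_distance(c[0], f), default=None
            )
            if nearest and overlap_distance(nearest[0], f) <= self.threshold:
                labels[i] = nearest[1]  # treat as an exception to the rules
        return labels


# Toy usage: the rules mislabel the second noun of a compound.
sentence = [("computer", "noun"), ("science", "noun"), ("is", "verb")]
memory = ExceptionMemory()
memory.train([sentence], [["B-NP", "I-NP", "B-VP"]])
rules_only = rule_chunk(sentence)
hybrid = memory.chunk(sentence)
```

In this toy run the rules alone label both nouns as chunk-initial, while the memory overrides only the token whose context matches a stored error case. A larger threshold would let the memory generalize to similar contexts at the risk of overriding correct rule decisions.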
To capture such a hypothesis, these methods use various kinds of information, such as lexical information, part-of-speech, and the grammatical relations of neighboring words. Since the position of a word plays an important role as a syntactic constraint in English, the methods are successful even with local information. However, these methods are not appropriate for chunking Korean and Japanese, because such languages have a partially free word order. That is, there is only a very weak positional constraint in these languages. Instead of positional constraints, they have overt
