Mistake-Driven Mixture of Hierarchical Tag Context Trees

Masahiko Haruno
NTT Communication Science Laboratories
1-1 Hikari-No-Oka, Yokosuka-Shi, Kanagawa 239, Japan

Yuji Matsumoto
NAIST
8916-5 Takayama-cho, Ikoma-Shi, Nara 630-01, Japan

Abstract

This paper proposes a mistake-driven mixture method for learning a tag model. The method iteratively performs two procedures: (1) constructing a tag model based on the current data distribution, and (2) updating the distribution by focusing on data that are not well predicted by the constructed model. The final tag model is constructed by mixing all the models according to their performance. To reflect the data distribution well, we represent each tag model as a hierarchical tag (i.e., NTT¹ < proper noun < noun) context tree. By using the hierarchical tag context tree, the constituents of sequential tag models gradually change from broad-coverage tags (e.g., noun) to specific exceptional words that cannot be captured by general tags. In other words, the method incorporates not only frequent connections but also infrequent ones that are often considered to be collocational. We evaluate several tag models by implementing Japanese part-of-speech taggers that share all other conditions (i.e., dictionary and word model) except their tag models. The experimental results show that the proposed method significantly outperforms both hand-crafted and conventional statistical methods.

1 Introduction

The last few years have seen the great success of stochastic part-of-speech (POS) taggers (Church, 1988; Kupiec, 1992; Charniak et al., 1993; Brill, 1992; Nagata, 1994).
The stochastic approach generally attains 94% to 96% accuracy and replaces the labor-intensive compilation of linguistic rules with an automated learning algorithm. However, practical systems require higher accuracy because POS tagging is an inevitable pre-processing step for all practical

¹ NTT is an abbreviation of Nippon Telegraph and Telephone Corporation.
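The two-step loop described in the abstract (build a tag model from the current distribution, then shift the distribution toward the data that model mispredicts, and finally mix all models by performance) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the base learner `train_weighted_tag_model` is a hypothetical per-context majority-vote tagger standing in for a hierarchical tag context tree (which it does not implement), and the performance-based vote `alpha = 0.5 * ln((1 - error) / error)` is an assumed AdaBoost-style rule, since this excerpt does not give the paper's weighting formulas.

```python
import math
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

Example = Tuple[Hashable, str]          # (context, gold tag)
Model = Callable[[Hashable], str]       # context -> predicted tag


def train_weighted_tag_model(examples: Sequence[Example],
                             weights: Sequence[float]) -> Model:
    """Hypothetical stand-in for a tag context tree: for each context, predict
    the tag with the largest total weight under the current distribution."""
    scores = defaultdict(lambda: defaultdict(float))
    overall = defaultdict(float)
    for (context, tag), w in zip(examples, weights):
        scores[context][tag] += w
        overall[tag] += w
    fallback = max(overall, key=overall.get)    # used for unseen contexts

    def predict(context: Hashable) -> str:
        table = scores.get(context)             # .get does not create entries
        return max(table, key=table.get) if table else fallback

    return predict


def mistake_driven_mixture(examples: Sequence[Example],
                           rounds: int = 5) -> List[Tuple[float, Model]]:
    n = len(examples)
    weights = [1.0 / n] * n                     # start from the uniform distribution
    mixture: List[Tuple[float, Model]] = []
    for _ in range(rounds):
        # Step 1: build a tag model from the current data distribution.
        model = train_weighted_tag_model(examples, weights)
        wrong = [model(context) != tag for context, tag in examples]
        error = sum(w for w, bad in zip(weights, wrong) if bad)
        if error == 0.0:
            return [(1.0, model)]               # fits the data perfectly; use it alone
        if error >= 0.5:
            if not mixture:
                mixture.append((1.0, model))    # keep at least one model
            break                               # no better than chance: stop
        # Assumed AdaBoost-style vote: lower weighted error -> larger weight.
        alpha = 0.5 * math.log((1.0 - error) / error)
        mixture.append((alpha, model))
        # Step 2: shift the distribution toward the examples this model got wrong.
        weights = [w * math.exp(alpha if bad else -alpha)
                   for w, bad in zip(weights, wrong)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return mixture


def predict_mixture(mixture: Sequence[Tuple[float, Model]],
                    context: Hashable) -> str:
    """Weighted vote of all models in the mixture."""
    votes = defaultdict(float)
    for alpha, model in mixture:
        votes[model(context)] += alpha
    return max(votes, key=votes.get)
```

For example, with `data = [("bank", "NOUN")] * 3 + [("bank", "VERB")] + [("the", "DET")]`, later rounds concentrate weight on the rare `("bank", "VERB")` case, but the first, most accurate model carries the largest vote, so `predict_mixture(mistake_driven_mixture(data), "bank")` still returns `"NOUN"`. In the paper's setting, the per-round models would be context trees whose nodes range from general tags down to specific words, which is what lets later rounds capture exceptional, collocational connections.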
