Scientific report: "A Combination of Active Learning and Semi-supervised Learning Starting with Positive and Unlabeled Examples for Word Sense Disambiguation: An Empirical Study on Japanese Web Search Query"

Makoto Imamura and Yasuhiro Takayama
Information Technology R&D Center, Mitsubishi Electric Corporation
5-1-1 Ofuna, Kamakura, Kanagawa, Japan
hiro@ea .

Nobuhiro Kaji, Masashi Toyoda and Masaru Kitsuregawa
Institute of Industrial Science, The University of Tokyo
4-6-1 Komaba, Meguro-ku, Tokyo

Abstract

This paper addresses the bottleneck of obtaining training data for word sense disambiguation (WSD) in the domain of web queries, where the complete set of senses of an ambiguous word is unknown. We present a method that combines active learning and semi-supervised learning to handle the case in which only positive examples, that is, examples that have the expected word sense in web search results, are given. The novelty of our approach is the use of "pseudo negative examples" with reliable confidence scores estimated by a classifier trained on positive and unlabeled examples. We show experimentally, on several Japanese Web search data sets, that the proposed method achieves WSD accuracy close to that of a method trained with manually prepared negative examples.

1 Introduction

In Web mining for sentiment or reputation analysis, reliable results depend on extracting a large amount of text about particular products, shops, or persons with high accuracy. When retrieving texts from a Web archive, we often suffer from word sense ambiguity, so a WSD system is indispensable. For instance, when we tried to analyze the reputation of "Loft", the name of a variety store chain in Japan, we found that a simple text search retrieved many unrelated texts containing "Loft" in other senses, such as an attic room, the angle of a golf club face, a movie title, the name of a club with live music, and so on. The words in Web search queries are often proper nouns. It is therefore not trivial to discriminate these ...
