Scientific report: "An Active Learning Approach to Finding Related Terms"

David Vickrey (Stanford University), Oscar Kipersztok (Boeing Research & Technology), Daphne Koller (Stanford University)

Abstract

We present a novel system that helps nonexperts find sets of similar words. The user begins by specifying one or more seed words. The system then iteratively suggests a series of candidate words, which the user can either accept or reject. Current techniques for this task typically bootstrap a classifier based on a fixed seed set. In contrast, our system involves the user throughout the labeling process, using active learning to intelligently explore the space of similar words. In particular, our system can take advantage of negative examples provided by the user. Our system combines multiple preexisting sources of similarity data (a standard thesaurus, WordNet, contextual similarity), enabling it to capture many types of similarity groups ("synonyms of crash", "types of car", etc.). We evaluate on a hand-labeled evaluation set; our system improves over a strong baseline by 36%.

1 Introduction

Set expansion is a well-studied NLP problem where a machine-learning algorithm is given a fixed set of seed words and asked to find additional members of the implied set. For example, given the seed set {"elephant", "horse", "bat"}, the algorithm is expected to return other mammals. Past work (e.g., Roark & Charniak 1998; Ghahramani & Heller 2005; Wang & Cohen 2007; Pantel et al. 2009) generally focuses on semi-automatic acquisition of the remaining members of the set by mining large amounts of unlabeled data.

State-of-the-art set expansion systems work well for well-defined sets of nouns (e.g., "US Presidents"), particularly when given a large seed set. Set expansion is more difficult with fewer seed words and for other kinds of sets. The seed words may have multiple senses, and the user may have in mind a variety of attributes that the answer must match. For example, suppose
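The interactive loop described in the abstract (seed words, one suggested candidate at a time, accept or reject, with rejections used as negative examples) can be illustrated with a small sketch. This is not the authors' algorithm: the excerpt does not specify their active-learning strategy, so the sketch substitutes a simple greedy rule that suggests the word most similar to the accepted set and least similar to the rejected set. The similarity table, the names `sim`, `score`, and `expand`, and the simulated user are all hypothetical; a real system would draw similarities from sources such as a thesaurus, WordNet, or contextual similarity.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy similarity scores in [0, 1]; unlisted pairs count as 0.
SIMILARITY: Dict[FrozenSet[str], float] = {
    frozenset(("crash", "collision")): 0.9,
    frozenset(("crash", "wreck")): 0.8,
    frozenset(("collision", "wreck")): 0.7,
    frozenset(("crash", "smash")): 0.6,
    frozenset(("smash", "banana")): 0.5,
    frozenset(("banana", "fruit")): 0.9,
}


def sim(a: str, b: str) -> float:
    """Symmetric lookup in the toy similarity table."""
    return SIMILARITY.get(frozenset((a, b)), 0.0)


def mean_sim(word: str, group: Set[str]) -> float:
    """Average similarity of `word` to a group; 0.0 for an empty group."""
    if not group:
        return 0.0
    return sum(sim(word, other) for other in group) / len(group)


def score(word: str, accepted: Set[str], rejected: Set[str]) -> float:
    """Reward closeness to accepted words, penalize closeness to rejected ones."""
    return mean_sim(word, accepted) - mean_sim(word, rejected)


def expand(
    seeds: Set[str],
    vocabulary: List[str],
    ask_user: Callable[[str], bool],
    rounds: int,
) -> Tuple[Set[str], Set[str]]:
    """Suggest one candidate per round and record the user's decision."""
    accepted, rejected = set(seeds), set()
    for _ in range(rounds):
        candidates = [w for w in vocabulary
                      if w not in accepted and w not in rejected]
        if not candidates:
            break
        # Greedy choice: the highest-scoring unlabeled word.
        best = max(candidates, key=lambda w: score(w, accepted, rejected))
        if ask_user(best):
            accepted.add(best)
        else:
            rejected.add(best)  # kept as a negative example for later rounds
    return accepted, rejected


# Simulated user who wants "synonyms of crash".
wanted = {"collision", "wreck", "smash"}
vocabulary = ["collision", "wreck", "smash", "banana", "fruit"]
accepted, rejected = expand({"crash"}, vocabulary, lambda w: w in wanted, rounds=4)
print(sorted(accepted), sorted(rejected))
```

In this toy run, once "banana" has been rejected, the score of "fruit" drops below zero because of its similarity to the rejected word. That is the sense in which negative examples steer later suggestions.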
