HITS-based Seed Selection and Stop List Construction for Bootstrapping

Tetsuo Kiso, Masashi Shimbo, Mamoru Komachi, Yuji Matsumoto
Graduate School of Information Science, Nara Institute of Science and Technology
Ikoma, Nara 630-0192, Japan
{tetsuo-s, shimbo, komachi, matsu}@

Abstract

In bootstrapping (seed set expansion), selecting good seeds and creating stop lists are two effective ways to reduce semantic drift, but these methods generally need human supervision. In this paper, we propose a graph-based approach to helping editors choose effective seeds and stop list instances, applicable to Pantel and Pennacchiotti's Espresso bootstrapping algorithm. The idea is to select seeds and create a stop list using the rankings of instances and patterns computed by Kleinberg's HITS algorithm. Experimental results on a variation of the lexical sample task show the effectiveness of our method.

1 Introduction

Bootstrapping (Yarowsky, 1995; Abney, 2004) is a technique frequently used in natural language processing to expand limited resources with minimal supervision. Given a small amount of sample data (seeds) representing a particular semantic class of interest, bootstrapping first uses the seeds to train a classifier (often a weighted list of surface patterns characterizing the seeds), and then applies it to the remaining data to select the instances most likely to belong to the same class as the seeds.
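Patterns are weighted by the instances they match, and instances are scored by the patterns they occur with. This mutual reinforcement is what Kleinberg's HITS computes on a bipartite instance-pattern graph. The sketch below is a minimal illustration, not the authors' implementation. It assumes a hypothetical input `cooc` that maps `(instance, pattern)` pairs to non-negative co-occurrence weights. It treats instances as hubs and patterns as authorities, and uses L2 normalization as in standard HITS. How the paper turns these rankings into seed and stop list candidates is not described in this excerpt.

```python
import math


def hits_bipartite(cooc, n_iter=50):
    """Score instances (hubs) and patterns (authorities) with HITS.

    cooc: hypothetical dict mapping (instance, pattern) -> non-negative weight.
    Returns (instance_scores, pattern_scores), each L2-normalized.
    """
    instances = sorted({i for i, _ in cooc})
    patterns = sorted({p for _, p in cooc})
    inst_score = {i: 1.0 for i in instances}
    pat_score = {p: 1.0 for p in patterns}
    for _ in range(n_iter):
        # Authority update: a pattern is strong if it co-occurs with strong instances.
        new_pat = {p: 0.0 for p in patterns}
        for (i, p), w in cooc.items():
            new_pat[p] += w * inst_score[i]
        norm = math.sqrt(sum(v * v for v in new_pat.values())) or 1.0
        pat_score = {p: v / norm for p, v in new_pat.items()}
        # Hub update: an instance is strong if it co-occurs with strong patterns.
        new_inst = {i: 0.0 for i in instances}
        for (i, p), w in cooc.items():
            new_inst[i] += w * pat_score[p]
        norm = math.sqrt(sum(v * v for v in new_inst.values())) or 1.0
        inst_score = {i: v / norm for i, v in new_inst.items()}
    return inst_score, pat_score


# Toy graph (made-up counts): "apple" shares both patterns, so it ranks first.
toy = {("apple", "X juice"): 3, ("orange", "X juice"): 2,
       ("apple", "eat X"): 1, ("bread", "eat X"): 2}
inst, pat = hits_bipartite(toy)
print(sorted(inst, key=inst.get, reverse=True))  # ['apple', 'orange', 'bread']
```

The scores converge to the principal eigenvectors of the co-occurrence matrix, so the highest-ranked instances are the ones most central to the dominant cluster of the graph.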
These selected instances are added to the seed set, and the process is iterated until sufficient labeled data are acquired. Many bootstrapping algorithms have been proposed for a variety of tasks: word sense disambiguation (Yarowsky, 1995; Abney, 2004), information extraction (Hearst, 1992; Riloff and Jones, 1999; Thelen and Riloff, 2002; Pantel and Pennacchiotti, 2006), named entity recognition (Collins and Singer, 1999), part-of-speech tagging (Clark et al., 2003), and statistical parsing (Steedman et al., 2003; McClosky et al., 2006). Bootstrapping algorithms, however, are known to suffer from the problem called …
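The generic loop described in this introduction can be sketched as follows. The loop trains a pattern list from the seeds, applies it to score unlabeled instances, adds the best ones to the seed set, and repeats. This is a minimal illustration under stated assumptions, not Espresso itself. The function name `bootstrap`, the simple additive scoring, and the `stop` parameter (a hand-given stop list) are all hypothetical. The toy data shows how one ambiguous pattern can pull the seed set away from its class. This is the semantic drift that the abstract says seed selection and stop lists help reduce.

```python
def bootstrap(cooc, seeds, n_iter=3, k=1, stop=frozenset()):
    """Hypothetical minimal bootstrapping loop over (instance, pattern) counts.

    cooc: dict mapping (instance, pattern) -> non-negative weight.
    stop: instances that must never be added (a hand-given stop list).
    """
    known = set(seeds)
    for _ in range(n_iter):
        # "Train": weight each pattern by its co-occurrence with current seeds.
        pat_w = {}
        for (inst, pat), w in cooc.items():
            if inst in known:
                pat_w[pat] = pat_w.get(pat, 0.0) + w
        # "Apply": score unlabeled, non-stop-listed instances via those patterns.
        inst_w = {}
        for (inst, pat), w in cooc.items():
            if inst not in known and inst not in stop and pat in pat_w:
                inst_w[inst] = inst_w.get(inst, 0.0) + w * pat_w[pat]
        if not inst_w:
            break  # no candidates left to add
        # Add the k best instances (ties broken alphabetically) and iterate.
        best = sorted(inst_w, key=lambda i: (-inst_w[i], i))[:k]
        known.update(best)
    return known


# Made-up counts: "X tree" links the fruit "orange" to the tree "oak".
toy = {("apple", "X juice"): 3, ("orange", "X juice"): 2,
       ("orange", "X tree"): 1, ("oak", "X tree"): 3, ("oak", "X wood"): 2}
print(sorted(bootstrap(toy, {"apple"})))                # ['apple', 'oak', 'orange']
print(sorted(bootstrap(toy, {"apple"}, stop={"oak"})))  # ['apple', 'orange']
```

Without a stop list, the second iteration adds "oak" through the pattern "X tree", and the fruit class drifts toward trees. Listing "oak" in `stop` blocks the drift. The paper's proposal is to derive such seed and stop list choices from HITS rankings rather than picking them by hand.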