Scientific report: "Insights from Network Structure for Text Mining"

Zornitsa Kozareva and Eduard Hovy
USC Information Sciences Institute
4676 Admiralty Way, Marina del Rey, CA 90292-6695
kozareva hovy @

Abstract

Text mining and data harvesting algorithms have become popular in the computational linguistics community. They employ patterns that specify the kind of information to be harvested, and usually bootstrap either the pattern learning or the term harvesting process (or both) in a recursive cycle, using data learned in one step to generate more seeds for the next. They therefore treat the source text corpus as a network, in which words are the nodes and relations linking them are the edges. The results of computational network analysis, especially from the world wide web, are thus applicable. Surprisingly, these results have not yet been broadly introduced into the computational linguistics community. In this paper we show how various results apply to text mining, how they explain some previously observed phenomena, and how they can be helpful for computational linguistics applications.

1 Introduction

Text mining and harvesting algorithms have been applied in recent years for various uses, including learning of semantic constraints for verb participants (Lin and Pantel, 2002), related pairs in various relations, such as part-whole (Girju et al., 2003), cause (Pantel and Pennacchiotti, 2006), and other typical information extraction relations, large collections of entities (Soderland et al., 1999; Etzioni et al., 2005), features of objects (Pasca, 2004), and ontologies (Carlson et al., 2010).
They generally start with one or more seed terms and employ patterns that specify the desired information as it relates to the seed(s). Several approaches have been developed specifically for learning patterns, including guided pattern collection with manual filtering (Riloff and Shepherd, 1997), automated surface-level pattern induction (Agichtein and Gravano, 2000; Ravichandran and Hovy, 2002), probabilistic methods for taxonomy ...
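
The recursive cycle described in the abstract, in which terms learned in one step become seeds for the next, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function name `bootstrap_harvest`, the toy corpus, and the fixed surface pattern "such as <seed> and <term>". The sketch records each (seed, harvested term) pair as a directed edge, which is one simple way to view the harvest as a network over words.

```python
import re


def bootstrap_harvest(sentences, seeds, max_rounds=10):
    """Toy bootstrapping loop (hypothetical, for illustration only).

    Each known term is plugged into the surface pattern
    'such as <seed> and <term>'; every newly matched term becomes
    a seed for the next round.
    """
    harvested = set(seeds)   # all terms known so far (graph nodes)
    edges = set()            # directed edges: (seed, harvested term)
    frontier = list(seeds)   # terms to use as seeds in this round

    for _ in range(max_rounds):
        next_frontier = []
        for seed in frontier:
            pattern = re.compile(r"such as " + re.escape(seed) + r" and (\w+)")
            for sentence in sentences:
                for term in pattern.findall(sentence):
                    edges.add((seed, term))
                    if term not in harvested:
                        harvested.add(term)
                        next_frontier.append(term)
        if not next_frontier:
            break            # no new terms were learned, so stop
        frontier = next_frontier

    return harvested, edges


# Made-up corpus: the last sentence cannot be reached from the seed.
corpus = [
    "animals such as lions and tigers",
    "animals such as tigers and bears",
    "animals such as bears and lions",
    "animals such as wolves and foxes",
]

terms, edges = bootstrap_harvest(corpus, ["lions"])
print(sorted(terms))
print(sorted(edges))
```

In this toy corpus the term "foxes" is never harvested, because no chain of pattern matches connects it to the seed "lions". Which terms a bootstrapping run can reach from a given seed is therefore a question about the structure of the underlying graph.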
