Scientific report: "Automatically Generating Term-frequency-induced Taxonomies"

Karin Murthy, Tanveer A Faruquie, L Venkata Subramaniam, K Hima Prasad, Mukesh Mohania
IBM Research - India
karinmur, ftanveer, lvsubram, hkaranam, mkmukesh @

Abstract

We propose a novel unsupervised method to automatically acquire a term-frequency-based taxonomy from a corpus. A term-frequency-based taxonomy is useful for application domains where the frequency with which terms occur on their own and in combination with other terms imposes a natural term hierarchy. We highlight an application for our approach and demonstrate its effectiveness and robustness in extracting knowledge from real-world data.

1 Introduction

Taxonomy deduction is an important task for understanding and managing information. However, building taxonomies manually for specific domains or data sources is time-consuming and expensive. Techniques to automatically deduce a taxonomy in an unsupervised manner are thus indispensable. Automatic deduction of taxonomies consists of two tasks: extracting relevant terms to represent the concepts of the taxonomy, and discovering relationships between concepts. For unstructured text, the extraction of relevant terms relies on information extraction methods (Etzioni et al., 2005).

The relationship extraction task can be classified into two categories. Approaches in the first category use lexico-syntactic formulations to define patterns, either manually (Kozareva et al., 2008) or automatically (Girju et al., 2006), and apply those patterns to mine instances of the patterns. Though producing accurate results, these approaches usually have low coverage for many domains and suffer from inconsistency between terms when connecting the instances as chains to form a taxonomy. The second category of approaches uses clustering to discover terms and the relationships between them (Roy and Subramaniam, 2006), even if those relationships do not explicitly appear in the text. Though these methods tackle ...
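
To make the first, pattern-based category concrete, the following is a minimal sketch of mining is-a instances with a single hand-written "such as" pattern. It is an illustration only, not the method proposed in this paper or in the cited works; the pattern, the function name `mine_is_a`, and the example sentences are all assumptions chosen for the example.

```python
import re

# Hypothetical hand-written lexico-syntactic pattern:
#   "<hypernym> such as <hyponym>, <hyponym> and <hyponym>"
# Only single-word terms are handled, to keep the sketch short.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+(?:, | and | or )?)+)")


def mine_is_a(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        # Split the enumerated list on its separators.
        for hyponym in re.split(r", | and | or ", match.group(2)):
            if hyponym:  # skip the empty string left by a trailing separator
                pairs.append((hyponym, hypernym))
    return pairs


if __name__ == "__main__":
    print(mine_is_a("fruits such as apples, pears and plums are sold here"))
    # A sentence that states the same relation differently is missed entirely.
    print(mine_is_a("An apple is a kind of fruit"))
```

The second call returns an empty list because the relation is not phrased in the way the pattern expects, which is one way the low coverage mentioned above can arise.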
