Linguistically Motivated Features for Enhanced Back-of-the-Book Indexing

Andras Csomai and Rada Mihalcea
Department of Computer Science
University of North Texas
csomaia@ rada@

Abstract

In this paper we present a supervised method for back-of-the-book index construction. We introduce a novel set of features that goes beyond the typical frequency-based analysis, including features based on discourse comprehension, syntactic patterns, and information drawn from an online encyclopedia. In experiments carried out on a book collection, the method was found to lead to an improvement of roughly 140% compared to an existing state-of-the-art supervised method.

1 Introduction

Books represent one of the oldest forms of written communication and have been used for thousands of years as a means to store and transmit information. Despite this, natural language processing techniques have so far focused on methods targeting short documents, because a large fraction of the electronic documents available online and elsewhere consist of short texts such as Web pages, news articles, scientific reports, and others. We are witnessing, however, a change: more and more books are becoming available in electronic format through projects such as the Million Books project (http .org details millionbooks), the Gutenberg project (http), or Google Book Search (http).
Similarly, many of the books published in recent years are available in electronic format, either for purchase or through libraries. Consequently, language processing techniques able to handle very large documents such as books are becoming increasingly important. This paper addresses the problem of automatic back-of-the-book index construction. A back-of-the-book index typically consists of the most important keywords addressed in a book, with pointers to the relevant pages inside the book. The construction of such indexes is
