Scientific report: "An Unsupervised Approach to Biography Production using Wikipedia"

Fadi Biadsy, Julia Hirschberg, and Elena Filatova
Department of Computer Science, Columbia University, New York, NY 10027, USA
fadi julia @
InforSense LLC, Cambridge, MA 02141, USA
efilatova@

Abstract

We describe an unsupervised approach to multi-document sentence-extraction based summarization for the task of producing biographies. We utilize Wikipedia to automatically construct a corpus of biographical sentences and TDT4 to construct a corpus of non-biographical sentences. We build a biographical-sentence classifier from these corpora and an SVM regression model for sentence ordering from the Wikipedia corpus. We evaluate our work on the DUC2004 evaluation data and with human judges. Overall, our system significantly outperforms all systems that participated in DUC2004, according to the ROUGE-L metric, and is preferred by human subjects.

1 Introduction

Producing biographies by hand is a labor-intensive task, generally done only for famous individuals. The process is particularly difficult when persons of interest are not well known and when information must be gathered from a wide variety of sources. We present an automatic, unsupervised, multi-document summarization (MDS) approach based on extractive techniques to producing biographies, answering the question "Who is X?"

There is growing interest in automatic MDS in general, due in part to the explosion of multilingual and multimedia data available online.
The goal of MDS is to automatically produce a concise, well-organized, and fluent summary of a set of documents on the same topic. MDS strategies have been employed to produce both generic summaries and query-focused summaries. Due to the complexity of text generation, most summarization systems employ sentence-extraction techniques, in which the most relevant sentences from one or more documents are selected to represent the summary. This approach is guaranteed to produce grammatical [...]
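
The sentence-extraction idea described above can be illustrated with a minimal sketch. This is not the system presented in this paper: the relevance score used here (average corpus-wide word frequency) is an assumed stand-in chosen for simplicity, and the names `split_sentences`, `tokenize`, and `extract_summary` are hypothetical helpers introduced only for this example.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase word tokens; no stop-word removal, to keep the sketch short.
    return re.findall(r"[a-z']+", sentence.lower())


def extract_summary(documents, max_sentences=2):
    # Pool the sentences of all input documents (the multi-document setting).
    sentences = [s for doc in documents for s in split_sentences(doc)]

    # Word frequencies over the whole document set.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Assumed relevance score: mean corpus frequency of the sentence's words.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the top ones, then restore input order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    docs = [
        "Ada Lovelace was a mathematician. She liked horses.",
        "Ada Lovelace wrote about the Analytical Engine.",
    ]
    for sentence in extract_summary(docs, max_sentences=2):
        print(sentence)
```

Every sentence in the output is copied verbatim from the input documents, which is the defining property of extractive summarization. Because the sketch omits stop-word filtering, frequent function words inflate the scores; a real system would use a stronger relevance model.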
