Scientific report: "Optimal Multi-Paragraph Text Segmentation by Dynamic Programming"

Oskari Heinonen
University of Helsinki, Department of Computer Science
P.O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland

Abstract

There exist several methods of calculating a similarity curve, or a sequence of similarity values, representing the lexical cohesion of successive text constituents, e.g., paragraphs. Methods for deciding the locations of fragment boundaries are, however, scarce. We propose a fragmentation method based on dynamic programming. The method is theoretically sound and guaranteed to provide an optimal splitting on the basis of a similarity curve, a preferred fragment length, and a cost function defined. The method is especially useful when control over fragment size is of importance.

1 Introduction

Electronic full-text documents and digital libraries make the utilization of texts much more effective than before, yet they pose new problems and requirements. For example, document retrieval based on string searches typically returns either the whole document or just the occurrences of the searched words. What the user often is after, however, is a microdocument: a part of the document that contains the occurrences and is reasonably self-contained. Microdocuments can be created by utilizing lexical cohesion (term repetition and semantic relations) present in the text.
There exist several methods of calculating a similarity curve, or a sequence of similarity values, representing the lexical cohesion of successive constituents such as paragraphs of text (see, e.g., Hearst, 1994; Hearst, 1997; Kozima, 1993; Morris and Hirst, 1991; Yaari, 1997; Youmans, 1991). Methods for deciding the locations of fragment boundaries are, however, not that common, and those that exist are often rather heuristic in nature. To evaluate our fragmentation method, to be explained in Section 2, we calculate the paragraph similarities as follows. We employ stemming, remove stopwords, and count the ...
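
The idea stated in the abstract, choosing fragment boundaries optimally from a similarity curve, a preferred fragment length, and a cost function, can be illustrated with a minimal Python sketch. This is not the paper's actual formulation, which is not given in this excerpt. The function name `segment`, the parameters `preferred_len` and `length_weight`, and the specific cost (similarity at each cut plus a squared relative deviation from the preferred length) are all assumptions made for illustration.

```python
from typing import List


def segment(sims: List[float], preferred_len: int,
            length_weight: float = 1.0) -> List[int]:
    """Return boundary positions minimizing an assumed total cost.

    sims[i] is the similarity between paragraph i and paragraph i + 1,
    so the text has len(sims) + 1 paragraphs. A boundary at position b
    means a new fragment starts at paragraph b.
    """
    n = len(sims) + 1  # number of paragraphs

    def length_cost(length: int) -> float:
        # Assumed penalty: squared relative deviation from the preferred length.
        return length_weight * ((length - preferred_len) / preferred_len) ** 2

    # best[j]: minimal cost of splitting paragraphs 0..j-1 into fragments,
    # with the last fragment ending just before position j.
    best = [float("inf")] * (n + 1)
    prev = [0] * (n + 1)  # start position of the last fragment in that optimum
    best[0] = 0.0

    for j in range(1, n + 1):
        # Assumed cut cost: cutting where cohesion is high is expensive.
        # The end of the text (j == n) is not a cut and costs nothing.
        cut_cost = sims[j - 1] if j < n else 0.0
        for i in range(j):
            cost = best[i] + length_cost(j - i) + cut_cost
            if cost < best[j]:
                best[j] = cost
                prev[j] = i

    # Backtrack from the end of the text to recover the boundaries.
    boundaries: List[int] = []
    j = n
    while j > 0:
        j = prev[j]
        if j > 0:
            boundaries.append(j)
    boundaries.reverse()
    return boundaries
```

For example, `segment([0.9, 0.8, 0.1, 0.9, 0.85], preferred_len=3)` returns `[3]`: the six paragraphs are split at the low-similarity point into two fragments of the preferred length. Because every possible start position is considered for every end position, the result is optimal for the assumed cost, at quadratic time in the number of paragraphs.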
