Scientific report: "Linear Text Segmentation using a Dynamic Programming Algorithm"

Athanasios Kehagias
Dept. of Math., Phys. and Comp. Sciences, Aristotle Univ. of Thessaloniki, GREECE
kehagias@

Fragkou Pavlina, Vassilios Petridis
Dept. of Elect. and Computer Eng., Aristotle Univ. of Thessaloniki, GREECE
fragou@ petridis@

Abstract

In this paper we introduce a dynamic programming algorithm to perform linear text segmentation by global minimization of a segmentation cost function which consists of: (a) within-segment word similarity and (b) prior information about segment length. The evaluation of the segmentation accuracy of the algorithm on Choi's text collection showed that the algorithm achieves the best segmentation accuracy so far reported in the literature.

Keywords: Text Segmentation, Document Retrieval, Information Retrieval, Machine Learning.

1 Introduction

Text segmentation is an important problem in information retrieval. Its goal is the division of a text into homogeneous (lexically coherent) segments, i.e. segments exhibiting the following properties: (a) each segment deals with a particular subject and (b) contiguous segments deal with different subjects. Those segments can be retrieved from a large database of unformatted (or loosely formatted) text as being relevant to a query.

This paper presents a dynamic programming algorithm which performs linear segmentation (as opposed to hierarchical segmentation; Yaari, 1997) by global minimization of a segmentation cost. The segmentation cost is defined by a function consisting of two factors: (a) within-segment word similarity and (b) prior information about segment length. Our algorithm can be applied either to large texts, to segment them into their constituent parts (e.g. to segment an article into sections), or to a stream of independent, concatenated texts (e.g. to segment a transcript of news into separate stories). For the calculation of the segment homogeneity or alternatively .
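
To make the idea of globally minimizing a segmentation cost concrete, here is a minimal Python sketch. It is not the cost function used in the paper, which this excerpt does not give. It assumes that each sentence is a set of words, that two sentences count as similar when they share at least one word, that within-segment similarity is the density of similar sentence pairs, and that the length prior is a squared deviation from an expected length. The names segment, mu, sigma and gamma are hypothetical and chosen only for illustration.

```python
from math import inf
from typing import List, Sequence, Set


def segment(sentences: Sequence[Set[str]], mu: float, sigma: float,
            gamma: float = 0.5) -> List[int]:
    """Return the exclusive end index of each segment in the cheapest segmentation.

    Assumed cost of a segment covering sentences [a, b):
        gamma * (length - mu)^2 / (2 * sigma^2) - (1 - gamma) * similarity density
    """
    n = len(sentences)
    # sim[i][j] = 1 if sentences i and j share at least one word (assumption).
    sim = [[1 if sentences[i] & sentences[j] else 0 for j in range(n)]
           for i in range(n)]

    # 2-D prefix sums, so the similarity inside any segment is an O(1) lookup.
    pre = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            pre[i + 1][j + 1] = sim[i][j] + pre[i][j + 1] + pre[i + 1][j] - pre[i][j]

    def cost(a: int, b: int) -> float:
        length = b - a
        within = pre[b][b] - pre[a][b] - pre[b][a] + pre[a][a]
        length_penalty = (length - mu) ** 2 / (2 * sigma ** 2)
        density = within / (length * length)
        return gamma * length_penalty - (1 - gamma) * density

    # best[b] = minimum total cost of segmenting the first b sentences.
    best = [0.0] + [inf] * n
    back = [0] * (n + 1)
    for b in range(1, n + 1):
        for a in range(b):
            candidate = best[a] + cost(a, b)
            if candidate < best[b]:
                best[b] = candidate
                back[b] = a

    # Follow the back-pointers from the end of the text to recover boundaries.
    bounds: List[int] = []
    b = n
    while b > 0:
        bounds.append(b)
        b = back[b]
    return bounds[::-1]
```

Because best[b] is built from the optimal values for every shorter prefix, the result is a global minimum of this assumed cost and not a greedy choice of boundaries. The double loop over (a, b) costs O(n^2) segment evaluations once the prefix sums are in place.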
