MULTI-PARAGRAPH SEGMENTATION OF EXPOSITORY TEXT

Marti A. Hearst
Computer Science Division, 571 Evans Hall
University of California, Berkeley
Berkeley, CA 94720
and
Xerox Palo Alto Research Center
marti@

Abstract

This paper describes TextTiling, an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflect the subtopic structure of the texts. The algorithm uses domain-independent lexical frequency and distribution information to recognize the interactions of multiple simultaneous themes. Two fully-implemented versions of the algorithm are described and shown to produce segmentation that corresponds well to human judgments of the major subtopic boundaries of thirteen lengthy texts.

INTRODUCTION

The structure of expository texts can be characterized as a sequence of subtopical discussions that occur in the context of a few main topic discussions.
For example, a popular science text called Stargazers, whose main topic is the existence of life on earth and other planets, can be described as consisting of the following subdiscussions (numbers indicate paragraph numbers):

1-3    Intro - the search for life in space
4-5    The moon's chemical composition
6-8    How early proximity of the moon shaped it
9-12   How the moon helped life evolve on earth
13     Improbability of the earth-moon system
14-16  Binary/trinary star systems make life unlikely
17-18  The low probability of non-binary/trinary systems
19-20  Properties of our sun that facilitate life
21     Summary

Subtopic structure is sometimes marked in technical texts by headings and subheadings which divide the text into coherent segments; Brown & Yule (1983:140) state that this kind of division is one of the most basic in discourse. However, many expository texts consist of long sequences of paragraphs with very little structural demarcation. This paper presents fully-implemented algorithms that use lexical cohesion relations to partition expository texts into multi-paragraph segments that ...
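To make the idea of segmenting by lexical cohesion concrete, the following is a minimal sketch and not the paper's implementation. It assumes a much simpler scheme than the algorithms the paper describes: each paragraph becomes a bag of words, adjacent paragraphs are compared by cosine similarity, and a boundary is proposed wherever the similarity falls below a cutoff. The function names (`cosine`, `gap_scores`, `boundaries`) and the `threshold` value are hypothetical choices made for illustration only.

```python
import math
import re
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_scores(paragraphs: list[str]) -> list[float]:
    """Lexical similarity across each gap between adjacent paragraphs."""
    bags = [Counter(re.findall(r"[a-z]+", p.lower())) for p in paragraphs]
    return [cosine(bags[i], bags[i + 1]) for i in range(len(bags) - 1)]


def boundaries(paragraphs: list[str], threshold: float = 0.1) -> list[int]:
    """Indices i such that a boundary is proposed between paragraphs i and i+1.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [i for i, s in enumerate(gap_scores(paragraphs)) if s < threshold]


if __name__ == "__main__":
    paras = [
        "the moon orbits the earth",
        "the moon shaped the earth tides",
        "binary star systems are common",
        "binary star systems make life unlikely",
    ]
    print(gap_scores(paras))
    print(boundaries(paras))
```

In this toy input the first two paragraphs share vocabulary about the moon and the earth, the last two share vocabulary about binary star systems, and the gap between the second and third paragraphs has no words in common, so it is the only proposed boundary. A realistic system would at least need stopword handling and a less brittle boundary criterion than a fixed cutoff.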
