Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure

Minwoo Jeong and Ivan Titov
Saarland University, Saarbrücken, Germany
titov @

Abstract

Documents often have inherently parallel structure: they may consist of a text and commentaries, an abstract and a body, or parts presenting alternative views on the same problem. Revealing relations between the parts by jointly segmenting them and predicting links between the segments would help to visualize such documents and to construct friendlier user interfaces. To address this problem, we propose an unsupervised Bayesian model for joint discourse segmentation and alignment. We apply our method to the English as a second language podcast dataset, where each episode is composed of two parallel parts: a story and an explanatory lecture. The predicted topical links uncover hidden relations between the stories and the lectures. In this domain, our method achieves competitive results, rivaling those of a previously proposed supervised technique.

1 Introduction

Many documents consist of parts exhibiting a high degree of parallelism, e.g., the abstract and body of academic publications, or summaries and detailed news stories. This is especially common with the emergence of Web technologies: many texts on the Web are now accompanied by comments and discussions. Segmenting these parallel parts into coherent fragments and discovering the hidden relations between them would facilitate the development of better user interfaces and improve the performance of summarization and information retrieval systems.
Discourse segmentation of documents composed of parallel parts is a novel and challenging problem, as previous research has mostly focused on the linear segmentation of isolated texts (e.g., Hearst, 1994). The most straightforward approach would be to use a pipeline strategy, where an existing segmentation algorithm finds the discourse boundaries of each part independently, and then the …
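The pipeline strategy starts by segmenting each parallel part on its own with an existing linear segmenter. The following is a minimal sketch of only that first stage, assuming a simplified lexical-cohesion segmenter in the spirit of TextTiling (Hearst, 1994); it is not the authors' joint Bayesian model. Assumptions, all hypothetical and chosen for illustration: a naive regex tokenizer, a comparison window of two sentences on each side of a gap, a simplified depth score (drop from the highest similarity on each side), and a fixed number of boundaries per part instead of a data-driven cutoff. The function names `segment` and `pipeline_segment` are likewise illustrative.

```python
import math
import re
from collections import Counter


def bag_of_words(sentences):
    """Lowercased word counts over a list of sentences (naive regex tokenizer, an assumption)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(re.findall(r"[a-z]+", sentence.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity of two word-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_similarities(sentences, window=2):
    """Similarity between the `window` sentences before and after each gap."""
    sims = []
    for i in range(1, len(sentences)):  # gap i lies between sentence i-1 and sentence i
        left = bag_of_words(sentences[max(0, i - window):i])
        right = bag_of_words(sentences[i:i + window])
        sims.append(cosine(left, right))
    return sims


def depth_scores(sims):
    """Simplified depth score: drop from the highest similarity on each side of a gap."""
    return [(max(sims[:j + 1]) - s) + (max(sims[j:]) - s) for j, s in enumerate(sims)]


def segment(sentences, window=2, num_boundaries=1):
    """Indices of sentences that start a new segment, taken from the deepest valleys."""
    if len(sentences) < 2:
        return []
    depths = depth_scores(gap_similarities(sentences, window))
    ranked = sorted(range(len(depths)), key=lambda j: depths[j], reverse=True)
    # gap j sits just before sentence j + 1
    return sorted(j + 1 for j in ranked[:num_boundaries])


def pipeline_segment(parts, **kwargs):
    """Pipeline stage 1 (hypothetical helper): segment every parallel part independently."""
    return {name: segment(sentences, **kwargs) for name, sentences in parts.items()}


if __name__ == "__main__":
    story = [
        "The cat sat on the mat.",
        "The cat chased a mouse.",
        "A mouse ran under the mat.",
        "Stock prices fell sharply today.",
        "Investors sold stock in panic.",
        "Prices of stock may recover soon.",
    ]
    print(pipeline_segment({"story": story}))  # {'story': [3]}
```

Because each part is segmented with no information from the other, nothing in one part (for example, a lecture explaining words from the story) can influence the boundaries chosen in the other. The fixed `num_boundaries` parameter is a simplification; a practical segmenter would pick the number of boundaries from the distribution of depth scores.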
