TAILIEUCHUNG - Scientific report: "A Syntactic and Lexical-Based Discourse Segmenter"

A Syntactic and Lexical-Based Discourse Segmenter

Milan Tofiloski, School of Computing Science, Simon Fraser University, Burnaby, BC, Canada, mta45@
Julian Brooke, Department of Linguistics, Simon Fraser University, Burnaby, BC, Canada, jab18@
Maite Taboada, Department of Linguistics, Simon Fraser University, Burnaby, BC, Canada, mtaboada@

Abstract

We present a syntactic and lexically based discourse segmenter (SLSeg) that is designed to avoid the common problem of over-segmenting text. Segmentation is the first step in a discourse parser, a system that constructs discourse trees from elementary discourse units. We compare SLSeg to a probabilistic segmenter, showing that a conservative approach increases precision at the expense of recall, while retaining a high F-score across both formal and informal texts.

1 Introduction

Discourse segmentation is the process of decomposing discourse into elementary discourse units (EDUs), which may be simple sentences or clauses in a complex sentence, and from which discourse trees are constructed. In this sense, we are performing low-level discourse segmentation, as opposed to segmenting text into chunks or topics (Passonneau and Litman, 1997). Since segmentation is the first stage of discourse parsing, quality discourse segments are critical to building quality discourse representations (Soricut and Marcu, 2003).
Our objective is to construct a discourse segmenter that is robust in handling both formal (newswire) and informal (online reviews) texts, while minimizing the insertion of incorrect discourse boundaries. Robustness is achieved by constructing discourse segments in a principled way using syntactic and lexical information. Our approach employs a set of rules for inserting segment boundaries based on the syntax of each sentence. The segment boundaries are then further refined by using lexical information that

This work was supported by an NSERC Discovery Grant (261104-2008) to Maite Taboada. We thank Angela Cooper and Morgan Mameni for .
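
The trade-off described in the abstract (higher precision at the cost of recall, with the F-score staying high) can be made concrete with a small scoring sketch. The code below is not taken from the paper: the function name `boundary_scores`, the representation of boundaries as sets of token offsets, and the example offsets are all assumptions made for illustration. It only applies the standard definitions of precision, recall and F1 to segment boundaries.

```python
def boundary_scores(gold, predicted):
    """Return (precision, recall, F1) for predicted segment boundaries.

    Both arguments are iterables of boundary positions (here: hypothetical
    token offsets). A predicted boundary counts as correct only if the same
    position also appears in the gold standard.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)

    # Guard against empty sets so the function never divides by zero.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0

    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up boundary offsets, for illustration only.
    gold = {4, 9, 15, 21}
    conservative = {4, 15}                         # few boundaries, all correct
    aggressive = {2, 4, 7, 9, 12, 15, 18, 21}      # over-segments the text

    for name, pred in [("conservative", conservative), ("aggressive", aggressive)]:
        p, r, f = boundary_scores(gold, pred)
        print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

In this toy setting the conservative output inserts no incorrect boundaries but misses some gold ones, while the aggressive output finds every gold boundary and adds spurious ones, which is the over-segmentation pattern the paper sets out to avoid.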
