Scientific paper: "Text Segmentation Using Reiteration and Collocation"

Amanda C. Jobbins
Department of Computing, Nottingham Trent University, Nottingham NG1 4BU, UK
ajobbins@resumix.com

Lindsay J. Evett
Department of Computing, Nottingham Trent University, Nottingham NG1 4BU, UK
lje@doc.ntu.ac.uk

Abstract

A method is presented for segmenting text into subtopic areas. The proportion of related pairwise words is calculated between adjacent windows of text to determine their lexical similarity. The lexical cohesion relations of reiteration and collocation are used to identify related words. These relations are located automatically using a combination of three linguistic features: word repetition, collocation and relation weights. The method is shown to detect known subject changes in text successfully, and its output corresponds well to the segmentations placed by test subjects.

Introduction

Many examples of heterogeneous data can be found in daily life. The Wall Street Journal archives, for example, consist of a series of articles about different subject areas. Segmenting such data into distinct topics is useful for information retrieval, where only those segments relevant to a user's query need be retrieved. Text segmentation could also be used as a pre-processing step in automatic summarisation: each segment could be summarised individually, and the summaries then combined to provide an abstract for the document.

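As a rough illustration of the windowed comparison described in the abstract, the sketch below scores each candidate boundary by one possible reading of "the proportion of related pairwise words": the fraction of cross-window word pairs that are related. It is a minimal sketch under stated assumptions, not the authors' implementation. A pair counts as related if the two words are identical (repetition) or appear together in a hand-supplied `collocations` set; relation weights are not modelled; and the names `related`, `window_similarity` and `gap_scores` are hypothetical. A low score between two windows suggests a subject change.

```python
from itertools import product


def related(a: str, b: str, collocations: set) -> bool:
    # Assumption: "reiteration" is approximated by exact repetition and
    # "collocation" by membership in a hand-built set of unordered word pairs.
    return a == b or frozenset((a, b)) in collocations


def window_similarity(left: list, right: list, collocations: set) -> float:
    # Fraction of all (left word, right word) pairs that are related.
    pairs = list(product(left, right))
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if related(a, b, collocations))
    return hits / len(pairs)


def gap_scores(words: list, window: int, collocations: set) -> list:
    # Score every gap i by comparing words[i-window:i] with words[i:i+window].
    scores = []
    for i in range(window, len(words) - window + 1):
        left = words[i - window:i]
        right = words[i:i + window]
        scores.append((i, window_similarity(left, right, collocations)))
    return scores


words = "bank money bank money river water river water".split()
colloc = {frozenset(("bank", "money")), frozenset(("river", "water"))}
scores = gap_scores(words, 2, colloc)
boundary = min(scores, key=lambda s: s[1])[0]
print(scores)    # [(2, 1.0), (3, 0.5), (4, 0.0), (5, 0.5), (6, 1.0)]
print(boundary)  # 4
```

On this toy input the lowest-scoring gap falls at position 4, exactly between the finance words and the river words. Taking the single minimum is only for illustration; a real segmenter would need a rule for choosing how many boundaries to place.
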
Previous work on text segmentation has used term matching to identify clusters of related text. Salton and Buckley (1992) and, later, Hearst (1994) extracted related text portions by matching high-frequency terms. Yaari (1997) segmented text into a hierarchical structure, identifying sub-segments of larger segments. Ponte and Croft (1997) used word co-occurrences to expand the number of terms available for matching. Reynar (1994) compared all words across a text, rather than only nearest neighbours as is more usual. A problem with using word repetition is that inappropriate matches can be made because of the lack of contextual information.
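For contrast, plain term matching between adjacent blocks, the general idea behind the repetition-based approaches cited above, can be sketched with cosine similarity over term counts. This is a generic illustration, not the exact procedure of Salton and Buckley or of Hearst; the fixed block size and the names `cosine` and `block_similarities` are assumptions. The toy input also shows the weakness just described: "bank" in a financial sense and "bank" in a river sense still count as a match.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors; 0.0 if either is empty.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def block_similarities(words: list[str], block: int) -> list[float]:
    # Split the text into consecutive fixed-size blocks (the last may be shorter).
    blocks = [Counter(words[i:i + block]) for i in range(0, len(words), block)]
    # Compare each block only with its nearest neighbour.
    return [cosine(blocks[k], blocks[k + 1]) for k in range(len(blocks) - 1)]


words = "bank loan bank river bank shore".split()
print(block_similarities(words, 2))
```

Both gaps score roughly 0.5, purely because "bank" is repeated, even though the second and third blocks use it in a different sense from the first. Repetition alone carries no information about which sense is meant.
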