Scientific report: "Cohesion and Collocation: Using Context Vectors in Text Segmentation"

Stefan Kaufmann
CSLI, Stanford University
Linguistics Dept., Bldg. 460
Stanford, CA 94305-2150

Abstract

Collocational word similarity is considered a source of text cohesion that is hard to measure and quantify. The work presented here explores the use of information from a training corpus in measuring word similarity and evaluates the method in the text segmentation task. An implementation, the VecTile system, produces similarity curves over texts using pre-compiled vector representations of the contextual behavior of words. The performance of this system is shown to improve over that of the purely string-based TextTiling algorithm (Hearst, 1997).

1 Background

The notion of text cohesion rests on the intuition that a text is "held together" by a variety of internal forces. Much of the relevant linguistic literature is indebted to Halliday and Hasan (1976), where cohesion is defined as a network of relationships between locations in the text, arising from (i) grammatical factors (co-reference, use of pro-forms, ellipsis and sentential connectives), and (ii) lexical factors (reiteration and collocation). Subsequent work has further developed this taxonomy (Hoey, 1991) and explored its implications in such areas as paragraphing (Longacre, 1979; Bond and Hayes, 1984; Stark, 1988), relevance (Sperber and Wilson, 1995) and discourse structure (Grosz and Sidner, 1986).

The lexical variety of cohesion is semantically defined, invoking a measure of word similarity. But this is hard to measure objectively, especially in the case of collocational relationships, which hold between words primarily because they regularly co-occur. Halliday and Hasan refrained from a deeper analysis, but hinted at a notion of "degrees of proximity in the lexical system, a function of the probability with which one tends to co-occur with another" (p. 290). The VecTile system presented here is designed to utilize precisely this kind of lexical ...
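
The general idea of measuring word similarity through co-occurrence, and of tracing a similarity curve over a text, can be illustrated with a small sketch. This is not the VecTile implementation, whose details are not given in this excerpt. It assumes raw co-occurrence counts as context vectors, a fixed window inside each training sentence, and cosine similarity between the blocks on either side of each gap; all function names and parameters (`context_vectors`, `block_vector`, `similarity_curve`, `window`, `block_size`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions
    of it in the same training sentence (an assumed, simplified scheme)."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors


def block_vector(tokens, vectors):
    """Sum the context vectors of the tokens in a block; unknown words add nothing."""
    total = Counter()
    for tok in tokens:
        total.update(vectors.get(tok, {}))
    return total


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_curve(tokens, vectors, block_size=3):
    """Similarity between the blocks left and right of every gap between tokens."""
    curve = []
    for gap in range(1, len(tokens)):
        left = tokens[max(0, gap - block_size):gap]
        right = tokens[gap:gap + block_size]
        curve.append(cosine(block_vector(left, vectors),
                            block_vector(right, vectors)))
    return curve
```

In a sketch like this, low points on the curve are candidate segment boundaries: they mark gaps where the words on either side tend to keep different company in the training corpus, even when no word string is repeated across the gap.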
