Retrieving Collocations by Co-occurrences and Word Order Constraints

Sayori Shimohata, Toshiyuki Sugio and Junji Nagata
Kansai Laboratory, Research Development Group, Oki Electric Industry Co., Ltd.
Crystal Tower 1-2-27 Shiromi, Chuo-ku, Osaka 540, Japan
sayori sugio nagata @

Abstract

In this paper, we describe a method for automatically retrieving collocations from large text corpora. The method retrieves collocations in two stages: 1) extracting strings of characters as units of collocations, and 2) extracting recurrent combinations of these strings, in accordance with their word order in the corpus, as collocations. Through this method, a wide range of collocations, especially domain-specific collocations, is retrieved. The method is practical because it uses plain texts without any language-dependent information such as lexical knowledge or parts of speech.

1 Introduction

A collocation is a recurrent combination of words, ranging from the word level to the sentence level. In this paper, we classify collocations into two types according to their structure. One is the uninterrupted collocation, which consists of a contiguous sequence of words; the other is the interrupted collocation, which consists of words containing one or several gaps filled in by substitutable words or phrases belonging to the same category. The features of collocations are defined as follows: collocations are recurrent; collocations consist of one or several lexical units; and the order of units in a collocation is rigid.
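The two stages and the two collocation types can be illustrated with a toy sketch. It is not the authors' algorithm: the paper works directly on character strings without word segmentation and uses its own extraction criteria, whereas this sketch assumes whitespace-separated tokens, raw frequency thresholds, and the hypothetical parameters `max_n`, `min_freq` and `max_gap`. Stage 1 keeps recurrent contiguous n-grams (candidate uninterrupted collocations) and drops any n-gram contained in a longer one with the same count. Stage 2 counts ordered pairs of those units separated by a non-empty gap, which approximates interrupted collocations while keeping word order rigid.

```python
from collections import Counter

# Toy sketch only. ASSUMPTIONS (not from the paper): whitespace tokens instead of
# raw character strings, plain frequency thresholds, and hypothetical parameters
# max_n, min_freq and max_gap.


def recurrent_ngrams(sentences, max_n=4, min_freq=2):
    """Stage 1 (simplified): count contiguous token n-grams with 2 <= n <= max_n
    and keep those that occur at least min_freq times."""
    counts = Counter()
    for tokens in sentences:
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {g: c for g, c in counts.items() if c >= min_freq}


def _contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def maximal_units(ngrams):
    """Drop an n-gram when a longer recurrent n-gram contains it with the same
    count, so only the largest recurrent strings survive as units."""
    return {
        g: c
        for g, c in ngrams.items()
        if not any(len(h) > len(g) and ngrams[h] == c and _contains(h, g)
                   for h in ngrams)
    }


def interrupted_collocations(sentences, units, max_gap=3, min_freq=2):
    """Stage 2 (simplified): count ordered pairs (a, b) of units where b starts
    1..max_gap tokens after a ends; (a, b) and (b, a) are distinct keys, so
    word order is preserved."""
    pair_counts = Counter()
    for tokens in sentences:
        # All occurrences of every unit in this sentence as (start, end, unit).
        spans = [
            (i, i + len(u), u)
            for u in units
            for i in range(len(tokens) - len(u) + 1)
            if tuple(tokens[i:i + len(u)]) == u
        ]
        seen = set()
        for _, end_a, a in spans:
            for start_b, _, b in spans:
                if 1 <= start_b - end_a <= max_gap:  # non-empty gap between a and b
                    seen.add((a, b))
        pair_counts.update(seen)  # each pair counted at most once per sentence
    return {p: c for p, c in pair_counts.items() if c >= min_freq}


corpus = [
    "please refer to the manual for details".split(),
    "please refer to the appendix for details".split(),
    "see the manual".split(),
]
units = maximal_units(recurrent_ngrams(corpus))
for (a, b), c in interrupted_collocations(corpus, units).items():
    print(" ".join(a), "...", " ".join(b), c)
# prints: please refer to the ... for details 2
```

On this toy corpus the recurrent units are `please refer to the`, `the manual` and `for details`. The only recurrent ordered combination with a gap is `please refer to the ... for details`, whose gap holds substitutable words (`manual`, `appendix`), as in the definition of an interrupted collocation.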
For language processing tasks such as machine translation, knowledge of domain-specific collocations is indispensable, because the meaning of a collocation differs from its literal meaning, and the usage and meaning of a collocation depend entirely on the domain. In addition, new collocations are produced one after another, and most of them are technical jargon. There has been growing interest in corpus-based approaches that retrieve collocations from large corpora (Nagao and Mori, 1994; Ikehara et al., ...)