Detecting Novel Compounds: The Role of Distributional Evidence

Mirella Lapata
Department of Computer Science, University of Sheffield
Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK
mlap@

Alex Lascarides
School of Informatics, The University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, UK
alex@

Abstract

Research on the discovery of terms from corpora has focused on word sequences whose recurrent occurrence in a corpus is indicative of their terminological status, and has not addressed the issue of discovering terms when data is sparse. This becomes apparent in the case of noun compounding, which is extremely productive: more than half of the candidate compounds extracted from a corpus are attested only once. We show how evidence about established (i.e., frequent) compounds can be used to estimate features that can discriminate rare valid compounds from rare nonce terms, in addition to a variety of linguistic features that can be easily gleaned from corpora without relying on parsed text.

1 Introduction

The nature and properties of compounds have been studied at length in the theoretical linguistics literature. It is well known that compound noun formation in English is relatively productive (see (1)). Although compounds are typically binary (see (1a, b)), they can also be longer than two words (see (1e)).
Compounds are commonly written as a concatenation of words (see (1a, b)) or as single words (see (1c)); sometimes a hyphen is also used (see (1e)).

(1) a. income tax
    b. AT&T headquarters
    c. bathroom
    d. public-relations
    e. income-tax relief

The use of noun compounds is frequent not only in technical writing and newswire text (McDonald, 1982) but also in fictional prose (Leonard, 1984) and spoken language (Liberman and Sproat, 1992). Novel compounds are used as a text compression device (Marsh, 1984), i.e., to pack meaning into a minimal amount of linguistic structure; as a deictic device; or as a means to classify an entity which has no specific name (Downing, 1977).
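The abstract makes two claims that a small sketch can make concrete: many candidate compounds extracted from a corpus occur only once, and counts over frequent compounds can supply evidence about rare ones. The code below is not the authors' method. The candidate heuristic (two adjacent nouns not flanked by a third noun), the Penn Treebank-style tags, the threshold `min_freq=2`, and the two "productivity" counts are illustrative assumptions. `extract_candidates`, `hapax_ratio` and `distributional_features` are hypothetical helpers, and the toy corpus is invented for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

TaggedToken = Tuple[str, str]  # (word, POS tag)
Pair = Tuple[str, str]         # (modifier, head)


def is_noun(tag: str) -> bool:
    # Assumption: Penn Treebank-style tags, where noun tags start with "NN".
    return tag.startswith("NN")


def extract_candidates(sentences: Iterable[List[TaggedToken]]) -> Counter:
    """Count adjacent noun pairs that are not part of a longer noun sequence."""
    counts: Counter = Counter()
    for sent in sentences:
        for i in range(len(sent) - 1):
            (w1, t1), (w2, t2) = sent[i], sent[i + 1]
            if not (is_noun(t1) and is_noun(t2)):
                continue
            # Hypothetical heuristic: skip pairs flanked by a third noun.
            left_noun = i > 0 and is_noun(sent[i - 1][1])
            right_noun = i + 2 < len(sent) and is_noun(sent[i + 2][1])
            if left_noun or right_noun:
                continue
            counts[(w1.lower(), w2.lower())] += 1
    return counts


def hapax_ratio(counts: Counter) -> float:
    """Fraction of candidate types attested exactly once."""
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


def distributional_features(counts: Counter, min_freq: int = 2) -> Dict[Pair, Tuple[int, int]]:
    """For each once-attested candidate, count how many distinct frequent
    candidates share its modifier and its head (hypothetical features)."""
    mod_prod: Counter = Counter()
    head_prod: Counter = Counter()
    for (mod, head), c in counts.items():
        if c >= min_freq:
            mod_prod[mod] += 1
            head_prod[head] += 1
    # Counter returns 0 for unseen keys, so unattested modifiers/heads score 0.
    return {
        (mod, head): (mod_prod[mod], head_prod[head])
        for (mod, head), c in counts.items()
        if c == 1
    }


# Invented toy corpus of POS-tagged sentences.
corpus: List[List[TaggedToken]] = [
    [("The", "DT"), ("income", "NN"), ("tax", "NN"), ("rose", "VBD")],
    [("Income", "NN"), ("tax", "NN"), ("is", "VBZ"), ("due", "JJ")],
    [("An", "DT"), ("income", "NN"), ("bracket", "NN"), ("changed", "VBD")],
    [("The", "DT"), ("income", "NN"), ("tax", "NN"), ("relief", "NN"), ("ended", "VBD")],
    [("A", "DT"), ("banana", "NN"), ("tax", "NN"), ("appeared", "VBD")],
]

counts = extract_candidates(corpus)
print(counts)
print(hapax_ratio(counts))
print(distributional_features(counts))
```

On this toy corpus, the three-noun sequence "income tax relief" is skipped, two of the three candidate types occur once, and the rare candidate "income bracket" receives a nonzero modifier score because its modifier also appears in the frequent candidate "income tax". Any real feature set would need larger counts and, as the abstract notes, additional linguistic features.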