Scientific report: "Scaling Context Space"

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 231-238.

Scaling Context Space

James R. Curran and Marc Moens
Institute for Communicating and Collaborative Systems
University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, United Kingdom
jamesc marc @

Abstract

Context is used in many NLP systems as an indicator of a term's syntactic and semantic function. The accuracy of the system is dependent on the quality and quantity of contextual information available to describe each term. However, the quantity variable is no longer fixed by limited corpus resources. Given fixed training time and computational resources, it makes sense for systems to invest time in extracting high-quality contextual information from a fixed corpus. However, with an effectively limitless quantity of text available, extraction rate and representation size need to be considered. We use thesaurus extraction with a range of context-extracting tools to demonstrate the interaction between context quantity, time and size on a corpus of 300 million words.

1 Introduction

Context plays an important role in many natural language tasks. For example, the accuracy of part-of-speech (POS) taggers or word sense disambiguation systems depends on the quality and quantity of contextual information these systems can extract from the training data. When predicting the sense of a word, for instance, the immediately preceding word is likely to be more important than the tenth previous word; similar observations can be made about POS taggers or chunkers. A crucial part of training these systems lies in extracting high-quality contextual information from the data, in the sense of defining contexts that are both accurate and correlated with the information (the POS tags, the word senses, the chunks) the system is trying to extract. The quality of contextual information is often determined by the size of the training corpus, with less data available ...
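
The idea that nearby words describe a term better than distant ones can be illustrated with a small window-based context extractor. This is only a minimal sketch under stated assumptions: the excerpt does not describe the context extraction tools actually used in the paper, so the function names `extract_contexts` and `jaccard`, the `window` parameter, the toy sentence, and the choice of Jaccard overlap as a similarity measure are all hypothetical and introduced purely for illustration.

```python
from collections import Counter, defaultdict


def extract_contexts(tokens, window=2):
    """Map each term to a Counter of (relative offset, neighbour) contexts.

    Hypothetical helper: only neighbours within `window` positions are kept,
    reflecting the assumption that close words are the most informative.
    """
    contexts = defaultdict(Counter)
    for i, term in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                contexts[term][(j - i, tokens[j])] += 1
    return contexts


def jaccard(a, b):
    """Set-based Jaccard overlap between two context Counters.

    Assumption: a simple stand-in for whatever similarity measure a real
    thesaurus extraction system would use.
    """
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


# Toy corpus (invented for this example).
tokens = "the cat sat on the mat while the dog sat on the rug".split()
contexts = extract_contexts(tokens, window=1)

# "cat" and "dog" share both of their window-1 contexts.
print(jaccard(contexts["cat"], contexts["dog"]))  # 1.0
```

Widening `window` yields more contexts per term but also a larger representation, which is the kind of quantity versus size trade-off the abstract refers to.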
