Scientific report: "Temporal Context: Applications and Implications for Computational Linguistics"

Robert A. Liebscher
Department of Cognitive Science
University of California, San Diego
La Jolla, CA 92037
rliebsch@

Abstract

This paper describes several ongoing projects that are united by the theme of changes in lexical use over time. We show that paying attention to a document's temporal context can lead to improvements in information retrieval and text categorization. We also explore a potential application in document clustering that is based upon different types of lexical changes.

1 Introduction

Tasks in computational linguistics (CL) normally focus on the content of a document while paying little attention to the context in which it was produced. The work described in this paper considers the importance of temporal context. We show that knowing one small piece of information, a document's publication date, can be beneficial for a variety of CL tasks, some familiar and some novel.

The field of historical linguistics attempts to categorize changes at all levels of language use, typically relying on data that span centuries (Hock, 1991). The recent availability of very large textual corpora allows for the examination of changes that take place across shorter time periods. In particular, we focus on lexical change across decades in corpora of academic publications, and show that the changes can be fairly dramatic during a relatively short period of time.
As a preview, consider Table 1, which lists the top five unigrams that best distinguished the field of computational linguistics at different points in time, as derived from the ACL proceedings¹ using the odds ratio measure (see Section 3). One can quickly glean that the field has become increasingly empirical through time.

1979-84    1985-90    1991-96      1997-02
system     phrase     discourse    word
natural    plan       tree         corpus
language   structure  algorithm    training
knowledge  logical    unification  model
database   interpret  plan         data

Table 1: ACL's most ...
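
The paper's own definition of the odds ratio is given in Section 3, which is not reproduced here. As a rough illustration only, the sketch below ranks terms with one common formulation: the log odds ratio of a term's document frequency in a target period against its document frequency in all other periods. The additive smoothing constant, the function names `odds_ratio_scores` and `top_terms`, and the toy documents are assumptions made for this example and do not come from the paper.

```python
import math
from collections import Counter


def odds_ratio_scores(target_docs, other_docs, smoothing=0.5):
    """Return {term: log odds ratio} for target_docs versus other_docs.

    Each document is a list of tokens. Probabilities are smoothed document
    frequencies (an assumption of this sketch, not the paper's stated method).
    """
    n_target, n_other = len(target_docs), len(other_docs)
    # Count each term once per document (document frequency).
    df_target = Counter(t for doc in target_docs for t in set(doc))
    df_other = Counter(t for doc in other_docs for t in set(doc))

    scores = {}
    for term in set(df_target) | set(df_other):
        # Smoothing keeps both probabilities strictly between 0 and 1.
        p = (df_target[term] + smoothing) / (n_target + 2 * smoothing)
        q = (df_other[term] + smoothing) / (n_other + 2 * smoothing)
        scores[term] = math.log((p * (1 - q)) / ((1 - p) * q))
    return scores


def top_terms(scores, k=5):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy documents standing in for two time periods (invented for illustration).
early = [["system", "natural", "language"], ["knowledge", "system"]]
late = [["corpus", "training", "model"], ["corpus", "data", "word"]]

late_scores = odds_ratio_scores(late, early)
print(top_terms(late_scores, k=3))
```

Positive scores mark terms that are more characteristic of the target period, and negative scores mark terms more characteristic of the comparison set. Working in log space keeps the ranking unchanged while making the scores symmetric around zero.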
