CONTEXTUAL WORD SIMILARITY AND ESTIMATION FROM SPARSE DATA


Ido Dagan
AT&T Bell Laboratories
600 Mountain Avenue
Murray Hill, NJ 07974
dagan@research.att.com

Shaul Marcus
Computer Science Department
Technion, Haifa 32000, Israel
shaul@cs.technion.ac.il

Shaul Markovitch
Computer Science Department
Technion, Haifa 32000, Israel
shaulm@cs.technion.ac.il

Abstract

In recent years there has been much interest in word cooccurrence relations, such as n-grams, verb-object combinations, or cooccurrence within a limited context. This paper discusses how to estimate the probability of cooccurrences that do not occur in the training data. We present a method that makes local analogies between each specific unobserved cooccurrence and other cooccurrences that contain similar words, as determined by an appropriate word similarity metric. Our evaluation suggests that this method performs better than existing smoothing methods and may provide an alternative to class-based models.

1 Introduction

Statistical data on word cooccurrence relations play a major role in many corpus-based approaches to natural language processing. Different types of cooccurrence relations are in use, such as cooccurrence within a consecutive sequence of words (n-grams), within syntactic relations (verb-object, adjective-noun, etc.), or the cooccurrence of two words within a limited distance in the context.
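To make the last kind of relation concrete, here is a minimal sketch that counts cooccurrences within a limited distance. It assumes a whitespace-tokenized corpus and ordered pairs (w1, w2) where w2 follows w1 within a fixed window. The helper names `window_cooccurrences` and `mle_conditional` and the window size are hypothetical choices for illustration and do not come from the paper. The sketch also shows the sparse-data problem the paper addresses: a plain maximum-likelihood estimate gives probability zero to any pair absent from the training data.

```python
from collections import Counter

def window_cooccurrences(tokens, window=2):
    """Count ordered pairs (w1, w2) where w2 follows w1 within `window` positions.
    The window size is an illustrative assumption, not a value from the paper."""
    pairs = Counter()
    for i, w1 in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pairs[(w1, tokens[j])] += 1
    return pairs

def mle_conditional(pairs, w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1) from pair counts.
    Any pair never seen in training gets exactly 0.0."""
    total = sum(c for (a, _), c in pairs.items() if a == w1)
    return pairs[(w1, w2)] / total if total else 0.0

pairs = window_cooccurrences("the cat drinks milk the dog drinks water".split())
print(mle_conditional(pairs, "dog", "water"))  # observed pair
print(mle_conditional(pairs, "cat", "water"))  # unobserved pair: MLE gives zero
```

In this toy corpus, (`dog`, `water`) is observed once out of two pairs starting with `dog`, so its estimate is 0.5. The equally plausible (`cat`, `water`) never occurs, so its estimate is 0.0.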
Statistical data about these various cooccurrence relations are employed for a variety of applications, such as speech recognition (Jelinek, 1990), language generation (Smadja and McKeown, 1990), lexicography (Church and Hanks, 1990), machine translation (Brown et al.; Sadler, 1989), information retrieval (Maarek and Smadja, 1989), and various disambiguation tasks (Dagan et al., 1991; Hindle and Rooth, 1991; Grishman et al., 1986; Dagan and Itai, 1990). A major problem for the above applications is how to estimate the probability of cooccurrences that were not observed in the training corpus. Due to data sparseness in unrestricted language, the aggregate …
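The general idea of estimating an unobserved cooccurrence from cooccurrences of similar words can be sketched as follows. This is not the paper's estimator, because this excerpt gives neither its similarity metric nor its formula. The sketch assumes cosine similarity over cooccurrence count vectors. It also assumes the estimate of P(w2 | w1) is a similarity-weighted average of P(w2 | w1') over the k words w1' most similar to w1. The names `cooc_vectors`, `cosine`, `similarity_estimate`, and the parameter `k` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cooc_vectors(pairs):
    """Map each first word w1 to a Counter of the second words w2 seen with it."""
    vecs = defaultdict(Counter)
    for (w1, w2), c in pairs.items():
        vecs[w1][w2] += c
    return vecs

def cosine(u, v):
    """Cosine similarity of two sparse count vectors.
    ASSUMPTION: stand-in metric, not the similarity metric used in the paper."""
    dot = sum(c * v[key] for key, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_estimate(vecs, w1, w2, k=2):
    """ASSUMED scheme: estimate P(w2 | w1) as the similarity-weighted average of
    P(w2 | w1') over the k words w1' most similar to w1 (w1 itself excluded)."""
    if w1 not in vecs:
        return 0.0
    neighbours = sorted(
        ((cosine(vecs[w1], vecs[other]), other) for other in vecs if other != w1),
        reverse=True,
    )[:k]
    num = den = 0.0
    for sim, other in neighbours:
        if sim <= 0:
            continue  # ignore words that share no context with w1
        num += sim * vecs[other][w2] / sum(vecs[other].values())
        den += sim
    return num / den if den else 0.0

# Toy verb-object counts: ("sip", "water") is never observed.
pairs = Counter({("drink", "water"): 3, ("drink", "milk"): 2,
                 ("sip", "milk"): 2, ("sip", "tea"): 1, ("eat", "bread"): 4})
vecs = cooc_vectors(pairs)
print(similarity_estimate(vecs, "sip", "water"))
```

In this toy table, `sip` and `drink` share the object `milk`, while `eat` shares nothing with `sip`. As a result, `drink` is the only neighbour that contributes, and the unseen pair (`sip`, `water`) receives P(water | drink) = 3/5 instead of zero. Restricting the average to the k nearest words keeps the analogy local to each unobserved pair, in the spirit the abstract describes. It does not reproduce the paper's specific choices.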
