Scientific report: "A Bayesian Method for Robust Estimation of Distributional Similarities"

Jun'ichi Kazama, Stijn De Saeger, Kow Kuroda, Masaki Murata, Kentaro Torisawa

Language Infrastructure Group, MASTAR Project, National Institute of Information and Communications Technology (NICT), 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0289, Japan
kazama, stijn, kuroda, torisawa @

Department of Information and Knowledge Engineering, Faculty/Graduate School of Engineering, Tottori University, 4-101 Koyama-Minami, Tottori 680-8550, Japan
murata@
(The work was done while the author was at NICT.)

Abstract

Existing word similarity measures are not robust to data sparseness since they rely only on the point estimation of words' context profiles obtained from a limited amount of data. This paper proposes a Bayesian method for robust distributional word similarities. The method uses a distribution of context profiles obtained by Bayesian estimation and takes the expectation of a base similarity measure under that distribution. When the context profiles are multinomial distributions, the priors are Dirichlet, and the base measure is the Bhattacharyya coefficient, we can derive an analytical form that allows efficient calculation. For the task of word similarity estimation using a large amount of Web data in Japanese, we show that the proposed measure gives better accuracies than other well-known similarity measures.

1 Introduction

The semantic similarity of words is a longstanding topic in computational linguistics because it is theoretically intriguing and has many applications in the field.

Many researchers have conducted studies based on the distributional hypothesis (Harris, 1954), which states that words that occur in the same contexts tend to have similar meanings. A number of semantic similarity measures have been proposed based on this hypothesis (Hindle, 1990; Grefenstette, 1994; Dagan et al., 1994; Dagan et al., 1995; Lin, 1998; Dagan et al., 1999). In general most semantic similarity measures have
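The idea stated in the abstract, taking the expectation of the Bhattacharyya coefficient under Dirichlet posteriors over multinomial context profiles, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a symmetric Dirichlet prior with a single hyperparameter (called `prior` here), context counts given as equal-length lists over a shared context vocabulary, and independent posteriors for the two words. Under those assumptions the expectation factorizes per context, and each factor follows from the standard Dirichlet moment E[p_k^(1/2)] = Γ(a_0) Γ(a_k + 1/2) / (Γ(a_0 + 1/2) Γ(a_k)), where a_k is the posterior parameter for context k and a_0 is their sum. All function names are hypothetical.

```python
import math


def expected_sqrt(counts, prior):
    """Return E[sqrt(p_k)] for every k when p ~ Dirichlet(counts + prior).

    `prior` is an assumed symmetric Dirichlet hyperparameter and must be > 0
    so that contexts with zero counts still have a valid parameter.
    """
    params = [c + prior for c in counts]
    total = sum(params)
    # Work in log space: the Gamma function overflows quickly for large counts.
    log_norm = math.lgamma(total) - math.lgamma(total + 0.5)
    return [
        math.exp(log_norm + math.lgamma(a + 0.5) - math.lgamma(a))
        for a in params
    ]


def expected_bhattacharyya(counts1, counts2, prior=1.0):
    """Expected Bhattacharyya coefficient under two independent posteriors."""
    e1 = expected_sqrt(counts1, prior)
    e2 = expected_sqrt(counts2, prior)
    return sum(x * y for x, y in zip(e1, e2))


def point_bhattacharyya(counts1, counts2):
    """Bhattacharyya coefficient of the maximum-likelihood point estimates."""
    n1, n2 = sum(counts1), sum(counts2)
    return sum(
        math.sqrt((a / n1) * (b / n2)) for a, b in zip(counts1, counts2)
    )


if __name__ == "__main__":
    # Two words seen once each in the same context: the point estimate is
    # maximally confident, while the expectation is discounted.
    rare = [1, 0, 0, 0]
    frequent = [1000, 0, 0, 0]
    print(point_bhattacharyya(rare, rare))
    print(expected_bhattacharyya(rare, rare))
    print(expected_bhattacharyya(frequent, frequent))
```

In this sketch, two words observed only once in the same context get a point-estimate similarity of 1, while the expected value is noticeably lower. With a thousand observations of the same pattern, the expected value moves close to the point estimate. That discounting of sparse evidence is the kind of robustness the abstract describes.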
