Collocation Extraction beyond the Independence Assumption

Gerlof Bouma
Universität Potsdam, Department Linguistik
Campus Golm, Haus 24/35
Karl-Liebknecht-Straße 24-25
14476 Potsdam, Germany

Abstract
In this paper we start to explore two-part collocation extraction association measures that do not estimate expected probabilities on the basis of the independence assumption. We propose two new measures based upon the well-known measures of mutual information and pointwise mutual information. Expected probabilities are derived from automatically trained Aggregate Markov Models. On three collocation gold standards, we find that the new association measures vary in their effectiveness.

1 Introduction
Collocation extraction typically proceeds by scoring collocation candidates with an association measure, where high scores are taken to indicate likely collocationhood. Two well-known such measures are pointwise mutual information (PMI) and mutual information (MI). For an observed combination of words $w_1 w_2$, these are defined as

$$ i(w_1, w_2) = \log \frac{p(w_1 w_2)}{p(w_1)\, p(w_2)} \qquad (1) $$

$$ I(W_1; W_2) = \sum_{x \in \{w_1, \neg w_1\}} \sum_{y \in \{w_2, \neg w_2\}} p(x y)\, i(x, y) \qquad (2) $$

PMI (1) is the logged ratio of the observed bigramme probability and the expected bigramme probability under independence of the two words in the combination. MI (2) is the expected value of PMI over the four cells $x \in \{w_1, \neg w_1\}$, $y \in \{w_2, \neg w_2\}$, and measures how much information about the distribution of one word is contained in the distribution of the other. PMI was introduced into the collocation extraction field by Church and Hanks (1990). Dunning (1993) proposed the use of the likelihood ratio test statistic, which is equivalent to MI up to a constant factor. Two aspects of PMI are worth highlighting.
First, the observed occurrence probability $p_{obs}$ is compared to the expected occurrence probability $p_{exp}$. Second, the independence assumption underlies the estimation of $p_{exp}$, that is, $p_{exp} = p(w_1)\, p(w_2)$. The first aspect is motivated by the observation that interesting combinations are often those that are unexpectedly frequent. For instance, the …
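
The PMI and MI definitions above can be computed directly from a 2x2 contingency table of bigram counts. The following is a minimal Python sketch, not the paper's implementation, assuming maximum-likelihood probability estimates and natural logarithms. The function names `contingency`, `pmi`, `mi` and `g2`, the count arguments (`n12` = f(w1 w2), `n1` = f(w1) in first position, `n2` = f(w2) in second position, `n` = number of bigrams) and the example counts are all hypothetical. The sketch also checks the constant-factor relation mentioned for Dunning's statistic: with natural logs, G² = 2N · MI.

```python
import math

# Hypothetical helper: joint counts for x in {w1, not-w1}, y in {w2, not-w2}.
# n12 = f(w1 w2), n1 = f(w1) as first word, n2 = f(w2) as second word, n = number of bigrams.
def contingency(n12, n1, n2, n):
    return {
        (True, True): n12,
        (True, False): n1 - n12,
        (False, True): n2 - n12,
        (False, False): n - n1 - n2 + n12,
    }

def pmi(n12, n1, n2, n):
    """Equation (1) with maximum-likelihood estimates and natural log.
    Raises ValueError when n12 == 0, because log 0 is undefined."""
    p_obs = n12 / n
    p_exp = (n1 / n) * (n2 / n)  # expected probability under the independence assumption
    return math.log(p_obs / p_exp)

def mi(n12, n1, n2, n):
    """Equation (2): the PMI of each cell, weighted by that cell's joint probability."""
    row = {True: n1, False: n - n1}
    col = {True: n2, False: n - n2}
    total = 0.0
    for (x, y), count in contingency(n12, n1, n2, n).items():
        if count == 0:
            continue  # convention: 0 * log 0 = 0
        p_xy = count / n
        total += p_xy * math.log(p_xy / ((row[x] / n) * (col[y] / n)))
    return total

def g2(n12, n1, n2, n):
    """Log-likelihood ratio statistic G^2 = 2 * sum(O * ln(O / E)), with E = row * col / n."""
    row = {True: n1, False: n - n1}
    col = {True: n2, False: n - n2}
    total = 0.0
    for (x, y), observed in contingency(n12, n1, n2, n).items():
        if observed == 0:
            continue
        expected = row[x] * col[y] / n
        total += observed * math.log(observed / expected)
    return 2 * total

if __name__ == "__main__":
    counts = (30, 100, 200, 10_000)  # hypothetical counts, not taken from the paper
    print(f"PMI = {pmi(*counts):.3f}")  # ln(15), about 2.708
    print(f"MI  = {mi(*counts):.5f}")
    print(f"G2 = {g2(*counts):.3f}, 2*N*MI = {2 * counts[3] * mi(*counts):.3f}")
```

The last line should print the same value twice (up to floating-point rounding): with E = RC/N, 2 Σ O ln(O/E) equals 2N Σ (O/N) ln(ON/(RC)), which is exactly 2N times MI. Note that PMI looks only at the (w1, w2) cell, whereas MI and G² also take the three "negative" cells into account.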
