TAILIEUCHUNG - Scientific report: "DISTRIBUTIONAL CLUSTERING OF ENGLISH WORDS"

Fernando Pereira, AT&T Bell Laboratories, 600 Mountain Ave., Murray Hill, NJ 07974, USA, pereira@
Naftali Tishby, Dept. of Computer Science, Hebrew University, Jerusalem 91904, Israel, tishby@ . il
Lillian Lee, Dept. of Computer Science, Cornell University, Ithaca, NY 14850, USA, llee@

Abstract

We describe and evaluate experimentally a method for clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency distributions of contexts in which they appear, and relative entropy between those distributions is used as the similarity measure for clustering. Clusters are represented by average context distributions derived from the given words according to their probabilities of cluster membership. In many cases, the clusters can be thought of as encoding coarse sense distinctions. Deterministic annealing is used to find lowest distortion sets of clusters: as the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data. Clusters are used as the basis for class models of word co-occurrence, and the models are evaluated with respect to held-out test data.

INTRODUCTION

Methods for automatically classifying words according to their contexts of use have both scientific and practical interest. The scientific questions arise in connection to distributional views of linguistic (particularly lexical) structure and also in relation to the question of lexical acquisition, both from psychological and computational learning perspectives. From the practical point of view, word classification addresses questions of data sparseness and generalization in statistical language models, particularly models for deciding among alternative analyses proposed by a grammar. It is well known that a simple tabulation of frequencies of certain words participating in certain configurations, for example of .
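
The clustering scheme summarized in the abstract (words as context distributions, relative entropy as the comparison, clusters as membership-weighted averages) can be illustrated with a minimal Python sketch. It assumes, beyond what the text above states, that membership probabilities take the form p(c|w) proportional to exp(-beta * D(p_w || p_c)), with beta playing the role of the annealing parameter. The function names and the toy data are hypothetical, and this is an illustration under those assumptions, not the authors' implementation.

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def memberships(word_dists, centroids, beta):
    """Soft assignment, assumed here to be proportional to exp(-beta * D(p_w || p_c))."""
    result = []
    for p in word_dists:
        d = [kl_divergence(p, c) for c in centroids]
        d_min = min(d)  # subtract the minimum for numerical stability
        weights = [math.exp(-beta * (di - d_min)) for di in d]
        total = sum(weights)
        result.append([w / total for w in weights])
    return result

def update_centroids(word_dists, member):
    """Each centroid is the membership-weighted average of the word distributions."""
    n_clusters = len(member[0])
    n_contexts = len(word_dists[0])
    centroids = []
    for c in range(n_clusters):
        mass = sum(m[c] for m in member)
        centroids.append([
            sum(m[c] * p[k] for m, p in zip(member, word_dists)) / mass
            for k in range(n_contexts)
        ])
    return centroids

def soft_cluster(word_dists, centroids, beta, iterations=50):
    """Alternate membership and centroid updates at a fixed beta."""
    for _ in range(iterations):
        member = memberships(word_dists, centroids, beta)
        centroids = update_centroids(word_dists, member)
    return memberships(word_dists, centroids, beta), centroids

# Hypothetical toy data: four "words", each a distribution over three contexts.
words = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.3, 0.6],
]
# Strictly positive starting centroids, so every divergence is finite.
initial = [[0.4, 0.3, 0.3], [0.3, 0.3, 0.4]]
member, centroids = soft_cluster(words, initial, beta=10.0)
```

Re-running the loop while gradually raising `beta` would approximate the annealing schedule mentioned in the abstract; the fixed value here is only for illustration.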
