Scientific report: "Word Sense Induction: Triplet-Based Clustering and Automatic Evaluation"

Stefan Bordag
Natural Language Processing Department, University of Leipzig, Germany
sbordag@

Abstract

In this paper a novel solution to automatic and unsupervised word sense induction (WSI) is introduced. It represents an instantiation of the "one sense per collocation" observation (Gale et al., 1992). Like most existing approaches, it clusters word co-occurrences. It differs from other approaches to WSI in that it strengthens the effect of the one sense per collocation observation by using triplets of words instead of pairs. Combining this with a two-step clustering process that uses sentence co-occurrences as features allows for accurate results. Additionally, a novel and likewise automatic and unsupervised evaluation method is employed, inspired by Schütze's (1992) idea for evaluating word sense disambiguation algorithms. Besides offering advantages such as reproducibility and independence from a given, biased gold standard, it also enables automatic parameter optimization of the WSI algorithm.

1 Introduction

The aim of word sense induction¹ (WSI) is to find the senses of a given target word (Yarowsky, 1995) automatically and, if possible, in an unsupervised manner. WSI is akin to word sense disambiguation (WSD) both in the methods employed and in the problems encountered, such as the vagueness of sense distinctions (Kilgarriff, 1997). The input to a WSI algorithm is a target word to be disambiguated, e.g. space, and the output is a number of word sets representing the various senses, e.g. (3-dimensional, expanse, locate) and (office, building, square). At the very least, such results can be used as empirically grounded suggestions for lexicographers or as input for WSD algorithms. Other possible uses include automatic thesaurus or ontology construction, machine translation, or information ...

¹ Sometimes called word sense discovery (Dorow and Widdows, 2003) or word sense discrimination (Purandare, 2004; Velldal, 2005).
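To make the input and output described above concrete, the following is a minimal Python sketch of triplet-based sense induction. It is not the author's algorithm: the excerpt only states that triplets of words are clustered using sentence co-occurrences as features, so everything else here is an assumption made for illustration. That includes the function name `induce_senses`, the `min_overlap` threshold, the overlap measure, and a single greedy pass standing in for the paper's two-step clustering. The toy corpus is also invented; only the target word "space" and the two example senses come from the text.

```python
from itertools import combinations


def induce_senses(target, sentences, min_overlap=0.5):
    """Toy triplet-based sense induction (all details are illustrative assumptions)."""
    # Keep only sentences that contain the target word, as sets of tokens.
    contexts = [set(s) for s in sentences if target in s]
    vocab = set().union(*contexts) - {target}

    # A triplet is (target, a, b). Its features are the indices of the
    # sentences in which all three words occur together.
    triplets = {}
    for a, b in combinations(sorted(vocab), 2):
        ids = frozenset(i for i, c in enumerate(contexts) if a in c and b in c)
        if ids:  # pairs that never co-occur with the target form no triplet
            triplets[(a, b)] = ids

    # Greedy single pass: attach each triplet to the first cluster whose
    # sentence features overlap enough, otherwise open a new cluster.
    clusters = []
    for (a, b), ids in triplets.items():
        for cluster in clusters:
            shared = len(ids & cluster["ids"])
            if shared / min(len(ids), len(cluster["ids"])) >= min_overlap:
                cluster["words"] |= {a, b}
                cluster["ids"] |= ids
                break
        else:
            clusters.append({"words": {a, b}, "ids": set(ids)})
    return [c["words"] for c in clusters]


# Invented toy corpus: two sentences per intended sense of "space".
sentences = [
    ["space", "expanse", "3-dimensional", "locate"],
    ["space", "expanse", "locate", "universe"],
    ["space", "office", "building", "square"],
    ["space", "office", "building", "rent"],
    ["office", "rent", "city"],  # no target word, so it is ignored
]
senses = induce_senses("space", sentences)
for words in senses:
    print(sorted(words))
```

Because a pair of words drawn from two different senses never shares a sentence with the target in this corpus, no mixed triplet is formed, which is the effect the one sense per collocation observation relies on. The greedy pass depends on the order in which triplets are visited, so it should be read as a sketch of the idea rather than a robust clustering procedure.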
