Scientific report: "Discovering Corpus-Specific Word Senses"

Beate Dorow
Institut für Maschinelle Sprachverarbeitung
Universität Stuttgart, Germany

Dominic Widdows
Center for the Study of Language and Information
Stanford University, California
dwiddows@

Abstract

This paper presents an unsupervised algorithm which automatically discovers word senses from text. The algorithm is based on a graph model representing words and relationships between them. Sense clusters are iteratively computed by clustering the local graph of similar words around an ambiguous word. Discrimination against previously extracted sense clusters enables us to discover new senses. We use the same data for both recognising and resolving ambiguity.

1 Introduction

This paper describes an algorithm which automatically discovers word senses from free text and maps them to the appropriate entries of existing dictionaries or taxonomies.

Automatic word sense discovery has applications of many kinds. It can greatly facilitate a lexicographer's work and can be used to automatically construct corpus-based taxonomies or to tune existing ones. The same corpus evidence which supports a clustering of an ambiguous word into distinct senses can be used to decide which sense is referred to in a given context (Schütze, 1998).

This paper is organised as follows. In section 2 we present the graph model from which we discover word senses. Section 3 describes the way we divide graphs surrounding ambiguous words into different areas corresponding to different senses, using Markov clustering (van Dongen, 2000). The quality of the Markov clustering depends strongly on several parameters such as a granularity factor and the size of the local graph. In section 4 we outline a word sense discovery algorithm which bypasses the problem of parameter tuning. We conducted a pilot experiment to examine the performance of our algorithm on a set of words with varying degree of ambiguity. Section 5 describes
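
To make the Markov clustering step named in the introduction more concrete, here is a minimal Python sketch of the general technique (alternating expansion and inflation on a column-stochastic matrix). It is not the authors' implementation. The function names, the toy word list, the unweighted edges, the added self-loops, and the choice to leave the ambiguous word itself out of its local graph are all assumptions made for illustration. The `inflation` exponent is the standard Markov clustering parameter that controls cluster granularity; treating it as the "granularity factor" mentioned above is likewise an assumption.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total > 0 else 0.0
    return out


def expand(m: Matrix) -> Matrix:
    """Expansion: square the matrix, i.e. take two-step random walks."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m: Matrix, r: float) -> Matrix:
    """Inflation: raise each entry to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def markov_cluster(words: List[str],
                   edges: List[Tuple[str, str]],
                   inflation: float = 2.0,
                   iterations: int = 30) -> List[Set[str]]:
    """Cluster an undirected, unweighted word graph with a basic MCL loop."""
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    # Adjacency matrix with self-loops on the diagonal (an assumption here).
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in edges:
        m[index[a]][index[b]] = 1.0
        m[index[b]][index[a]] = 1.0
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Rows that keep non-zero mass act as attractors; the columns they
    # cover form one cluster. Identical member sets are reported once.
    clusters: List[Set[str]] = []
    for i in range(n):
        members = {words[j] for j in range(n) if m[i][j] > 1e-6}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Hypothetical local graph around an ambiguous word such as "mouse",
# with the ambiguous word itself left out. The words and links are made up.
words = ["rat", "cat", "rabbit", "keyboard", "monitor", "printer"]
edges = [("rat", "cat"), ("cat", "rabbit"), ("rat", "rabbit"),
         ("keyboard", "monitor"), ("monitor", "printer"),
         ("keyboard", "printer")]
clusters = markov_cluster(words, edges)
```

On this toy graph the two groups of neighbours share no edges, so the sketch returns them as two separate clusters, one per candidate sense. On real co-occurrence graphs the senses are usually linked by some edges, and the result then depends on the inflation value and on the size of the local graph, which is the parameter sensitivity noted in the introduction.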
