Scientific report: "An API for Measuring the Relatedness of Words in Wikipedia"

Simone Paolo Ponzetto and Michael Strube
EML Research gGmbH, Schloss-Wolfsbrunnenweg 33, 69118 Heidelberg, Germany
http nlp

Abstract

We present an API for computing the semantic relatedness of words in Wikipedia.

1 Introduction

Recent years have seen a large amount of work in Natural Language Processing (NLP) using measures of semantic similarity and relatedness. We believe that the extensive use of such measures also derives from the availability of robust, freely available software for computing them (Pedersen et al., 2004, WordNet::Similarity). In Ponzetto & Strube (2006) and Strube & Ponzetto (2006) we proposed taking the Wikipedia categorization system as a semantic network, which served as the basis for computing the semantic relatedness of words. In the following we present the API we used in our previous work, hoping that it will encourage further research in NLP using Wikipedia.

2 Measures of Semantic Relatedness

Approaches to measuring semantic relatedness that use lexical resources transform these resources into a network or graph and compute relatedness using paths in it (see Budanitsky & Hirst, 2006, for an extensive review). For instance, Rada et al. (1989) traverse MeSH, a term hierarchy for indexing articles in Medline, and compute semantic relatedness straightforwardly in terms of the number of edges between terms in the hierarchy. Jarmasz & Szpakowicz (2003) use the same approach with Roget's Thesaurus, while Hirst & St-Onge (1998) apply a similar strategy to WordNet.

3 The Application Programming Interface

The API computes semantic relatedness by:

1. taking a pair of words as input;
2. retrieving the Wikipedia articles they refer to (via a disambiguation strategy based on the link structure of the articles);
3. computing paths in the Wikipedia categorization graph between the categories the articles are assigned to;
4. returning as output the set of paths found, scored according to some measure definition.

The implementation includes path-length (Rada et al., 1989; Wu & Palmer, 1994; Leacock & Chodorow, 1998), information-content (Resnik, 1995; Seco et al., 2004) and text-overlap (Lesk, 1986; Banerjee & Pedersen, 2003) measures, as described.
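The four steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' API: the word-to-article table, the category assignments and the category graph are invented toy data, the link-based disambiguation of step 2 is replaced by a plain dictionary lookup, and the function names (`build_graph`, `shortest_path`, `related_paths`) are hypothetical. Paths are scored only by edge count, in the spirit of the path-length approach of Rada et al. (1989).

```python
from collections import deque

# Toy data, invented for illustration; a real system would read it from Wikipedia.
ARTICLE_FOR_WORD = {"king": "Monarch", "rook": "Rook (chess)"}
CATEGORIES_OF = {
    "Monarch": ["Monarchy", "Heads of state"],
    "Rook (chess)": ["Chess pieces"],
}
CATEGORY_EDGES = [
    ("Monarchy", "Politics"),
    ("Heads of state", "Politics"),
    ("Politics", "Society"),
    ("Chess pieces", "Chess"),
    ("Chess", "Games"),
    ("Games", "Society"),
]


def build_graph(edges):
    """Turn a list of category links into an undirected adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list of a shortest path or None."""
    if start not in graph or goal not in graph:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


def related_paths(word1, word2, graph=None):
    """Steps 1-4: words -> articles -> categories -> scored category paths."""
    graph = graph or build_graph(CATEGORY_EDGES)
    # Step 2 (simplified): a dictionary lookup stands in for disambiguation.
    article1 = ARTICLE_FOR_WORD.get(word1)
    article2 = ARTICLE_FOR_WORD.get(word2)
    if article1 is None or article2 is None:
        return []
    results = []
    # Step 3: one shortest path per pair of categories.
    for cat1 in CATEGORIES_OF.get(article1, []):
        for cat2 in CATEGORIES_OF.get(article2, []):
            path = shortest_path(graph, cat1, cat2)
            if path is not None:
                # Step 4: score each path by its number of edges.
                results.append((len(path) - 1, path))
    return sorted(results)


if __name__ == "__main__":
    for edges, path in related_paths("king", "rook"):
        print(edges, " -> ".join(path))
```

Fewer edges mean closer categories, so the sorted list puts the strongest evidence of relatedness first. The information-content and text-overlap measures mentioned above would replace only the scoring in step 4.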
