Scientific report: "Dictionary Definitions based Homograph Identification using a Generative Hierarchical Model"

Anagha Kulkarni, Jamie Callan
Language Technologies Institute, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
anaghak callan @

Abstract

A solution to the problem of homograph (words with multiple distinct meanings) identification is proposed and evaluated in this paper. It is demonstrated that a mixture model based framework is better suited for this task than the standard classification algorithms: a relative improvement of 7% in F1 measure and 14% in Cohen's kappa score is observed.

1 Introduction

Lexical ambiguity resolution is an important research problem for the fields of information retrieval and machine translation (Sanderson, 2000; Chan et al., 2007). However, making fine-grained sense distinctions for words with multiple closely-related meanings is a subjective task (Jorgenson, 1990; Palmer et al., 2005), which makes it difficult and error-prone. Fine-grained sense distinctions aren't necessary for many tasks, thus a possibly-simpler alternative is lexical disambiguation at the level of homographs (Ide and Wilks, 2006). Homographs are a special case of semantically ambiguous words: words that can convey multiple distinct meanings. For example, the word "bark" can imply two very different concepts, the outer layer of a tree trunk or the sound made by a dog, and thus is a homograph. Ironically, the definition of the word "homograph" is itself ambiguous and much debated; however, in this paper we consistently use the above definition.
If the goal is to do word-sense disambiguation of homographs in a very large corpus, a manually-generated homograph inventory may be impractical. In this case, the first step is to determine which words in a lexicon are homographs. This problem is the subject of this paper.

2 Finding the Homographs in a Lexicon

Our goal is to identify the homographs in a large lexicon. We assume that manual labor is a scarce resource
