Scientific report: "Supersense Tagging of Unknown Nouns using Semantic Similarity"

James R. Curran
School of Information Technologies, University of Sydney, NSW 2006, Australia
james@

Abstract

The limited coverage of lexical-semantic resources is a significant problem for NLP systems, which can be alleviated by automatically classifying unknown words. Supersense tagging assigns unknown nouns one of 26 broad semantic categories used by lexicographers to organise their manual insertion into WordNet. Ciaramita and Johnson (2003) present a tagger which uses synonym set glosses as annotated training examples. We describe an unsupervised approach, based on vector-space similarity, which does not require annotated examples but significantly outperforms their tagger. We also demonstrate the use of an extremely large shallow-parsed corpus for calculating vector-space semantic similarity.

1 Introduction

Lexical-semantic resources have been applied successfully to a wide range of Natural Language Processing (NLP) problems, ranging from collocation extraction (Pearce, 2001) and class-based smoothing (Clark and Weir, 2002) to text classification (Baker and McCallum, 1998) and question answering (Pasca and Harabagiu, 2001). In particular, WordNet (Fellbaum, 1998) has significantly influenced research in NLP.

Unfortunately, these resources are extremely time-consuming and labour-intensive to develop and maintain manually, requiring considerable linguistic and domain expertise. Lexicographers cannot possibly keep pace with language evolution: sense distinctions are continually made and merged, words are coined or become obsolete, and technical terms migrate into the vernacular. Technical domains, such as medicine, require separate treatment, since common words often take on special meanings and a significant proportion of their vocabulary does not overlap with everyday vocabulary. Burgun and Bodenreider (2001) compared an alignment of WordNet with the UMLS medical resource and found only a very small degree of overlap.
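
To make the idea of tagging by vector-space similarity more concrete, the following is a minimal sketch, not the tagger described in this paper. It assumes each noun is represented by a sparse vector of context counts, finds the known nouns most similar to an unknown noun by cosine similarity, and lets them vote on a supersense. The function names, the context features, the choice of `k`, and the toy supersense labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = sqrt(sum(weight * weight for weight in u.values()))
    norm_v = sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def tag_supersense(unknown_vec, known_vecs, supersense_of, k=3):
    """Return the supersense with the highest similarity-weighted vote
    among the k known nouns most similar to the unknown noun."""
    scored = [(cosine(unknown_vec, vec), word) for word, vec in known_vecs.items()]
    scored.sort(reverse=True)
    votes = Counter()
    for similarity, word in scored[:k]:
        votes[supersense_of[word]] += similarity
    return votes.most_common(1)[0][0]


# Toy context vectors (hypothetical counts of grammatical contexts).
known_vecs = {
    "dog": {"obj-of:feed": 3, "subj-of:bark": 4, "adj:furry": 2},
    "cat": {"obj-of:feed": 2, "subj-of:purr": 3, "adj:furry": 3},
    "aspirin": {"obj-of:take": 4, "obj-of:prescribe": 3},
    "ibuprofen": {"obj-of:take": 3, "obj-of:prescribe": 2, "adj:oral": 1},
}

# Toy supersense labels, assumed here for illustration only.
supersense_of = {
    "dog": "noun.animal",
    "cat": "noun.animal",
    "aspirin": "noun.artifact",
    "ibuprofen": "noun.artifact",
}

# An "unknown" noun that shares contexts only with the two drug names.
unknown_vec = {"obj-of:take": 2, "obj-of:prescribe": 2, "adj:oral": 1}
predicted = tag_supersense(unknown_vec, known_vecs, supersense_of)
print(predicted)
```

In this toy setting the unknown noun shares contexts only with the two drug names, so their votes decide the label. A realistic system would need far larger context vectors drawn from a corpus and a more careful weighting and voting scheme than this sketch uses.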
