Scientific report: "Bootstrapping Coreference Resolution Using Word Associations"

Hamidreza Kobdani, Hinrich Schütze, Michael Schiehlen and Hans Kamp
Institute for Natural Language Processing, University of Stuttgart
kobdani@

Abstract

In this paper, we present an unsupervised framework that bootstraps a complete coreference resolution (CoRe) system from word associations mined from a large unlabeled corpus. We show that word associations are useful for CoRe; e.g., the strong association between Obama and President is an indicator of likely coreference. Association information has so far not been used in CoRe because it is sparse and difficult to learn from small labeled corpora. Since unlabeled text is readily available, our unsupervised approach addresses the sparseness problem. In a self-training framework, we train a decision tree on a corpus that is automatically labeled using word associations. We show that this unsupervised system has better CoRe performance than other learning approaches that do not use manually labeled data.

1 Introduction

Coreference resolution (CoRe) is the process of finding markables (noun phrases) that refer to the same real-world entity or concept. Until recently, most approaches tried to solve the problem by binary classification, where the probability of a pair of markables being coreferent is estimated from labeled data. Alternatively, a model that determines whether a markable is coreferent with a preceding cluster can be used.
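The abstract's key signal, a strong association between two words such as Obama and President, can be illustrated with a minimal Python sketch that mines associations from unlabeled sentences and uses them to label markable pairs automatically. It assumes pointwise mutual information (PMI) over sentence-level co-occurrence as the association measure and a hypothetical threshold of 0.5 for turning scores into labels; the excerpt does not say which statistic or cutoff the authors use, so both are stand-ins. The helper names `association_scores`, `assoc`, and `auto_label` are likewise hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def association_scores(sentences):
    """PMI between word types that co-occur in the same sentence.

    PMI is an assumed association measure, not necessarily the paper's.
    Probabilities are estimated as fractions of sentences.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for sent in sentences:
        types = sorted({w.lower() for w in sent})
        word_counts.update(types)
        pair_counts.update(combinations(types, 2))  # tuples are alphabetically ordered
    n = len(sentences)
    scores = {}
    for (a, b), c_ab in pair_counts.items():
        p_ab = c_ab / n
        p_a = word_counts[a] / n
        p_b = word_counts[b] / n
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


def assoc(scores, w1, w2):
    """Symmetric, case-insensitive lookup; pairs never seen together get -inf."""
    key = tuple(sorted((w1.lower(), w2.lower())))
    return scores.get(key, float("-inf"))


def auto_label(head_pairs, scores, threshold=0.5):
    """Label markable pairs (given by their head words) as coreferent when
    their association exceeds a hypothetical threshold."""
    return [(a, b, assoc(scores, a, b) > threshold) for a, b in head_pairs]


if __name__ == "__main__":
    corpus = [
        ["Obama", "President", "speech"],
        ["Obama", "President", "visit"],
        ["weather", "rain", "speech"],
        ["rain", "cold"],
    ]
    scores = association_scores(corpus)
    pairs = [("Obama", "President"), ("President", "speech"), ("Obama", "rain")]
    print(auto_label(pairs, scores))
    # [('Obama', 'President', True), ('President', 'speech', False), ('Obama', 'rain', False)]
```

In this toy corpus, Obama and President co-occur in every sentence that contains either of them, so their PMI is log 2 and the pair is labeled positive. On a real corpus the counts would be far larger, and rare pairs would likely need smoothing or a minimum-count cutoff; the example only shows the mechanics of producing automatic labels that a classifier could then be trained on.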
For both pair-based and cluster-based models, a well-established feature model plays an important role. Typical systems use a rich feature space based on lexical, syntactic, and semantic knowledge. The most commonly used features are described by Soon et al. (2001). Most existing systems are supervised systems trained on human-labeled benchmark data sets for English. These systems use linguistic features based on number, gender, person, etc. It is a challenge to adapt these systems to new domains, genres, and languages because a …
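The agreement features the paragraph mentions (number, gender, person) can be sketched as pairwise feature extraction for a candidate antecedent and anaphor. This is a simplified illustration, not the feature set of Soon et al. (2001) or of this paper: the `Markable` record, its attribute values, and the exact-match string feature are assumptions, and an attribute that is unknown for either markable yields `None` rather than a guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Markable:
    # Hypothetical markable record; attribute values are assumed to come from
    # some upstream tagger or lexicon (here they are filled in by hand).
    text: str
    number: Optional[str]  # "sg", "pl", or None if unknown
    gender: Optional[str]  # "masc", "fem", "neut", or None if unknown
    person: Optional[int]  # 1, 2, 3, or None if unknown


def agree(a, b) -> Optional[bool]:
    """True/False when both values are known, None when either is unknown."""
    if a is None or b is None:
        return None
    return a == b


def pair_features(antecedent: Markable, anaphor: Markable) -> dict:
    """Feature dictionary for one candidate (antecedent, anaphor) pair."""
    return {
        # Simplified case-insensitive exact string match
        "string_match": antecedent.text.lower() == anaphor.text.lower(),
        "number_agree": agree(antecedent.number, anaphor.number),
        "gender_agree": agree(antecedent.gender, anaphor.gender),
        "person_agree": agree(antecedent.person, anaphor.person),
    }


if __name__ == "__main__":
    obama = Markable("Obama", "sg", "masc", 3)
    he = Markable("he", "sg", "masc", 3)
    they = Markable("they", "pl", None, 3)
    print(pair_features(obama, he))
    print(pair_features(obama, they))
```

A pairwise classifier would consume one such dictionary per candidate pair. Keeping "unknown" distinct from "disagrees" matters because a missing attribute is weak evidence either way, whereas an explicit number mismatch (Obama vs. they) argues against coreference.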
