Scientific report: "High-precision Identification of Discourse New and Unique Noun Phrases"

Olga Uryupina
Computational Linguistics, Saarland University
Building 17, Postfach 15 11 50
66041 Saarbrücken, Germany
ourioupi@

Abstract

Coreference resolution systems usually attempt to find a suitable antecedent for (almost) every noun phrase. Recent studies, however, show that many definite NPs are not anaphoric. The same claim, obviously, holds for the indefinites as well. In this study we try to learn automatically two classifications, discourse-new and unique, relevant for this problem. We use a small training corpus (MUC-7), but also acquire some data from the Internet. Combining our classifiers sequentially, we achieve precision and recall for discourse new entities. We expect our classifiers to provide a good prefiltering for coreference resolution systems, improving both their speed and performance.

1 Introduction

Most coreference resolution systems proceed in the following way: they first identify all the possible markables (for example, noun phrases) and then check, one by one, candidate pairs (markable_i, markable_j), trying to find out whether the members of those pairs can be coreferent. As the final step, the pairs are ranked using a scoring algorithm in order to find an appropriate partition of all the markables into coreference classes.
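The mention-pair procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's system: markables are assumed to be already extracted as strings, `coreferent` is a hypothetical pairwise test standing in for a trained classifier, and `is_discourse_new` is a hypothetical prefilter of the kind the abstract proposes. The final ranking step is replaced by a simpler strategy, namely taking the transitive closure of all positive pairwise decisions with union-find.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple


def candidate_pairs(
    n: int, is_discourse_new: Callable[[int], bool] = lambda j: False
) -> List[Tuple[int, int]]:
    """All index pairs (i, j) with i < j: markable i is a candidate antecedent
    for markable j. Markables flagged by the (hypothetical) discourse-new
    prefilter are never treated as anaphors, so their pairs are skipped."""
    return [(i, j) for i, j in combinations(range(n), 2) if not is_discourse_new(j)]


def resolve(
    markables: Sequence[str],
    coreferent: Callable[[str, str], bool],
    is_discourse_new: Callable[[int], bool] = lambda j: False,
) -> List[List[int]]:
    """Partition markable indices into coreference classes.

    Simplification (assumption): instead of ranking pairs with a scoring
    algorithm, every pair judged coreferent is merged via union-find."""
    parent = list(range(len(markables)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in candidate_pairs(len(markables), is_discourse_new):
        if coreferent(markables[i], markables[j]):
            parent[find(j)] = find(i)  # merge the two classes

    # Group indices by their root; dicts keep insertion order (Python 3.7+).
    classes: Dict[int, List[int]] = {}
    for k in range(len(markables)):
        classes.setdefault(find(k), []).append(k)
    return list(classes.values())


def toy_coreferent(a: str, b: str) -> bool:
    # Hypothetical hand-written test standing in for a trained classifier.
    return (a, b) in {("Olga", "she"), ("a paper", "the paper")}


if __name__ == "__main__":
    mentions = ["Olga", "a paper", "she", "the paper"]
    print(len(candidate_pairs(len(mentions))))  # 6, i.e. 4 * 3 / 2
    print(resolve(mentions, toy_coreferent))    # [[0, 2], [1, 3]]
```

Flagging markable j (0-based) as discourse-new removes the j pairs in which it would act as the anaphor. The filter also cuts the other way: a true anaphor that is wrongly flagged can never be linked. In the toy example, passing `lambda j: j == 2` leaves "she" in a singleton class, which is one reason a prefilter of this kind should favor precision.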
Those approaches require substantial processing: in the worst case, one has to check $n(n-1)/2$ candidate pairs, where $n$ is the total number of markables found by the system. However, R. Vieira and M. Poesio have recently shown (Vieira and Poesio, 2000) that such an exhaustive search is not needed, because many noun phrases are not anaphoric at all: about 50% of definite NPs in their corpus have no prior referents. Obviously, this number is even higher if one takes into account all the other types of NPs; for example, indefinites are almost always non-anaphoric. We can conclude that a coreference resolution engine might benefit a lot from a pre-filtering algorithm for
