High-precision Identification of Discourse New and Unique Noun Phrases

Olga Uryupina
Computational Linguistics, Saarland University
Building 17, Postfach 15 11 50
66041 Saarbrücken, Germany
ourioupi@coli.uni-sb.de

Abstract

Coreference resolution systems usually attempt to find a suitable antecedent for (almost) every noun phrase. Recent studies, however, show that many definite NPs are not anaphoric. The same claim, obviously, holds for the indefinites as well. In this study we try to learn automatically two classifications, discourse-new and unique, relevant for this problem. We use a small training corpus (MUC-7), but also acquire some data from the Internet. Combining our classifiers sequentially, we achieve 88.9% precision and 84.6% recall for discourse new entities. We expect our classifiers to provide a good prefiltering for coreference resolution systems, improving both their speed and performance.

1 Introduction

Most coreference resolution systems proceed in the following way: they first identify all the possible markables (for example, noun phrases) and then check, one by one, candidate pairs (markable_i, markable_j), trying to find out whether the members of those pairs can be coreferent. As the final step, the pairs are ranked using a scoring algorithm in order to find an appropriate partition of all the markables into coreference classes.
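
To make the cost of the pairwise procedure concrete, the following is a minimal sketch, not the system described in this paper. It enumerates every candidate pair for a toy list of markables (for n markables this gives n(n-1)/2 pairs) and then shows how many pairs remain if markables flagged as discourse new are never searched for an antecedent. The function names, the toy markables, and the set of discourse new items are all assumptions made for illustration.

```python
from itertools import combinations


def candidate_pairs(markables):
    """Return every (earlier, later) pair of markables in text order."""
    return list(combinations(markables, 2))


def filtered_pairs(markables, is_discourse_new):
    """Keep only pairs whose later member may be anaphoric.

    `is_discourse_new` is a hypothetical prefilter: a markable it flags
    is assumed to have no antecedent, so no pair is built for it as the
    later member. It can still serve as an antecedent for later markables.
    """
    return [
        (earlier, later)
        for earlier, later in combinations(markables, 2)
        if not is_discourse_new(later)
    ]


# Toy data (assumed): markables in the order they appear in a text.
markables = ["a company", "the president", "a report", "the company", "he"]
discourse_new = {"a company", "the president", "a report"}

all_pairs = candidate_pairs(markables)
kept_pairs = filtered_pairs(markables, lambda m: m in discourse_new)

n = len(markables)
print(len(all_pairs), n * (n - 1) // 2)  # both counts are 10
print(len(kept_pairs))                   # 7 pairs remain after prefiltering
```

In this toy case the prefilter removes three of the ten pairs; the saving on real text depends on how many markables are discourse new and where they occur.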
Those approaches require substantial processing: in the worst case one has to check n(n-1)/2 candidate pairs, where n is the total number of markables found by the system. However, R. Vieira and M. Poesio have recently shown (Vieira and Poesio, 2000) that such an exhaustive search is not needed, because many noun phrases are not anaphoric at all: about 50% of definite NPs in their corpus have no prior referents. Obviously, this number is even higher if one takes into account all the other types of NPs; for example, indefinites are almost always non-anaphoric. We can conclude that a coreference resolution engine might benefit a lot from a pre-filtering algorithm for