Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 104-111.

Improving Machine Learning Approaches to Coreference Resolution

Vincent Ng and Claire Cardie
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
{yung,cardie}@cs.cornell.edu

Abstract

We present a noun phrase coreference system that extends the work of Soon et al. (2001) and, to our knowledge, produces the best results to date on the MUC-6 and MUC-7 coreference resolution data sets: F-measures of 70.4 and 63.4, respectively. Improvements arise from two sources: extra-linguistic changes to the learning framework and a large-scale expansion of the feature set to include more sophisticated linguistic knowledge.

1 Introduction

Noun phrase coreference resolution refers to the problem of determining which noun phrases (NPs) refer to each real-world entity mentioned in a document. Machine learning approaches to this problem have been reasonably successful, operating primarily by recasting the problem as a classification task (e.g., Aone and Bennett (1995), McCarthy and Lehnert (1995)). Specifically, a pair of NPs is classified as co-referring or not based on constraints that are learned from an annotated corpus. A separate clustering mechanism then coordinates the possibly contradictory pairwise classifications and constructs a partition on the set of NPs. Soon et al. (2001), for example, apply an NP coreference system based on decision tree induction to two standard coreference resolution data sets (MUC-6, 1995; MUC-7, 1998), achieving performance comparable to the best-performing knowledge-based coreference engines. Perhaps surprisingly, this was accomplished in a decidedly knowledge-lean manner: the learning algorithm has access to just 12 surface-level features.

This paper presents an NP coreference system that investigates two types of extensions to the Soon et al. corpus-based approach. First, we propose and evaluate three extra-linguistic modifications