Conundrums in Noun Phrase Coreference Resolution: Making Sense of the State-of-the-Art

Veselin Stoyanov, Cornell University, Ithaca, NY (ves@cs.cornell.edu)
Nathan Gilbert, University of Utah, Salt Lake City, UT (ngilbert@cs.utah.edu)
Claire Cardie, Cornell University, Ithaca, NY (cardie@cs.cornell.edu)
Ellen Riloff, University of Utah, Salt Lake City, UT (riloff@cs.utah.edu)

Abstract

We aim to shed light on the state-of-the-art in NP coreference resolution by teasing apart the differences in the MUC and ACE task definitions, the assumptions made in evaluation methodologies, and inherent differences in text corpora. First, we examine three subproblems that play a role in coreference resolution: named entity recognition, anaphoricity determination, and coreference element detection. We measure the impact of each subproblem on coreference resolution and confirm that certain assumptions regarding these subproblems in the evaluation methodology can dramatically simplify the overall task. Second, we measure the performance of a state-of-the-art coreference resolver on several classes of anaphora and use these results to develop a quantitative measure for estimating coreference resolution performance on new data sets.

1 Introduction

As is common for many natural language processing problems, the state-of-the-art in noun phrase (NP) coreference resolution is typically quantified based on system performance on manually annotated text corpora. In spite of the availability of several benchmark data sets (e.g., MUC-6 (1995), ACE (NIST, 2004)) and their use in many formal evaluations, as a field we can make surprisingly few conclusive statements about the state-of-the-art in NP coreference resolution. In particular, it remains difficult to assess the effectiveness of different coreference resolution approaches, even in relative terms. For example, the 91.5 F-measure reported by McCallum and Wellner (2004) was produced by a system using perfect information for several linguistic subproblems. In contrast, the 71.3 F-measure