Scientific report: "Corpus-Based Identification of Non-Anaphoric Noun Phrases"

David L. Bean and Ellen Riloff
Department of Computer Science
University of Utah
Salt Lake City, Utah 84112
{bean,riloff}@cs.utah.edu

Abstract

Coreference resolution involves finding antecedents for anaphoric discourse entities, such as definite noun phrases. But many definite noun phrases are not anaphoric because their meaning can be understood from general world knowledge (e.g., "the White House" or "the news media"). We have developed a corpus-based algorithm for automatically identifying definite noun phrases that are non-anaphoric, which has the potential to improve the efficiency and accuracy of coreference resolution systems. Our algorithm generates lists of non-anaphoric noun phrases and noun phrase patterns from a training corpus and uses them to recognize non-anaphoric noun phrases in new texts. Using 1600 MUC-4 terrorism news articles as the training corpus, our approach achieved 78% recall and 87% precision at identifying such noun phrases in 50 test documents.

1 Introduction

Most automated approaches to coreference resolution attempt to locate an antecedent for every potentially coreferent discourse entity (DE) in a text. The problem with this approach is that a large number of DEs may not have antecedents.
While some discourse entities such as pronouns are almost always referential, definite descriptions¹ may not be. Earlier work found that nearly 50% of definite descriptions had no prior referents (Vieira and Poesio, 1997), and we found that number to be even higher, 63%, in our corpus. Some non-anaphoric definite descriptions can be identified by looking for syntactic clues like attached prepositional phrases or restrictive relative clauses. But other definite descriptions are non-anaphoric because readers understand their meaning due to common knowledge. For example, readers of this paper will probably understand the real .

¹ In this work we define a definite description to be a noun phrase beginning with "the".
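The list-based idea described in the abstract (learn non-anaphoric definite noun phrases from a training corpus, then look them up in new texts) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' procedure: the function names are hypothetical, noun phrases are assumed to be already chunked and given in document order, the head noun is approximated by the last word, and a definite noun phrase is treated as having no antecedent when no earlier noun phrase in the same document shares its head. The precision and recall helper shows how scores like those reported above would be computed from predicted and gold sets.

```python
from collections import Counter


def is_definite(np):
    """A definite description here is a noun phrase beginning with 'the'."""
    return np.lower().startswith("the ")


def head(np):
    """Toy assumption: the last word of the noun phrase is its head noun."""
    return np.lower().split()[-1]


def learn_nonanaphoric(docs, min_docs=2):
    """Build a list of definite NPs that occur without a likely antecedent.

    docs: list of documents, each a list of NP strings in textual order.
    An NP is counted when no earlier NP in the same document shares its head
    (an illustrative heuristic, not the method of the paper).
    """
    counts = Counter()
    for nps in docs:
        seen_heads = set()
        for np in nps:
            if is_definite(np) and head(np) not in seen_heads:
                counts[np.lower()] += 1
            seen_heads.add(head(np))
    # Keep only NPs seen this way in at least min_docs documents.
    return {np for np, c in counts.items() if c >= min_docs}


def classify(nps, learned):
    """Return the definite NPs of a new text that are on the learned list."""
    return [np for np in nps if is_definite(np) and np.lower() in learned]


def precision_recall(predicted, gold):
    """Precision and recall of predicted non-anaphoric NPs against gold labels."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Tiny made-up training corpus, for illustration only.
train = [
    ["a bomb", "the White House", "the bomb"],
    ["the White House", "a car", "the car"],
    ["the news media", "a bomb", "the bomb"],
]
learned = learn_nonanaphoric(train)

test_doc = ["the White House", "a man", "the man", "the news media"]
predicted = classify(test_doc, learned)
gold = ["the White House", "the news media"]
precision, recall = precision_recall(predicted, gold)
```

On this toy data only "the White House" is learned, because "the news media" appears in a single training document, so recall on the test document is lower than precision. A real system would presumably need a proper noun phrase chunker and a far larger corpus.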
