Bayesian Network, a model for NLP

Davy Weissenbacher
Laboratoire d'Informatique de Paris-Nord
Université Paris-Nord
Villetaneuse, FRANCE
davy.weissenbacher@lipn.univ-paris13.fr

Abstract
NLP systems often perform poorly because they rely on unreliable and heterogeneous knowledge. We show, on the task of identifying non-anaphoric "it", how to overcome these handicaps with the Bayesian Network (BN) formalism. The first results are very encouraging compared with state-of-the-art systems.

1 Introduction
When a pronoun refers to a linguistic expression previously introduced in the text, it is anaphoric. In the sentence "Nonexpression of the locus even when it is present suggests that these chromosomes ...", the pronoun "it" refers to the referent designated as "the locus". When "it" does not refer to any referent, as in the sentence "Thus it is not unexpected that this versatile cellular ...", the pronoun is semantically empty, or non-anaphoric. Any anaphora resolution system therefore starts by identifying the pronoun occurrences and distinguishing the anaphoric occurrences of "it" from the non-anaphoric ones.

The first systems that tackled this classification problem were based either on manually written rules or on the automatic learning of relevant surface clues. Whatever strategy is used, the performance of these systems is limited by the quality of the knowledge they exploit, which is usually heterogeneous and only partially reliable.

This article describes a new approach that goes beyond the limits of these traditional systems. It relies on the Bayesian Network (BN) formalism, which is still little exploited in NLP.
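The classification decision described above can be illustrated with a minimal Python sketch. It is not the author's model: the network structure, the two clue names (`it_is_ADJ_that`, `it_seems`) and every probability below are hypothetical, chosen only to show how a BN can combine heterogeneous clues while weighing each one by an a priori estimate of its reliability. Each clue is modelled as a hidden node T (the clue is truly present), which depends on the class C of the pronoun, and an observed node D (the clue detector fired), which depends on T. The detector's sensitivity and false-alarm rate encode its reliability, and the posterior P(C | D) is computed exactly by summing out each hidden T.

```python
from dataclasses import dataclass

# Hypothetical prior P(C) over the class of an occurrence of "it".
PRIOR = {"anaphoric": 0.7, "non_anaphoric": 0.3}


@dataclass
class Clue:
    """A hypothetical surface clue, seen through a partially reliable detector."""
    name: str
    p_true_given_class: dict[str, float]  # P(T=1 | C=c)
    p_fire_given_true: float              # P(D=1 | T=1): detector sensitivity
    p_fire_given_false: float             # P(D=1 | T=0): detector false-alarm rate


# Illustrative clues and numbers only (assumptions, not taken from the paper).
CLUES = [
    Clue("it_is_ADJ_that", {"anaphoric": 0.05, "non_anaphoric": 0.60}, 0.90, 0.10),
    Clue("it_seems", {"anaphoric": 0.02, "non_anaphoric": 0.30}, 0.95, 0.02),
]


def likelihood(clue: Clue, fired: bool, cls: str) -> float:
    """P(D=fired | C=cls), obtained by summing out the hidden node T."""
    p_t = clue.p_true_given_class[cls]
    p_d_if_t1 = clue.p_fire_given_true if fired else 1 - clue.p_fire_given_true
    p_d_if_t0 = clue.p_fire_given_false if fired else 1 - clue.p_fire_given_false
    return p_t * p_d_if_t1 + (1 - p_t) * p_d_if_t0


def posterior(observations: dict[str, bool]) -> dict[str, float]:
    """Exact P(C | observed detectors).

    Clues absent from `observations` are unobserved; summing over their D
    yields a factor of 1, so skipping them is exact.
    """
    scores = {}
    for cls, prior in PRIOR.items():
        score = prior
        for clue in CLUES:
            if clue.name in observations:
                score *= likelihood(clue, observations[clue.name], cls)
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}
```

For instance, `posterior({"it_is_ADJ_that": True})` moves most of the probability mass to `non_anaphoric` (about 0.64 with these made-up numbers). Lowering that detector's sensitivity or raising its false-alarm rate pulls the result back toward the prior, which is how a less reliable clue ends up carrying less weight in the decision.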
As a probabilistic formalism, a BN offers great expressive power for integrating heterogeneous knowledge into a single representation (Peshkin, 2003), as well as an elegant mechanism for taking an a priori estimate of that knowledge's reliability into account in the classification decision (Roth, 2002). To validate our approach, we carried out various experiments on a corpus made up of abstracts of ...