Paraphrase Recognition Using Machine Learning to Combine Similarity Measures

Prodromos Malakasiotis
Department of Informatics
Athens University of Economics and Business
Patission 76, GR-104 34 Athens, Greece

Abstract

This paper presents three methods that can be used to recognize paraphrases. They all employ string similarity measures applied to shallow abstractions of the input sentences, and a Maximum Entropy classifier to learn how to combine the resulting features. Two of the methods also exploit WordNet to detect synonyms, and one of them also exploits a dependency parser. We experiment on two datasets: the MSR paraphrasing corpus and a dataset that we automatically created from the MTC corpus. Our system achieves state-of-the-art or better results.

1 Introduction

Recognizing or generating semantically equivalent phrases is of significant importance in many natural language applications. In question answering, for example, a question may be phrased differently than in a document collection (e.g., "Who is the author of War and Peace?" vs. "Leo Tolstoy is the writer of War and Peace."), and taking such variations into account can improve system performance significantly (Harabagiu et al., 2003; Harabagiu and Hickl, 2006). A paraphrase generator, meaning a module that produces new phrases or patterns that are semantically equivalent or almost equivalent to a given input phrase or pattern, e.g.,
"X is the writer of Y" ⇔ "X wrote Y" ⇔ "Y was written by X" ⇔ "X is the author of Y", or "X produces Y" ⇔ "X manufactures Y" ⇔ "X is the manufacturer of Y", can be used to produce alternative phrasings of the question before matching it against a document collection. Unlike paraphrase generators, paraphrase recognizers decide whether or not two given phrases or patterns are paraphrases, possibly by generalizing over many different training pairs of phrases. Paraphrase recognizers can be embedded in paraphrase generators to filter out erroneous generated paraphrases, but they are also useful on their own. In …
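The recipe summarized in the abstract (string similarity measures computed on a shallow abstraction of the two sentences, combined by a Maximum Entropy classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the lowercased-token abstraction, the four measures (token-level edit distance, Jaccard, Dice, cosine), the toy training pairs and the helper names `tokens`, `features`, `train_maxent` and `predict` are all hypothetical. It relies on the fact that a binary Maximum Entropy classifier is equivalent to logistic regression, here fitted by plain batch gradient ascent; WordNet and the dependency parser used by two of the methods are not modeled.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokens(sentence):
    """Hypothetical shallow abstraction: lowercase word tokens, punctuation stripped."""
    words = (w.strip(PUNCT).lower() for w in sentence.split())
    return [w for w in words if w]


def levenshtein(a, b):
    """Edit distance between two sequences via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute (or match)
        prev = curr
    return prev[-1]


def features(s1, s2):
    """Four similarity scores in [0, 1] for one sentence pair (assumed feature set)."""
    t1, t2 = tokens(s1), tokens(s2)
    set1, set2 = set(t1), set(t2)
    c1, c2 = Counter(t1), Counter(t2)
    inter = len(set1 & set2)
    dot = sum(c1[w] * c2[w] for w in c1)
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(sum(v * v for v in c2.values()))
    return [
        1.0 - levenshtein(t1, t2) / max(len(t1), len(t2), 1),  # normalized edit similarity
        inter / (len(set1 | set2) or 1),                        # Jaccard coefficient
        2 * inter / ((len(set1) + len(set2)) or 1),             # Dice coefficient
        dot / norm if norm else 0.0,                            # cosine of term counts
    ]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """Probability that a pair is a paraphrase; w[-1] is the bias term."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_maxent(X, y, epochs=2000, lr=0.5):
    """Binary MaxEnt (logistic regression) fitted by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy pairs (label 1 = paraphrase), loosely built from the examples in the text.
PAIRS = [
    ("Leo Tolstoy is the writer of War and Peace.", "Leo Tolstoy is the author of War and Peace.", 1),
    ("Leo Tolstoy wrote War and Peace.", "War and Peace was written by Leo Tolstoy.", 1),
    ("The company produces cars.", "The company manufactures cars.", 1),
    ("Leo Tolstoy wrote War and Peace.", "The company manufactures cars.", 0),
    ("Who is the author of War and Peace?", "The weather was cold in Athens.", 0),
    ("The company produces cars.", "Leo Tolstoy is the writer of War and Peace.", 0),
]

X = [features(a, b) for a, b, _ in PAIRS]
y = [label for _, _, label in PAIRS]
weights = train_maxent(X, y)

if __name__ == "__main__":
    for a, b in [("Leo Tolstoy is the author of War and Peace.", "Leo Tolstoy wrote War and Peace."),
                 ("The company produces cars.", "Who is the author of War and Peace?")]:
        print(f"{predict(weights, features(a, b)):.3f}  {a} | {b}")
```

The toy pairs only show the shape of the data (one feature vector and one label per sentence pair); they say nothing about the results reported in the paper. A realistic system would compute many more measures over several abstractions and would typically use a library classifier instead of a hand-written training loop.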