Monolingual Alignment by Edit Rate Computation on Sentential Paraphrase Pairs

Houda Bouamor, Aurelien Max, Anne Vilnat
LIMSI-CNRS, Univ. Paris Sud, Orsay, France
firstname.lastname@limsi.fr

Abstract

In this paper, we present a novel way of tackling the monolingual alignment problem on pairs of sentential paraphrases by means of edit rate computation. In order to inform the edit rate, information in the form of subsentential paraphrases is provided by a range of techniques built for different purposes. We show that the tunable TER-PLUS metric from Machine Translation evaluation can achieve good performance on this task and that it can effectively exploit information coming from complementary sources.

1 Introduction

The acquisition of subsentential paraphrases has attracted a lot of attention recently (Madnani and Dorr, 2010). Techniques are usually developed for extracting paraphrase candidates from specific types of corpora, including monolingual parallel corpora (Barzilay and McKeown, 2001), monolingual comparable corpora (Deleger and Zweigenbaum, 2009), bilingual parallel corpora (Bannard and Callison-Burch, 2005), and edit histories of multi-authored text (Max and Wisniewski, 2010). These approaches face two main issues, which correspond to the typical measures of precision, or how appropriate the extracted paraphrases are, and of recall, or how many of the paraphrases present in a given corpus can be found effectively.
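As a rough illustration of the two measures just described, the following minimal sketch computes precision and recall for a set of extracted paraphrase pairs against a reference set. The function name `precision_recall` and the toy pairs are hypothetical and are not taken from the paper. The sketch also assumes that a complete reference set is available, which is often not the case in practice.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) of extracted pairs against a reference set.

    Both arguments are iterables of (phrase, paraphrase) tuples.
    Assumption: pairs are compared by exact match, which is a simplification.
    """
    extracted = set(extracted)
    reference = set(reference)
    correct = extracted & reference  # extracted pairs that are acceptable

    # Precision: share of extracted pairs that are acceptable.
    precision = len(correct) / len(extracted) if extracted else 0.0
    # Recall: share of acceptable pairs that were actually extracted.
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy data, invented purely for illustration.
reference = {
    ("a lot of attention", "much interest"),
    ("hard to compute", "difficult to calculate"),
    ("news headlines", "news titles"),
    ("in practice", "practically"),
}
extracted = {
    ("a lot of attention", "much interest"),
    ("hard to compute", "difficult to calculate"),
    ("news headlines", "research question"),  # not an acceptable paraphrase
}

precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case, two of the three extracted pairs belong to the reference set of four, so precision is 2/3 and recall is 1/2.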
To start with, both measures are often hard to compute in practice, as 1) the definition of what makes an acceptable paraphrase pair is still a research question, and 2) it is often impractical to extract a complete set of acceptable paraphrases from most resources. Second, as regards the precision of paraphrase acquisition techniques in particular, it is notable that most works on paraphrase acquisition are not based on direct observation of larger paraphrase pairs. Even monolingual corpora obtained by pairing very closely related texts such as news headlines on the same .