Constraint-based Sentence Compression: An Integer Programming Approach

James Clarke and Mirella Lapata
School of Informatics, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, UK
jclarke@ed.ac.uk, mlap@inf.ed.ac.uk

Abstract

The ability to compress sentences while preserving their grammaticality and most of their meaning has recently received much attention. Our work views sentence compression as an optimisation problem. We develop an integer programming formulation and infer globally optimal compressions in the face of linguistically motivated constraints. We show that such a formulation allows for relatively simple and knowledge-lean compression models that do not require parallel corpora or large-scale resources. The proposed approach yields results comparable and in some cases superior to state-of-the-art.

1 Introduction

A mechanism for automatically compressing sentences while preserving their grammaticality and most important information would greatly benefit a wide range of applications. Examples include text summarisation (Jing 2000), subtitle generation from spoken transcripts (Vandeghinste and Pan 2004), and information retrieval (Olivers and Dolan 1999). Sentence compression is a complex paraphrasing task with information loss, involving substitution, deletion, insertion, and reordering operations.
Recent years have witnessed increased interest in a simpler instantiation of the compression problem, namely word deletion (Knight and Marcu 2002; Riezler et al. 2003; Turner and Charniak 2005). More formally, given an input sentence of words $W = w_1, w_2, \ldots, w_n$, a compression is formed by removing any subset of these words. Sentence compression has received both generative and discriminative formulations in the literature. Generative approaches (Knight and Marcu 2002; Turner and Charniak 2005) are instantiations of the noisy-channel model: given a long sentence $l$, the aim is to find the corresponding short sentence $s$ which maximises the conditional probability $P(s \mid l)$. In a .
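
The word-deletion view of compression described above can be illustrated with a small sketch. It enumerates every compression of a sentence (each subset of words removed, with the order of the remaining words preserved) and picks the one with the highest score. The function names `compressions` and `best_compression` and the `score` argument are hypothetical, introduced only for illustration; `score` is a stand-in for a model quantity such as $P(s \mid l)$ and is not the authors' model. Exhaustive enumeration visits $2^n$ candidates, so it is only workable for very short sentences.

```python
from itertools import combinations


def compressions(words):
    """Yield every compression of `words` formed by deleting a subset of words.

    Word order is preserved. The full sentence (nothing deleted) and the
    empty sentence (everything deleted) are both included, giving 2**n
    candidates for a sentence of n words.
    """
    n = len(words)
    for keep in range(n, -1, -1):  # number of words retained
        for indices in combinations(range(n), keep):
            yield [words[i] for i in indices]


def best_compression(words, score):
    """Return the candidate compression with the highest score.

    `score` is a hypothetical callable standing in for a model score
    such as P(s | l); it maps a list of words to a number.
    """
    return max(compressions(words), key=score)
```

For example, `list(compressions(["the", "big", "dog"]))` contains eight candidates, from the full sentence down to the empty one.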