Practical very large scale CRFs

Thomas Lavergne, LIMSI - CNRS, lavergne@limsi.fr
Olivier Cappé, Télécom ParisTech, LTCI - CNRS, cappe@enst.fr
François Yvon, Université Paris-Sud 11, LIMSI - CNRS, yvon@limsi.fr

Abstract

Conditional Random Fields (CRFs) are a widely-used approach for supervised sequence labelling, notably due to their ability to handle large description spaces and to integrate structural dependency between labels. Even for the simple linear-chain model, taking structure into account implies a number of parameters and a computational effort that grows quadratically with the cardinality of the label set. In this paper, we address the issue of training very large CRFs, containing up to hundreds of output labels and several billion features. Efficiency stems here from the sparsity induced by the use of an ℓ1 penalty term. Based on our own implementation, we compare three recent proposals for implementing this regularization strategy. Our experiments demonstrate that very large CRFs can be trained efficiently and that very large models are able to improve the accuracy, while delivering compact parameter sets.

1 Introduction

Conditional Random Fields (CRFs) (Lafferty et al., 2001; Sutton and McCallum, 2006) constitute a widely-used and effective approach for supervised structure learning tasks involving the mapping between complex objects such as strings and trees.
An important property of CRFs is their ability to handle large and redundant feature sets and to integrate structural dependency between output labels. However, even for simple linear-chain CRFs, the complexity of learning and inference grows quadratically with respect to the number of output labels, and so does the number of structural features, i.e. features testing adjacent pairs of labels. Most empirical studies on CRFs thus either consider tasks with a restricted output space, typically in the order of a few dozen output .

This work was partly supported by ANR projects CroTaL (ANR-07-MDCO-003) and MGA (ANR-07-BLAN-0311-02).
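
To make the quadratic growth described in the introduction concrete, here is a minimal sketch of the forward recursion for a linear-chain model, written in plain Python. It is an illustration only and not the authors' implementation: the function names, the `unary` and `pairwise` score tables, and the `num_observation_tests` parameter are all hypothetical, and the scores are assumed to be already computed from the features. The inner computation visits every pair of adjacent labels at each position, which is where the cost proportional to the square of the label set size comes from.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(unary, pairwise):
    """Forward recursion for a linear-chain model (illustrative sketch).

    unary[t][y]     : score of label y at position t (hypothetical input)
    pairwise[y][y2] : score of the transition y -> y2 (hypothetical input)
    Each position loops over all (y, y2) label pairs, so the total cost
    is O(T * K^2) for T positions and K labels.
    """
    num_labels = len(pairwise)
    alpha = list(unary[0])
    for t in range(1, len(unary)):
        alpha = [
            unary[t][y2]
            + logsumexp([alpha[y] + pairwise[y][y2] for y in range(num_labels)])
            for y2 in range(num_labels)
        ]
    return logsumexp(alpha)


def brute_force_log_partition(unary, pairwise):
    """Enumerate all K^T label sequences; only feasible for tiny inputs."""
    num_labels = len(pairwise)
    scores = []
    for seq in product(range(num_labels), repeat=len(unary)):
        s = sum(unary[t][y] for t, y in enumerate(seq))
        s += sum(pairwise[a][b] for a, b in zip(seq, seq[1:]))
        scores.append(s)
    return logsumexp(scores)


def num_structural_features(num_labels, num_observation_tests=1):
    """Label-pair features: one per (y, y') pair and observation test."""
    return num_labels * num_labels * num_observation_tests
```

Under these assumptions, going from 10 to 100 labels multiplies both the per-position work of the recursion and the number of label-pair features by 100, which is why sparsity in the parameter vector matters at this scale.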