Guided Learning for Bidirectional Sequence Classification

Libin Shen
BBN Technologies
Cambridge, MA 02138, USA
lshen@bbn.com

Giorgio Satta
Dept. of Inf. Eng'g.
University of Padua
I-35131 Padova, Italy
satta@dei.unipd.it

Aravind K. Joshi
Department of CIS
University of Pennsylvania
Philadelphia, PA 19104, USA
joshi@seas.upenn.edu

Abstract

In this paper, we propose guided learning, a new learning framework for bidirectional sequence classification. The tasks of learning the order of inference and training the local classifier are dynamically incorporated into a single Perceptron-like learning algorithm. We apply this novel learning algorithm to POS tagging. It obtains an error rate of 2.67% on the standard PTB test set, which represents a 3.3% relative error reduction over the previous best result on the same data set, while using fewer features.

1 Introduction

Many NLP tasks, such as POS tagging, chunking and incremental parsing, can be modeled as sequence classification problems. A traditional method to solve this problem is to decompose the whole task into a set of individual tasks, one for each token in the input sequence, and to solve these small tasks in a fixed order, usually from left to right. In this way, the output of the earlier small tasks can be used as input to the later ones. HMMs and MaxEnt Markov Models are examples of this method. Lafferty et al. (2001) showed that this approach suffered from the so-called label bias problem (Bottou, 1991).
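The fixed-order decomposition described above can be sketched as a greedy left-to-right tagger. This is a minimal illustration under stated assumptions, not the guided learning method proposed in this paper: the scoring function `linear_score`, the feature names (`word=`, `prev=`) and the toy weights in `WEIGHTS` are all hypothetical, chosen only to show how the output of an earlier per-token task becomes an input feature of a later one.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A local classifier scores a candidate tag for token i, given the words and
# the tags already assigned to tokens 0..i-1 (the earlier "small tasks").
LocalScorer = Callable[[Sequence[str], int, List[str], str], float]


def tag_left_to_right(words: Sequence[str], tagset: Sequence[str],
                      score: LocalScorer) -> List[str]:
    """Greedy left-to-right decomposition: solve one per-token task at a time,
    feeding each decision into the features of the next one."""
    tags: List[str] = []
    for i in range(len(words)):
        # Pick the highest-scoring tag for token i; earlier outputs stay fixed.
        best = max(tagset, key=lambda t: score(words, i, tags, t))
        tags.append(best)
    return tags


# Hypothetical toy weights over (feature, tag) pairs, for illustration only.
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("word=the", "DT"): 2.0,
    ("word=dog", "NN"): 1.0,
    ("prev=DT", "NN"): 1.5,
    ("word=barks", "VBZ"): 1.0,
    ("word=barks", "NNS"): 1.2,  # the word alone slightly prefers NNS
    ("prev=NN", "VBZ"): 1.0,
}


def linear_score(words: Sequence[str], i: int,
                 prev_tags: List[str], tag: str) -> float:
    """Hypothetical linear local model: sum the weights of the features that fire."""
    prev = prev_tags[-1] if prev_tags else "<s>"
    feats = [f"word={words[i].lower()}", f"prev={prev}"]
    return sum(WEIGHTS.get((f, tag), 0.0) for f in feats)


if __name__ == "__main__":
    print(tag_left_to_right(["The", "dog", "barks"],
                            ["DT", "NN", "VBZ", "NNS"], linear_score))
    # prints ['DT', 'NN', 'VBZ']
```

With these toy weights, "barks" is tagged VBZ only because the previously predicted NN tag fires the `prev=NN` feature; a scorer that looked at the word alone would prefer NNS. Because earlier decisions are never revisited, an early mistake is also passed on to every later token.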
They proposed Conditional Random Fields (CRF) as a general solution for sequence classification. CRF models a sequence as an undirected graph, which means that all the individual tasks are solved simultaneously. Taskar et al. (2003) improved the CRF method by employing the large margin method to separate the gold-standard sequence labeling from incorrect labelings. However, the complexity of quadratic programming for the large margin approach prevented it from being used in large-scale NLP tasks. Collins (2002) proposed a Perceptron-like