Guiding Semi-Supervision with Constraint-Driven Learning

Ming-Wei Chang, Lev Ratinov, Dan Roth
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801
{mchang21, ratinov2, danr}@uiuc.edu

Abstract

Over the last few years, two of the main research directions in machine learning of natural language processing have been the study of semi-supervised learning algorithms as a way to train classifiers when labeled data is scarce, and the study of ways to exploit knowledge and global information in structured learning tasks. In this paper, we suggest a method for incorporating domain knowledge in semi-supervised learning algorithms. Our novel framework unifies and can exploit several kinds of task-specific constraints. The experimental results, presented in the information extraction domain, demonstrate that applying constraints helps the model generate better feedback during learning, and hence that the framework allows for high-performance learning with significantly less training data than was possible before on these tasks.

1 Introduction

Natural Language Processing (NLP) systems typically require large amounts of knowledge to achieve good performance. Acquiring labeled data is a difficult and expensive task.
Therefore, increasing attention has recently been given to semi-supervised learning, where large amounts of unlabeled data are used to improve the models learned from a small training set (Collins and Singer, 1999; Thelen and Riloff, 2002). The hope is that semi-supervised or even unsupervised approaches, when given enough knowledge about the structure of the problem, will be competitive with supervised models trained on large training sets. However, in the general case, semi-supervised approaches give mixed results and sometimes even degrade the model's performance (Nigam et al., 2000). In many cases, improving semi-supervised models was done by seeding these models with domain information taken from dictionaries or an ontology (Cohen and Sarawagi).
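To make the semi-supervised setting discussed above concrete, the following is a minimal, hypothetical sketch of the generic self-training loop: train on the small labeled set, label the unlabeled pool, and fold only the confident predictions back in as extra training data. A toy nearest-centroid classifier on 1-D points stands in for a real model; all names and thresholds here are illustrative assumptions, not the paper's method.

```python
def centroids(points, labels):
    """Mean of the points observed for each label."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(cents, x):
    """Nearest centroid; confidence = margin between the two closest."""
    dists = sorted((abs(x - c), y) for y, c in cents.items())
    label = dists[0][1]
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return label, margin

def self_train(labeled_x, labeled_y, unlabeled, rounds=3, threshold=1.0):
    """Generic self-training: repeatedly promote confident predictions
    on the unlabeled pool to labeled examples, then retrain."""
    x, y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled)
    for _ in range(rounds):
        cents = centroids(x, y)
        keep = []
        for u in pool:
            lab, margin = predict(cents, u)
            if margin >= threshold:   # confident: treat as labeled
                x.append(u)
                y.append(lab)
            else:                     # unsure: leave in the pool
                keep.append(u)
        pool = keep
    return centroids(x, y)

# Two labeled seeds, four unlabeled points near them.
cents = self_train([0.0, 10.0], ["a", "b"], [1.0, 2.0, 8.0, 9.0])
```

The failure mode the paragraph warns about is visible here: if the model mislabels a pool point confidently, that error is folded back into training and reinforced, which is exactly why seeding with domain knowledge (or, in this paper, constraints) matters.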