
Semisupervised condensed nearest neighbor for part-of-speech tagging

Anders Søgaard
Center for Language Technology, University of Copenhagen
Njalsgade 142, DK-2300 Copenhagen S
soegaard@

Abstract

This paper introduces a new training set condensation technique designed for mixtures of labeled and unlabeled data. It finds a condensed set of labeled and unlabeled data points, typically smaller than what is obtained using condensed nearest neighbor on the labeled data only, and improves classification accuracy. We evaluate the algorithm on semisupervised part-of-speech tagging and present the best published result on the Wall Street Journal data set.

1 Introduction

Labeled data for natural language processing tasks such as part-of-speech tagging is often in short supply. Semi-supervised learning algorithms are designed to learn from a mixture of labeled and unlabeled data. Many different semi-supervised algorithms have been applied to natural language processing tasks, but the simplest one, self-training, has attracted the most attention, together with expectation maximization (Abney, 2008). The idea behind self-training is simply to let a model trained on the labeled data label the unlabeled data points, and then to retrain the model on the mixture of the original labeled data and the newly labeled data.

The nearest neighbor algorithm (Cover and Hart, 1967) is a memory-based, or so-called lazy, learning algorithm. It is one of the most extensively used nonparametric classification algorithms: simple to implement, yet powerful owing to its theoretical guarantee that, for all distributions, its probability of error is bounded by twice the Bayes probability of error (Cover and Hart, 1967). Memory-based learning has been applied to a wide range of natural language processing tasks, including part-of-speech tagging (Daelemans et al., 1996), dependency parsing (Nivre, 2003), and word sense disambiguation (Kübler and Zhekova, 2009). Memory-based learning ...
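To make the self-training loop concrete, the following is a minimal sketch with a 1-nearest-neighbor base learner. It is an illustration assuming NumPy feature arrays, not the paper's implementation; the helper names nn_predict and self_train are invented for the example, and the single-pass pooling of all pseudo-labeled points is the naive variant (practical systems usually iterate and add only high-confidence predictions per round).

import numpy as np

def nn_predict(X_train, y_train, X):
    # 1-nearest-neighbor rule (Cover and Hart, 1967): each point in X
    # receives the label of its closest point in the training set.
    preds = []
    for x in X:
        dists = np.linalg.norm(X_train - x, axis=1)
        preds.append(y_train[np.argmin(dists)])
    return np.array(preds)

def self_train(X_lab, y_lab, X_unlab):
    # Self-training in its simplest form: a model trained on the labeled
    # data labels the unlabeled points, and the model is then retrained
    # on the mixture of original and newly labeled data.
    y_pseudo = nn_predict(X_lab, y_lab, X_unlab)
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_pseudo])
    return X_all, y_all

For a memory-based learner, "retraining" amounts to extending the instance base, which is why the sketch simply concatenates the labeled and pseudo-labeled arrays.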
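The supervised baseline the abstract compares against, condensed nearest neighbor on the labeled data only, is classically computed with Hart's (1968) rule. A minimal sketch under the same assumptions follows; the function name condense and the choice of the first point as seed are illustrative, not details from the paper.

import numpy as np

def condense(X, y):
    # Hart's (1968) condensed nearest neighbor rule: keep a training point
    # only if the current condensed set misclassifies it, and sweep over
    # the data until a full pass adds nothing. The condensed set classifies
    # the training data exactly as 1-NN on all of it, but is usually smaller.
    keep = [0]                      # seed with an arbitrary first point
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            dists = np.linalg.norm(X[keep] - X[i], axis=1)
            if y[keep][np.argmin(dists)] != y[i]:
                keep.append(i)      # misclassified: add to condensed set
                changed = True
    return X[keep], y[keep]

The size of this condensed set is the quantity the paper's semisupervised condensation is reported to reduce further.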
