Scientific report: "Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning"

Jun Suzuki, Hideki Isozaki, and Masaaki Nagata
NTT Communication Science Laboratories, NTT Corp.
2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0237 Japan

Abstract

This paper proposes a novel approach for effectively utilizing unsupervised data in addition to supervised data for supervised learning. We use unsupervised data to generate informative "condensed feature representations" from the original feature set used in supervised NLP systems. The main contribution of our method is that it can offer dense and low-dimensional feature spaces for NLP tasks while maintaining the state-of-the-art performance provided by the recently developed high-performance semi-supervised learning technique. Our method matches the results of current state-of-the-art systems with very few features, as measured by F-score with 344 features on the CoNLL-2003 NER data and by UAS on dependency parsing data derived from PTB-III.

1 Introduction

In the last decade, supervised learning has become a standard way to train the models of many natural language processing (NLP) systems. One simple but powerful approach for further enhancing performance is to utilize a large amount of unsupervised data to supplement the supervised data.
Specifically, an approach that incorporates clustering-based word representations (CWR) induced from unsupervised data as additional features for supervised learning has demonstrated substantial performance gains over state-of-the-art supervised learning systems in typical NLP tasks such as named entity recognition (Lin and Wu, 2009; Turian et al., 2010) and dependency parsing (Koo et al., 2008). We refer to this approach as the iCWR approach. The iCWR approach has become popular as a way to enhance performance because of its simplicity and generality. The goal of this paper is to provide yet another simple and general ...
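
The iCWR idea described above, adding cluster-derived word representations to the ordinary feature set of a supervised model, can be illustrated with a minimal sketch. This is not the authors' implementation. The cluster table, the function names, the bit-string cluster IDs, and the prefix lengths are all assumptions made for illustration; a real table would be induced from a large unlabeled corpus by a clustering algorithm.

```python
from typing import Dict, List, Sequence

# Hypothetical word -> cluster ID table (IDs written as bit strings).
# Assumption: in practice this would be learned from unlabeled text.
TOY_CLUSTERS: Dict[str, str] = {
    "london": "1101",
    "paris": "1100",
    "visited": "0110",
}


def base_features(tokens: Sequence[str], i: int) -> List[str]:
    """Ordinary supervised features for token i (a deliberately tiny set)."""
    word = tokens[i]
    prev_word = tokens[i - 1].lower() if i > 0 else "<s>"
    return [
        f"w={word.lower()}",
        f"prev={prev_word}",
        f"cap={word[:1].isupper()}",
    ]


def cluster_features(
    tokens: Sequence[str],
    i: int,
    clusters: Dict[str, str],
    prefix_lengths: Sequence[int] = (2, 4),  # illustrative choice
) -> List[str]:
    """Extra features taken from the cluster ID of token i.

    Prefixes of the bit string give coarser and finer groupings of words.
    Words missing from the table get a single 'unknown' feature.
    """
    bits = clusters.get(tokens[i].lower())
    if bits is None:
        return ["cl=UNK"]
    return [f"cl{n}={bits[:n]}" for n in prefix_lengths]


def icwr_features(
    tokens: Sequence[str], i: int, clusters: Dict[str, str]
) -> List[str]:
    """Base features plus cluster features: the combined set fed to the learner."""
    return base_features(tokens, i) + cluster_features(tokens, i, clusters)


feats = icwr_features(["She", "visited", "Paris"], 2, TOY_CLUSTERS)
print(feats)
```

In this toy table, "paris" and "london" share the two-bit prefix `11`, so a model trained on one can generalize to the other through the shared `cl2=11` feature even if the second word never appears in the labeled data.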
