TAILIEUCHUNG - Báo cáo khoa học: "Word representations: A simple and general method for semi-supervised learning"

If we take an existing supervised NLP system, a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking. We use near state-of-the-art supervised baselines, and find that each of the three word representations improves the accuracy of these baselines. | Word representations A simple and general method for semi-supervised learning Joseph Turian Departement d Informatique et Recherche Operationnelle DIRO Universite de Montreal Montreal Quebec Canada H3T 1J4 lastname@ Lev Ratinov Department of Computer Science University of Illinois at Urbana-Champaign Urbana IL 61801 ratinov2@ Yoshua Bengio Departement d Informatique et Recherche Operationnelle DIRO Universite de Montreal Montreal Quebec Canada H3T 1J4 bengioy@ Abstract If we take an existing supervised NLP system a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. We evaluate Brown clusters Collobert and Weston 2008 embeddings and HLBL Mnih Hinton 2009 embeddings of words on both NER and chunking. We use near state-of-the-art supervised baselines and find that each of the three word representations improves the accuracy of these baselines. We find further improvements by combining different word representations. You can download our word features for off-the-shelf use in existing NLP systems as well as our code here http metaoptimize. com projects wordreprs 1 Introduction By using unlabelled data to reduce data sparsity in the labeled training data semi-supervised approaches improve generalization accuracy. Semi-supervised models such as Ando and Zhang 2005 Suzuki and Isozaki 2008 and Suzuki et al. 2009 achieve state-of-the-art accuracy. However these approaches dictate a particular choice of model and training regime. It can be tricky and time-consuming to adapt an existing supervised NLP system to use these semi-supervised techniques. It is preferable to use a simple and general method to adapt existing supervised NLP systems to be semi-supervised. One approach that is becoming popular is to use unsupervised methods to induce word features or to download word features that have already been induced plug these word features into an existing system and observe a .

TỪ KHÓA LIÊN QUAN
TAILIEUCHUNG - Chia sẻ tài liệu không giới hạn
Địa chỉ : 444 Hoang Hoa Tham, Hanoi, Viet Nam
Website : tailieuchung.com
Email : tailieuchung20@gmail.com
Tailieuchung.com là thư viện tài liệu trực tuyến, nơi chia sẽ trao đổi hàng triệu tài liệu như luận văn đồ án, sách, giáo trình, đề thi.
Chúng tôi không chịu trách nhiệm liên quan đến các vấn đề bản quyền nội dung tài liệu được thành viên tự nguyện đăng tải lên, nếu phát hiện thấy tài liệu xấu hoặc tài liệu có bản quyền xin hãy email cho chúng tôi.
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.