Instance Weighting for Domain Adaptation in NLP

Jing Jiang and ChengXiang Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
{jiang4, czhai}@

Abstract

Domain adaptation is an important problem in natural language processing (NLP) due to the lack of labeled data in novel domains. In this paper, we study the domain adaptation problem from the instance weighting perspective. We formally analyze and characterize the domain adaptation problem from a distributional view, and show that there are two distinct needs for adaptation, corresponding to the different distributions of instances and classification functions in the source and the target domains. We then propose a general instance weighting framework for domain adaptation. Our empirical results on three NLP tasks show that incorporating and exploiting more information from the target domain through instance weighting is effective.

1 Introduction

Many natural language processing (NLP) problems, such as part-of-speech (POS) tagging, named entity (NE) recognition, relation extraction, and semantic role labeling, are currently solved by supervised learning from manually labeled data. A bottleneck problem with this supervised learning approach is the lack of annotated data. As a special case, we often face the situation where we have a sufficient amount of labeled data in one domain but little or no labeled data in another related domain that we are interested in.
We thus face the domain adaptation problem. Following Blitzer et al. (2006), we call the first the source domain and the second the target domain. The domain adaptation problem is commonly encountered in NLP. For example, in POS tagging, the source domain may be tagged WSJ articles and the target domain may be scientific literature that contains scientific terminology. In NE recognition, the source domain may be annotated news articles and the target domain may be personal blogs. Another example is personalized spam ...
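The abstract's first need for adaptation, different instance distributions in the source and target domains, is what instance weighting most directly addresses. The sketch below is a generic importance-weighting illustration, not the authors' framework: it assumes discrete features (e.g. a word or word shape), an unlabeled target sample, a labeling function p(y|x) shared by both domains, and a simple relative-frequency estimate of the ratio w(x) = p_t(x) / p_s(x). The function names and toy data are hypothetical. Multiplying each source instance's loss by w(x) turns the average source loss into an estimate of the target-domain loss.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def density_ratio_weights(
    source_x: Sequence[Hashable], target_x: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Estimate w(x) = p_t(x) / p_s(x) for each feature value seen in the source.

    Hypothetical helper: uses relative frequencies, so it only suits discrete x.
    Target values never seen in the source get no weight; their probability
    mass cannot be recovered by reweighting source instances.
    """
    src_counts = Counter(source_x)
    tgt_counts = Counter(target_x)  # Counter returns 0 for unseen values
    n_s, n_t = len(source_x), len(target_x)
    return {
        value: (tgt_counts[value] / n_t) / (count / n_s)
        for value, count in src_counts.items()
    }


def importance_weighted_mean(
    values: Sequence[float],
    source_x: Sequence[Hashable],
    weights: Dict[Hashable, float],
) -> float:
    """Compute (1/N_s) * sum_i w(x_i) * values[i] over the source sample."""
    total = sum(weights[x] * v for x, v in zip(source_x, values))
    return total / len(source_x)


# Hypothetical toy data: value "a" dominates the source, "b" the target.
source_x = ["a", "a", "a", "b"]
source_y = [0, 0, 0, 1]          # labels follow y = 1 iff x == "b" in both domains
target_x = ["a", "b", "b", "b"]  # unlabeled target sample

weights = density_ratio_weights(source_x, target_x)

# 0-1 loss of a fixed classifier that always predicts label 0.
losses = [float(y != 0) for y in source_y]
source_error = sum(losses) / len(losses)
target_error_estimate = importance_weighted_mean(losses, source_x, weights)
print(weights, source_error, target_error_estimate)
```

Here the unweighted source error is 0.25, while the weighted estimate is 0.75, which matches the error the classifier would make on the target sample under the shared-labeling assumption. During training, the same weights would multiply each source instance's loss term. This reweighting does not address the second need, a difference in classification functions between domains.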
