
Frustratingly Easy Domain Adaptation

Hal Daume III, School of Computing, University of Utah, Salt Lake City, Utah 84112, me@

Abstract

We describe an approach to domain adaptation that is appropriate exactly in the case when one has enough "target" data to do slightly better than just using only "source" data. Our approach is incredibly simple, easy to implement as a preprocessing step (10 lines of Perl!), and outperforms state-of-the-art approaches on a range of datasets. Moreover, it is trivially extended to a multi-domain adaptation problem, where one has data from a variety of different domains.

1 Introduction

The task of domain adaptation is to develop learning algorithms that can be easily ported from one domain to another, say, from newswire to biomedical documents. This problem is particularly interesting in NLP because we are often in the situation where we have a large collection of labeled data in one "source" domain (say, newswire) but truly desire a model that performs well in a second "target" domain. The approach we present in this paper is based on the idea of transforming the domain adaptation learning problem into a standard supervised learning problem to which any standard learning algorithm may be applied (e.g., maxent, SVMs, etc.). Our transformation is incredibly simple: we augment the feature space of both the source and target data and use the result as input to a standard learning algorithm.
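To make the augmentation concrete, here is a minimal sketch of the idea in Python rather than the paper's actual Perl script. It assumes a sparse dict-based feature representation; the feature names and the `augment` helper are illustrative, not part of the paper. Each original feature is copied into a shared "general" block and a domain-specific block, so a learner can decide per feature whether its weight should be shared across domains or kept domain-specific.

```python
def augment(features, domain):
    """Map a sparse feature dict into the augmented space.

    Conceptually, source examples become <general, source-specific, 0>
    and target examples become <general, 0, target-specific>; with a
    dict representation the zero block is simply absent.
    """
    out = {}
    for name, value in features.items():
        out["general:" + name] = value        # shared copy of the feature
        out[domain + ":" + name] = value      # domain-specific copy
    return out

# The same raw feature yields different augmented vectors per domain.
src = augment({"word=the": 1.0}, "source")
tgt = augment({"word=the": 1.0}, "target")
```

After this preprocessing, the union of augmented source and target examples is fed unchanged to any standard supervised learner.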
There are roughly two varieties of the domain adaptation problem that have been addressed in the literature: the fully supervised case and the semi-supervised case. The fully supervised case models the following scenario: we have access to a large annotated corpus of data from a source domain; in addition, we spend a little money to annotate a small corpus in the target domain. We want to leverage both annotated datasets to obtain a model that performs well on the target domain. The semi-supervised case is similar, but instead of having a small annotated target corpus, we have a large …
