Effective Measures of Domain Similarity for Parsing

Barbara Plank, University of Groningen, The Netherlands
Gertjan van Noord, University of Groningen, The Netherlands

Abstract

It is well known that parsing accuracy suffers when a model is applied to out-of-domain data. It is also known that the most beneficial data to parse a given domain is data that matches the domain (Sekine, 1997; Gildea, 2001). Hence, an important task is to select appropriate domains. However, most previous work on domain adaptation relied on the implicit assumption that domains are somehow given. As more and more data becomes available, automatic ways to select data that is beneficial for a new, unknown target domain are becoming attractive. This paper evaluates various ways to automatically acquire related training data for a given test set. The results show that an unsupervised technique based on topic models is effective: it outperforms random data selection on both languages examined, English and Dutch. Moreover, the technique works better than manually assigned labels gathered from the meta-data that is available for English.

1 Introduction and Motivation

Previous research on domain adaptation has focused on the task of adapting a system trained on one domain (say, newspaper text) to a particular new domain (say, biomedical data). Usually, some amount of labeled or unlabeled data from the new domain was given, and that data had been selected by a human. However, with the growth of the web, more and more data is becoming available, and each document is potentially its own domain (McClosky et al., 2010).
It is not straightforward to determine which data or model (in case we have several source domain models) will perform best on a new, unknown target domain. Therefore, an important issue that arises is how to measure domain similarity, that is, whether we can find a simple yet effective method to determine which model or data is most beneficial for an arbitrary piece of new text.
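The question of how to measure domain similarity can be made concrete with a small sketch. It is not the authors' method. It assumes that each document is represented as a unigram word distribution, and it uses Jensen-Shannon divergence as one illustrative similarity measure. The function names (`to_distribution`, `jensen_shannon`, `select_most_similar`) and the toy documents are hypothetical. The sketch ranks candidate training documents by their divergence from the target text and keeps the k closest.

```python
import math
from collections import Counter


def to_distribution(tokens):
    """Relative-frequency (unigram) distribution over the given tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (range 0..1) between two sparse distributions."""
    vocab = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the joint vocabulary.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_mixture(a):
        # KL(A || M); M is nonzero wherever A is, so the log is always defined.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def select_most_similar(target_tokens, candidates, k):
    """Hypothetical selector: names of the k candidates with the lowest divergence to the target."""
    target = to_distribution(target_tokens)
    scored = sorted(
        (jensen_shannon(target, to_distribution(tokens)), name)
        for name, tokens in candidates.items()
    )
    return [name for _, name in scored[:k]]


# Toy usage with made-up documents (illustrative only).
candidates = {
    "news_1": "the minister said the government will cut taxes".split(),
    "bio_1": "the protein binds to the receptor in the cell".split(),
    "bio_2": "gene expression in the cell was measured".split(),
}
test_set = "the receptor protein in the cell membrane".split()
print(select_most_similar(test_set, candidates, k=2))  # ['bio_1', 'bio_2']
```

The ranking step would work unchanged if the dictionaries held per-document topic proportions (for instance, proportions inferred by a topic model) instead of word frequencies. Jensen-Shannon divergence is chosen here because it is symmetric and stays finite when a word occurs in only one of the two documents.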
