Scientific report: "Data point selection for cross-language adaptation of dependency parsers"

Anders Søgaard
Center for Language Technology, University of Copenhagen
Njalsgade 142, DK-2300 Copenhagen S
soegaard@

Abstract

We consider a very simple, yet effective, approach to cross-language adaptation of dependency parsers. We first remove lexical items from the treebanks and map part-of-speech tags into a common tagset. We then train a language model on tag sequences in otherwise unlabeled target data and rank labeled source data by perplexity per word of tag sequences, from least similar to most similar to the target. We then train our target language parser on the most similar data points in the source labeled data. The strategy achieves much better results than a non-adapted baseline and state-of-the-art unsupervised dependency parsing, and results are comparable to more complex projection-based cross-language adaptation algorithms.

1 Introduction

While unsupervised dependency parsing has seen rapid progress in recent years, results are still far from the results that can be achieved with supervised parsers and not yet good enough to solve real-world problems. In this paper, we will be interested in an alternative strategy, namely cross-language adaptation of dependency parsers. The idea is, briefly put, to learn how to parse Arabic, for example, from, say, a Danish treebank, comparing unlabeled data from both languages. This is similar to, but more difficult than, most domain adaptation or transfer learning scenarios, where differences between source and target distributions are smaller. Most previous work in cross-language adaptation has used parallel corpora to project dependency structures across translations using word alignments (Smith and Eisner, 2009; Spreyer and Kuhn, 2009; Ganchev et al., 2009), but in this paper we show that similar results can be achieved by much simpler means. Specifically, we build on the cross-language adaptation algorithm for closely related languages developed by .
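The selection step described in the abstract can be illustrated with a small sketch. It is not the implementation used in the paper. It assumes a bigram language model with add-one smoothing, counts the end-of-sentence symbol when normalizing perplexity, and uses invented toy tag sequences. The function names (train_bigram_lm, perplexity_per_word, select_most_similar) and the keep_fraction parameter are hypothetical. Input sentences are assumed to be already delexicalized and mapped to a common tagset.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary symbols (an assumption of this sketch)


def train_bigram_lm(tag_sequences):
    """Collect bigram and context counts over POS tag sequences."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = {EOS}
    for tags in tag_sequences:
        padded = [BOS] + list(tags) + [EOS]
        vocab.update(tags)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def perplexity_per_word(tags, model):
    """Perplexity of one tag sequence under the add-one smoothed bigram model."""
    context_counts, bigram_counts, vocab = model
    padded = [BOS] + list(tags) + [EOS]
    v = len(vocab) + 1  # one extra slot for tags never seen in the target data
    log_prob = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + v)
        log_prob += math.log(p)
    n = len(padded) - 1  # predicted symbols: every tag plus the end symbol
    return math.exp(-log_prob / n)


def select_most_similar(source_sentences, model, keep_fraction=0.5):
    """Rank source sentences by perplexity (lowest first) and keep the top share."""
    ranked = sorted(source_sentences, key=lambda tags: perplexity_per_word(tags, model))
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]


# Toy data, invented for illustration only.
target_tags = [
    ["DET", "NOUN", "VERB"],
    ["DET", "ADJ", "NOUN", "VERB"],
    ["PRON", "VERB", "DET", "NOUN"],
]
source_tags = [
    ["DET", "NOUN", "VERB"],
    ["VERB", "VERB", "ADP", "ADP"],
    ["PRON", "VERB", "DET", "NOUN"],
    ["ADP", "ADP", "NOUN", "DET"],
]

model = train_bigram_lm(target_tags)
selected = select_most_similar(source_tags, model, keep_fraction=0.5)
for tags in selected:
    print(" ".join(tags), round(perplexity_per_word(tags, model), 2))
```

In a real setting each source sentence would carry its dependency tree, and the selected subset would be passed to a parser trainer. The model order, the smoothing method, and the share of data to keep are all choices that would need tuning.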
