Scientific report: "Is Machine Translation Ripe for Cross-lingual Sentiment Classification?"

Kevin Duh, Akinori Fujino, and Masaaki Nagata
NTT Communication Science Laboratories
2-4 Hikari-dai, Seika-cho, Kyoto 619-0237, Japan

Abstract

Recent advances in Machine Translation (MT) have brought forth a new paradigm for building NLP applications in low-resource scenarios. To build a sentiment classifier for a language with no labeled resources, one can translate labeled data from another language, then train a classifier on the translated text. This can be viewed as a domain adaptation problem, where labeled translations and test data have some mismatch. Various prior works have achieved positive results using this approach. In this opinion piece, we take a step back and make some general statements about cross-lingual adaptation problems. First, we claim that domain mismatch is not caused by MT errors, and that accuracy degradation will occur even in the case of perfect MT. Second, we argue that the cross-lingual adaptation problem is qualitatively different from other (monolingual) adaptation problems in NLP; thus new adaptation algorithms ought to be considered. This paper will describe a series of carefully designed experiments that led us to these conclusions.

1 Summary

Question 1: If MT gave perfect translations (semantically), do we still have a domain adaptation challenge in cross-lingual sentiment classification?

Answer: Yes. The reason is that while many translations of a word may be valid, the MT system might have a systematic bias. For example, the word "awesome" might be prevalent in English reviews, but in translated reviews the word "excellent" is generated instead. From the perspective of MT, this translation is correct and preserves sentiment polarity. But from the perspective of a classifier, there is a domain mismatch due to differences in word distributions.

Question 2: Can we apply standard adaptation algorithms developed for other monolingual ...
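
The lexical mismatch described in the answer to Question 1 can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two toy corpora are invented (they are not data from the paper), the helper names unigram_distribution and js_divergence are hypothetical, and Jensen-Shannon divergence is just one convenient way to quantify a difference in word distributions, not a measure the authors are known to use.

```python
from collections import Counter
import math

# Toy, made-up corpora (NOT data from the paper).
# Reviews written natively in the target language:
native_reviews = [
    "awesome camera awesome battery",
    "awesome screen",
    "terrible battery",
]
# Hypothetical MT output: sentiment is preserved, but the system
# consistently produces "excellent" where native writers say "awesome".
translated_reviews = [
    "excellent camera excellent battery",
    "excellent screen",
    "terrible battery",
]


def unigram_distribution(docs):
    """Return relative word frequencies over a list of documents."""
    counts = Counter(word for doc in docs for word in doc.split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two word distributions."""
    vocab = set(p) | set(q)
    # Mixture distribution: the average of p and q on the joint vocabulary.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_mixture(a):
        # KL(a || m); words with zero probability in a contribute nothing.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


native = unigram_distribution(native_reviews)
translated = unigram_distribution(translated_reviews)

# Words a classifier trained only on translations would never have seen.
unseen = sorted(set(native) - set(translated))
mismatch = js_divergence(native, translated)

print(unseen)    # ['awesome']
print(mismatch)  # 0.375
```

In this toy setup every translation is "correct", yet a classifier trained on the translated reviews has no weight for "awesome", the most frequent sentiment word in the native test data. The divergence is nonzero purely because of the translator's lexical preference, not because of any translation error.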
