Is Machine Translation Ripe for Cross-lingual Sentiment Classification?

Kevin Duh, Akinori Fujino, and Masaaki Nagata
NTT Communication Science Laboratories
2-4 Hikari-dai, Seika-cho, Kyoto 619-0237, JAPAN
{kevin.duh, fujino.akinori, nagata.masaaki}@lab.ntt.co.jp

Abstract

Recent advances in Machine Translation (MT) have brought forth a new paradigm for building NLP applications in low-resource scenarios. To build a sentiment classifier for a language with no labeled resources, one can translate labeled data from another language, then train a classifier on the translated text. This can be viewed as a domain adaptation problem, where labeled translations and test data have some mismatch. Various prior works have achieved positive results using this approach. In this opinion piece, we take a step back and make some general statements about cross-lingual adaptation problems. First, we claim that domain mismatch is not caused by MT errors, and accuracy degradation will occur even in the case of perfect MT. Second, we argue that the cross-lingual adaptation problem is qualitatively different from other (monolingual) adaptation problems in NLP; thus new adaptation algorithms ought to be considered. This paper will describe a series of carefully-designed experiments that led us to these conclusions.

1 Summary

Question 1: If MT gave perfect translations (semantically), do we still have a domain adaptation challenge in cross-lingual sentiment classification?

Answer: Yes. The reason is that while many translations of a word may be valid, the MT system might have a systematic bias.
For example, the word "awesome" might be prevalent in English reviews, but in translated reviews, the word "excellent" is generated instead. From the perspective of MT, this translation is correct and preserves sentiment polarity. But from the perspective of a classifier, there is a domain mismatch due to differences in word distributions.

Question 2: Can we apply standard adaptation algorithms developed for other monolingual .
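
The word-distribution mismatch described in the answer to Question 1 can be made concrete with a small sketch. The toy reviews, the `word_counts` and `oov_rate` helpers, and the choice of an out-of-vocabulary rate as the mismatch measure are all assumptions made for illustration; they are not data or methods from the paper.

```python
from collections import Counter

# Toy data invented for illustration only (not from the paper).
# "Translated" training reviews: the hypothetical MT system always
# produces "excellent" for the positive adjective.
translated_train = [
    ("the movie was excellent", 1),
    ("excellent acting and plot", 1),
    ("the movie was terrible", 0),
    ("terrible acting and plot", 0),
]

# Native test reviews: real users prefer "awesome".
native_test = [
    "the movie was awesome",
    "awesome acting",
    "the plot was terrible",
]


def word_counts(texts):
    """Count whitespace-separated tokens over an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    return counts


def oov_rate(test_counts, train_counts):
    """Fraction of test tokens whose word never occurs in training."""
    total = sum(test_counts.values())
    missing = sum(c for w, c in test_counts.items() if w not in train_counts)
    return missing / total


train_counts = word_counts(text for text, _ in translated_train)
test_counts = word_counts(native_test)

# Words a classifier trained on translations has never seen.
unseen = sorted(w for w in test_counts if w not in train_counts)

print("unseen test words:", unseen)
print("out-of-vocabulary rate:", oov_rate(test_counts, train_counts))
```

In this toy setting every translation preserves polarity, yet the sentiment-bearing word in the native test set carries no learned weight, which is the kind of mismatch the answer describes.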