Co-Training for Cross-Lingual Sentiment Classification

Xiaojun Wan
Institute of Computer Science and Technology
Key Laboratory of Computational Linguistics (MOE)
Peking University, Beijing 100871, China
wanxiaojun@

Abstract

The lack of Chinese sentiment corpora limits research progress on Chinese sentiment classification. However, many English sentiment corpora are freely available on the Web. This paper focuses on the problem of cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data. Machine translation services are used to eliminate the language gap between the training set and the test set, and English features and Chinese features are considered as two independent views of the classification problem. We propose a co-training approach that makes use of unlabeled Chinese data. Experimental results show the effectiveness of the proposed approach, which can outperform both standard inductive classifiers and transductive classifiers.

1 Introduction

Sentiment classification is the task of identifying the sentiment polarity of a given text. The polarity is usually positive or negative, and the text genre is usually product reviews.
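The abstract describes treating English and Chinese features as two views of each review, bridged by machine translation, and using co-training to exploit unlabeled Chinese data. The Python sketch below illustrates a generic co-training loop for binary polarity under several assumptions that are not taken from the paper: every document is already available as an (English tokens, Chinese tokens) pair (the machine translation step is not shown), a from-scratch multinomial Naive Bayes stands in for whatever base classifier is used, each view adds one most-confident positive and one most-confident negative pseudo-label per round, and the final prediction averages the two views' probabilities. All names (`NaiveBayes`, `train_views`, `co_train`, `predict`) and the toy data are hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over token lists with add-one smoothing.

    Hypothetical stand-in classifier; labels are 0 (negative) or 1 (positive).
    """

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(tok for d in class_docs for tok in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def prob_positive(self, doc):
        """Posterior P(y=1 | doc); tokens never seen in training are ignored."""
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for tok in doc:
                if tok in self.vocab:
                    s += math.log((self.counts[c][tok] + 1) / (self.totals[c] + v))
            scores[c] = s
        # Normalize log scores with the max-shift trick for numerical stability.
        top = max(scores.values())
        z = sum(math.exp(s - top) for s in scores.values())
        return math.exp(scores.get(1, -math.inf) - top) / z


def train_views(labeled, labels):
    """One classifier per view: index 0 = English tokens, index 1 = Chinese tokens."""
    return [NaiveBayes().fit([pair[v] for pair in labeled], labels) for v in (0, 1)]


def co_train(labeled, labels, unlabeled, rounds=10, per_class=1):
    """Grow a shared labeled set with each view's most confident pseudo-labels.

    labeled/unlabeled: lists of (english_tokens, chinese_tokens) pairs; the
    translated side of each pair is assumed to be precomputed.
    """
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        for view, clf in enumerate(train_views(labeled, labels)):
            if not pool:
                return train_views(labeled, labels)
            # Rank the pool by this view's probability that a document is positive.
            ranked = sorted(range(len(pool)), key=lambda i: clf.prob_positive(pool[i][view]))
            pos = ranked[::-1][:per_class]
            neg = [i for i in ranked[:per_class] if i not in pos]
            for i, y in [(i, 1) for i in pos] + [(i, 0) for i in neg]:
                labeled.append(pool[i])
                labels.append(y)
            # Pop from the highest index down so earlier indices stay valid.
            for i in sorted(pos + neg, reverse=True):
                pool.pop(i)
    return train_views(labeled, labels)


def predict(models, pair):
    """Average the views' positive-class probabilities and threshold at 0.5."""
    p = sum(m.prob_positive(pair[v]) for v, m in enumerate(models)) / len(models)
    return 1 if p >= 0.5 else 0


# Toy example: English training reviews with (assumed) Chinese translations,
# plus unlabeled Chinese reviews with (assumed) English translations.
labeled = [(["great", "phone"], ["很好", "手机"]), (["terrible", "battery"], ["很差", "电池"])]
labels = [1, 0]
unlabeled = [(["great", "screen"], ["很好", "屏幕"]), (["terrible", "screen"], ["很差", "屏幕"])]
models = co_train(labeled, labels, unlabeled, rounds=2)
print(predict(models, (["great"], ["很好"])))  # → 1
```

Sharing one growing training set between the two views is meant to let each view learn from the other's confident predictions, while the fixed per-class quota keeps the pseudo-labeled data roughly balanced between positive and negative.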
In recent years, sentiment classification has drawn much attention in the NLP field, and it has many useful applications, such as opinion mining and summarization (Liu et al., 2005; Ku et al., 2006; Titov and McDonald, 2008). To date, a variety of corpus-based methods have been developed for sentiment classification. These methods usually rely heavily on an annotated corpus for training the sentiment classifier, so sentiment corpora are considered the most valuable resources for the sentiment classification task. However, such resources are very unevenly distributed across languages. Because most previous work focuses on English sentiment classification, many annotated corpora for English sentiment classification are …
