TAILIEUCHUNG - Báo cáo khoa học: "Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification"

Convolution kernels support the modeling of complex syntactic information in machinelearning tasks. However, such models are highly sensitive to the type and size of syntactic structure used. It is therefore an important challenge to automatically identify high impact sub-structures relevant to a given task. | Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification Zhaopeng Tu Yifan He Jennifer Foster Josef van Genabith Qun Liu Shouxun Lin Key Lab. of Intelligent Info. Processing Computer Science Department School of Computing Institute of Computing Technology CAS New York University Dublin City University i tuzhaopeng liuqun sxlin @ yhe@ 1 jfoster josef @ Abstract Convolution kernels support the modeling of complex syntactic information in machinelearning tasks. However such models are highly sensitive to the type and size of syntactic structure used. It is therefore an important challenge to automatically identify high impact sub-structures relevant to a given task. In this paper we present a systematic study investigating combinations of sequence and convolution kernels using different types of substructures in document-level sentiment classification. We show that minimal sub-structures extracted from constituency and dependency trees guided by a polarity lexicon show point absolute improvement in accuracy over a bag-of-words classifier on a widely used sentiment corpus. 1 Introduction An important subtask in sentiment analysis is sentiment classification. Sentiment classification involves the identification of positive and negative opinions from a text segment at various levels of granularity including document-level paragraphlevel sentence-level and phrase-level. This paper focuses on document-level sentiment classification. There has been a substantial amount of work on document-level sentiment classification. In early pioneering work Pang and Lee 2004 use a flat feature vector . a bag-of-words to represent the documents. A bag-of-words approach however cannot capture important information obtained from structural linguistic analysis of the doc 338 uments. More recently there have been several approaches which employ features based on deep linguistic analysis with .

TỪ KHÓA LIÊN QUAN
TAILIEUCHUNG - Chia sẻ tài liệu không giới hạn
Địa chỉ : 444 Hoang Hoa Tham, Hanoi, Viet Nam
Website : tailieuchung.com
Email : tailieuchung20@gmail.com
Tailieuchung.com là thư viện tài liệu trực tuyến, nơi chia sẽ trao đổi hàng triệu tài liệu như luận văn đồ án, sách, giáo trình, đề thi.
Chúng tôi không chịu trách nhiệm liên quan đến các vấn đề bản quyền nội dung tài liệu được thành viên tự nguyện đăng tải lên, nếu phát hiện thấy tài liệu xấu hoặc tài liệu có bản quyền xin hãy email cho chúng tôi.
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.