Scientific report: "A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging"


Wenbin Jiang 1, Liang Huang 2, Qun Liu 1, Yajuan Lu 1
1 Key Lab. of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, P.O. Box 2704, Beijing 100190, China. jiangwenbin@ict.ac.cn
2 Department of Computer & Information Science, University of Pennsylvania, Levine Hall, 3330 Walnut Street, Philadelphia, PA 19104, USA. lhuang3@cis.upenn.edu

Abstract

We propose a cascaded linear model for joint Chinese word segmentation and part-of-speech tagging. With a character-based perceptron as the core, combined with real-valued features such as language models, the cascaded model is able to efficiently utilize knowledge sources that are inconvenient to incorporate into the perceptron directly. Experiments show that the cascaded model achieves improved accuracies on both segmentation only and joint segmentation and part-of-speech tagging. On the Penn Chinese Treebank 5.0, we obtain an error reduction of 18.5% on segmentation and 12% on joint segmentation and part-of-speech tagging over the perceptron-only baseline.

1 Introduction

Word segmentation and part-of-speech (POS) tagging are important tasks in computer processing of Chinese and other Asian languages.
Several models have been introduced for these problems, for example the Hidden Markov Model (HMM) (Rabiner, 1989), the Maximum Entropy Model (ME) (Ratnaparkhi and Adwait, 1996), and Conditional Random Fields (CRFs) (Lafferty et al., 2001). CRFs have the advantage of flexibility in representing features compared to generative models such as HMMs, and usually perform best on the two tasks. Another widely used discriminative method is the perceptron algorithm (Collins, 2002), which achieves performance comparable to CRFs with much faster training, so we base this work on the perceptron. To segment and tag a character sequence, there are two strategies to choose from: performing POS tagging after segmentation, or joint segmentation and POS tagging (Joint S&T).
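One way to picture the Joint S&T strategy is as a single labeling problem in which every character receives a tag that encodes both its position inside a word and the POS of that word. The sketch below is only an illustration under stated assumptions: the four boundary marks (b, m, e, s), the `boundary_POS` label format, and the function names `words_to_char_labels` and `char_labels_to_words` are hypothetical choices not specified in this excerpt.

```python
from typing import List, Tuple

TaggedWord = Tuple[str, str]   # (word, POS tag)
CharLabel = Tuple[str, str]    # (character, "boundary_POS" label)


def words_to_char_labels(tagged_words: List[TaggedWord]) -> List[CharLabel]:
    """Turn a segmented, POS-tagged sentence into per-character joint labels.

    Assumed boundary marks: b = word-initial, m = word-internal,
    e = word-final, s = single-character word.
    """
    labels: List[CharLabel] = []
    for word, pos in tagged_words:
        if len(word) == 1:
            boundaries = ["s"]
        else:
            boundaries = ["b"] + ["m"] * (len(word) - 2) + ["e"]
        for ch, boundary in zip(word, boundaries):
            labels.append((ch, f"{boundary}_{pos}"))
    return labels


def char_labels_to_words(labels: List[CharLabel]) -> List[TaggedWord]:
    """Recover words and POS tags from per-character joint labels."""
    words: List[TaggedWord] = []
    buffer = ""
    for ch, label in labels:
        boundary, pos = label.split("_", 1)
        buffer += ch
        # A word is complete when its final (or only) character is seen.
        if boundary in ("e", "s"):
            words.append((buffer, pos))
            buffer = ""
    return words


# Illustrative sentence; the segmentation and tags are made up for the demo.
example = [("我", "PN"), ("爱", "VV"), ("北京", "NR")]
example_labels = words_to_char_labels(example)
```

With this encoding, one sequence labeler predicts segmentation and POS tags at the same time, whereas the pipeline strategy would first predict only the boundary marks and then tag the resulting words in a second pass.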
