TAILIEUCHUNG - Báo cáo khoa học: "A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging"

The large combined search space of joint word segmentation and Part-of-Speech (POS) tagging makes efficient decoding very hard. As a result, effective high order features representing rich contexts are inconvenient to use. In this work, we propose a novel stacked subword model for this task, concerning both efficiency and effectiveness. | A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging Weiwei Sun Department of Computational Linguistics Saarland University German Research Center for Artificial Intelligence DFKI D-66123 Saarbrucken Germany wsun@ Abstract The large combined search space of joint word segmentation and Part-of-Speech POS tagging makes efficient decoding very hard. As a result effective high order features representing rich contexts are inconvenient to use. In this work we propose a novel stacked subword model for this task concerning both efficiency and effectiveness. Our solution is a two step process. First one word-based segmenter one character-based segmenter and one local character classifier are trained to produce coarse segmentation and POS information. Second the outputs of the three predictors are merged into sub-word sequences which are further bracketed and labeled with POS tags by a fine-grained sub-word tagger. The coarse-to-fine search scheme is efficient while in the sub-word tagging step rich contextual features can be approximately derived. Evaluation on the Penn Chinese Treebank shows that our model yields improvements over the best system reported in the literature. 1 Introduction Word segmentation and part-of-speech POS tagging are necessary initial steps for more advanced Chinese language processing tasks such as parsing and semantic role labeling. Joint approaches that resolve the two tasks simultaneously have received much attention in recent research. Previous work has shown that joint solutions led to accuracy improvements over pipelined systems by avoiding segmentation error propagation and exploiting POS information to help segmentation. A challenge for joint approaches is the large combined search 1385 space which makes efficient decoding and structured learning of parameters very hard. Moreover the representation ability of models is limited since using rich contextual word features makes the search

TỪ KHÓA LIÊN QUAN
TAILIEUCHUNG - Chia sẻ tài liệu không giới hạn
Địa chỉ : 444 Hoang Hoa Tham, Hanoi, Viet Nam
Website : tailieuchung.com
Email : tailieuchung20@gmail.com
Tailieuchung.com là thư viện tài liệu trực tuyến, nơi chia sẽ trao đổi hàng triệu tài liệu như luận văn đồ án, sách, giáo trình, đề thi.
Chúng tôi không chịu trách nhiệm liên quan đến các vấn đề bản quyền nội dung tài liệu được thành viên tự nguyện đăng tải lên, nếu phát hiện thấy tài liệu xấu hoặc tài liệu có bản quyền xin hãy email cho chúng tôi.
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.