Scientific report: "An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging"

Canasai Kruengkrai, Kiyotaka Uchimoto, Jun'ichi Kazama, Yiou Wang, Kentaro Torisawa, Hitoshi Isahara

Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0289, Japan
{canasai, uchimoto, kazama, wangyiou, torisawa, isahara}@

Abstract

In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words. We describe our strategies that yield good balance for learning the characteristics of known and unknown words and propose an error-driven policy that delivers such balance by acquiring examples of unknown words from particular errors in a training corpus. We describe an efficient framework for training our model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it achieves superior performance compared to the state-of-the-art approaches reported in the literature.

1 Introduction

In Chinese, word segmentation and part-of-speech (POS) tagging are indispensable steps for higher-level NLP tasks.
Word segmentation and POS tagging results are required as inputs to other NLP tasks, such as phrase chunking, dependency parsing, and machine translation. Word segmentation and POS tagging in a joint process have received much attention in recent research and have shown improvements over a pipelined approach (Ng and Low, 2004; Nakagawa and Uchimoto, 2007; Zhang and Clark, 2008; Jiang et al., 2008a; Jiang et al., 2008b). In the joint word segmentation and POS tagging process, one serious problem is caused by unknown words, which are defined as words that are not found in a training corpus or in a system's word dictionary¹. The word .
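
The definition of unknown words given above can be made concrete with a small sketch. The code below is only an illustration of that definition, not the paper's model: the function names `build_vocabulary` and `unknown_words`, the toy sentences, and the one-entry dictionary are all assumptions introduced here. It assumes sentences are already segmented into words.

```python
from typing import Iterable, List, Set


def build_vocabulary(training_sentences: Iterable[List[str]],
                     dictionary: Iterable[str] = ()) -> Set[str]:
    """Collect every word seen in the training corpus or listed in the dictionary."""
    vocab = set(dictionary)
    for sentence in training_sentences:
        vocab.update(sentence)
    return vocab


def unknown_words(sentence: List[str], vocab: Set[str]) -> List[str]:
    """Return the words of a segmented sentence that are absent from the vocabulary."""
    return [word for word in sentence if word not in vocab]


# Toy, hypothetical data: two segmented training sentences and a one-word dictionary.
training = [["我", "喜欢", "北京"], ["他", "喜欢", "音乐"]]
vocab = build_vocabulary(training, dictionary=["上海"])

test_sentence = ["我", "喜欢", "上海", "咖啡"]
print(unknown_words(test_sentence, vocab))  # prints ['咖啡']
```

In this toy setting, "上海" counts as known because it appears in the dictionary even though it never occurs in the training sentences, while "咖啡" is unknown under both sources.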
