Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection

Jinho D. Choi, Department of Computer Science, University of Colorado Boulder, choijd@
Martha Palmer, Department of Linguistics, University of Colorado Boulder, mpalmer@

Abstract

This paper presents a novel way of improving POS tagging on heterogeneous data. First, two separate models (generalized and domain-specific) are trained from the same data set by controlling lexical items with different document frequencies. During decoding, one of the models is selected dynamically based on the cosine similarity between each sentence and the training data. This dynamic model selection approach, coupled with a one-pass, left-to-right POS tagging algorithm, is evaluated on corpora from seven different genres. Even with this simple tagging algorithm, our system achieves results comparable to other state-of-the-art systems and gives higher accuracies when evaluated on a mixture of the data. Furthermore, our system is able to tag about 32K tokens per second. We believe that this model selection approach can be applied to more sophisticated tagging algorithms and improve their robustness even further.

1 Introduction

When it comes to POS tagging, two things must be checked. First, a POS tagger needs to be tested for its robustness in handling heterogeneous data.¹ Statistical POS taggers perform very well when their training and testing data are from the same source, achieving over 97% tagging accuracy (Toutanova et al., 2003; Giménez and Màrquez, 2004; Shen et al., 2007). However, the performance degrades increasingly as the discrepancy between the training and testing data gets larger. Thus, to ensure robustness, a tagger needs to be evaluated on several different kinds of data. Second, a POS tagger should be tested for its speed. POS tagging is often performed as a pre-processing step to other tasks, e.g., parsing

¹ We use the term heterogeneous data to mean a mixture of data collected from several different sources.

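The abstract describes two ingredients: lexicons filtered by document frequency (one per model) and a per-sentence choice between the generalized and domain-specific models based on cosine similarity to the training data. The excerpt does not say how the paper builds the similarity vectors or picks its cutoffs, so the following is only a minimal sketch under stated assumptions: sentences are bag-of-words count vectors, the training data is summarized as one aggregate token-count vector, the `min_df` cutoffs and the `0.5` similarity threshold are hypothetical values, and the helper names (`document_frequencies`, `lexicon`, `cosine`, `select_model`) are illustrative, not the authors' code. Training the two actual taggers is out of scope here.

```python
from collections import Counter
import math


def document_frequencies(docs):
    """Count how many training documents each word appears in."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # set(): a word counts once per document
    return df


def lexicon(df, min_df):
    """Keep lexical items whose document frequency exceeds min_df (hypothetical cutoff)."""
    return {word for word, count in df.items() if count > min_df}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts/Counters."""
    dot = sum(x * v.get(k, 0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_model(sentence, train_vector, threshold):
    """Pick 'domain' if the sentence resembles the training data, else 'general'.

    Assumption: a sentence is a bag-of-words count vector and the decision is a
    simple threshold on its cosine similarity to the aggregate training vector.
    """
    similarity = cosine(Counter(sentence), train_vector)
    return "domain" if similarity >= threshold else "general"


if __name__ == "__main__":
    train_docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
    df = document_frequencies(train_docs)
    general_lex = lexicon(df, min_df=1)  # words seen in 2+ documents
    domain_lex = lexicon(df, min_df=0)   # every word seen in training
    train_vector = Counter(w for doc in train_docs for w in doc)
    print(sorted(general_lex), sorted(domain_lex))
    for sent in (["the", "cat", "sat"], ["quantum", "flux"]):
        print(sent, select_model(sent, train_vector, threshold=0.5))
```

With these toy documents, the first sentence shares all of its words with the training data and is routed to the domain-specific model, while the second shares none and falls back to the generalized model. Summing raw token counts lets frequent function words dominate the training vector; a real system would likely weight or filter terms, but that choice is not specified in the excerpt.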