Scientific report: "Guessing Parts-of-Speech of Unknown Words Using Global Information"

Tetsuji Nakagawa
Corporate R&D Center, Oki Electric Industry Co., Ltd.
2-5-7 Honmachi, Chuo-ku, Osaka 541-0053, Japan
nakagawa378@

Abstract

In this paper, we present a method for guessing POS tags of unknown words using local and global information. Although many existing methods use only local information (e.g., limited window size or intra-sentential features), global information (extra-sentential features) provides valuable clues for predicting POS tags of unknown words. We propose a probabilistic model for POS guessing of unknown words using global information as well as local information, and estimate its parameters using Gibbs sampling. We also attempt to apply the model to semi-supervised learning, and conduct experiments on multiple corpora.

1 Introduction

Part-of-speech (POS) tagging is a fundamental language analysis task. In POS tagging, we frequently encounter words that do not exist in the training data. Such words are called unknown words. They are usually handled by an exceptional process in POS tagging, because the tagging system does not have information about the words. Guessing the POS tags of such unknown words is a difficult task, but it is an important issue both for conducting POS tagging accurately and for creating word dictionaries automatically or semi-automatically.

There have been many studies on POS guessing of unknown words (Mori and Nagao, 1996; Mikheev, 1997; Chen et al., 1997; Nagata, 1999; Orphanos and Christodoulakis, 1999). In most of these previous works, POS tags of unknown words were predicted using only local information, such as lexical forms and POS tags of surrounding words or word-internal features (e.g., suffixes and character types) of the unknown words. However, this approach is limited in the information available to it. For example, common nouns and proper nouns are sometimes difficult to distinguish using only the information from a single occurrence, because their syntactic functions are ...
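
To make the notion of local information concrete, the following is a minimal Python sketch of the kind of features the paragraph above describes: surrounding word forms and POS tags, suffixes, and character types of the unknown word. It is an illustration only, not the feature set used in the paper. The function names `char_type` and `local_features`, the feature string format, the window size, the suffix length, and the character classes are all assumptions made for this example. The sketch also assumes the tags of neighbouring words are already available, which a real tagger would have to predict.

```python
import unicodedata


def char_type(ch: str) -> str:
    """Return a coarse character class (the class inventory is an assumption)."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        name = unicodedata.name(ch, "")
        if "HIRAGANA" in name:
            return "hiragana"
        if "KATAKANA" in name:
            return "katakana"
        if "CJK" in name:
            return "kanji"
        return "upper" if ch.isupper() else "lower"
    return "symbol"


def local_features(words, tags, i, window=1, max_suffix=3):
    """Collect local features for the word at position i of one sentence.

    words and tags are parallel lists; tags[i] itself is never used,
    since it is the value to be guessed.
    """
    feats = []
    word = words[i]

    # Lexical forms and POS tags of surrounding words within the window.
    for dist in range(1, window + 1):
        for offset in (-dist, dist):
            j = i + offset
            if 0 <= j < len(words):
                feats.append(f"w[{offset:+d}]={words[j]}")
                feats.append(f"t[{offset:+d}]={tags[j]}")
            else:
                feats.append(f"w[{offset:+d}]=<PAD>")

    # Word-internal features: suffixes up to max_suffix characters.
    for n in range(1, min(max_suffix, len(word)) + 1):
        feats.append(f"suffix{n}={word[-n:]}")

    # Word-internal features: the set of character types in the word.
    types = sorted({char_type(c) for c in word})
    feats.append("ctypes=" + "+".join(types))
    return feats


if __name__ == "__main__":
    sentence = ["She", "visited", "Osaka", "yesterday"]
    pos_tags = ["PRP", "VBD", "UNK", "NN"]  # "UNK" marks the unknown word
    for feature in local_features(sentence, pos_tags, 2):
        print(feature)
```

Every feature here is computed from a single occurrence in a single sentence, so two occurrences of the same unknown word in different sentences are treated independently. That is the limitation the global, extra-sentential information is meant to address.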
