A Practical Solution to the Problem of Automatic Part-of-Speech Induction from Text

Reinhard Rapp
University of Mainz, FASK
D-76711 Germersheim, Germany
rapp@

Abstract

The problem of part-of-speech induction from text involves two aspects. First, a set of word classes must be derived automatically. Second, each word of a vocabulary must be assigned to one or several of these word classes. In this paper we present a method that solves both problems with good accuracy. Our approach adopts a mixture of statistical methods that have been successfully applied in word sense induction. Its main advantage over previous attempts is that it reduces the syntactic space to only the most important dimensions, thereby almost eliminating the otherwise omnipresent problem of data sparseness.

1 Introduction

Whereas most previous statistical work concerning parts of speech has been on tagging, this paper deals with part-of-speech induction. In part-of-speech induction, two phases can be distinguished. In the first phase, a set of word classes is derived automatically on the basis of the distribution of the words in a text corpus. These classes should be in accordance with human intuitions; that is, common distinctions such as nouns, verbs, and adjectives are desirable. In the second phase, each word is assigned, based on its observed usage, to one or several of the previously defined classes.
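This excerpt does not spell out the algorithm, so the following is only a minimal sketch of the two phases under our own assumptions, not the author's method. Each word is represented by counts of its immediate left (`L:`) and right (`R:`) neighbours. Restricting the features to the `top_n` most frequent tokens is a crude stand-in for the paper's dimensionality reduction, whose details are not given here. Phase 1 induces classes by single-link clustering with a cosine threshold. Phase 2 assigns a word to every class whose centroid is similar enough, so a word can receive one or several classes. All names and parameters (`context_vectors`, `induce_classes`, `assign`, `top_n`, `threshold`) are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, top_n=200):
    """Map each word to counts of its left (L:) and right (R:) neighbours.

    Only the top_n most frequent tokens act as context features; this is a
    simple assumed substitute for a real dimensionality reduction."""
    padded = [["<s>"] + s.split() + ["</s>"] for s in sentences]
    freq = Counter(tok for sent in padded for tok in sent)
    features = {w for w, _ in freq.most_common(top_n)}
    vectors = defaultdict(Counter)
    for sent in padded:
        for i in range(1, len(sent) - 1):  # skip the boundary markers themselves
            word = sent[i]
            if sent[i - 1] in features:
                vectors[word]["L:" + sent[i - 1]] += 1
            if sent[i + 1] in features:
                vectors[word]["R:" + sent[i + 1]] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(c * v[f] for f, c in u.items() if f in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def induce_classes(vectors, threshold=0.5):
    """Phase 1: words linked by cosine >= threshold share a class
    (single-link clustering implemented with union-find)."""
    words = sorted(vectors)
    parent = {w: w for w in words}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)
    classes = defaultdict(set)
    for w in words:
        classes[find(w)].add(w)
    return sorted(sorted(c) for c in classes.values())


def assign(word, vectors, classes, threshold=0.5):
    """Phase 2: indices of every class whose summed (centroid) vector is
    similar enough, so a word may get one or several classes."""
    vec = vectors[word]
    result = []
    for i, members in enumerate(classes):
        centroid = Counter()
        for m in members:
            centroid.update(vectors[m])
        if cosine(vec, centroid) >= threshold:
            result.append(i)
    return result


corpus = ["the cat sleeps", "the dog sleeps", "a cat eats",
          "a dog eats", "the cat runs", "a dog runs"]
vecs = context_vectors(corpus)
classes = induce_classes(vecs)
print(classes)  # [['a', 'the'], ['cat', 'dog'], ['eats', 'runs', 'sleeps']]
print(assign("cat", vecs, classes))  # [1]
```

On this toy corpus, determiners, nouns and verbs share no neighbour features, so the three groups separate cleanly. Real corpora are far sparser, which is exactly where reducing the context space to its most important dimensions matters.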
The main reason why part-of-speech induction has received far less attention than part-of-speech tagging is probably that there seemed to be no urgent need for it: linguists have always considered classifying words as one of their core tasks, and as a consequence, accurate lexicons providing such information are readily available for many languages. Nevertheless, deriving word classes automatically is an interesting intellectual challenge with relevance to cognitive science. Automatic systems also have the advantage that they should be more objective and can provide precise …
