POS-Tagger for English-Vietnamese Bilingual Corpus

HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond, pp. 88-95, Edmonton, May-June 2003

Dinh Dien
Information Technology Faculty of Vietnam National University of HCMC, 20 C2 Hoang Hoa Tham, Ward 12, Tan Binh Dist., HCM City, Vietnam
ddien@

Hoang Kiem
Center of Information Technology Development of Vietnam National University of HCMC, 227 Nguyen Van Cu, District 5, HCM City
hkiem@

Abstract

Corpus-based Natural Language Processing (NLP) tasks for such popular languages as English, French, etc. have been well studied with satisfactory achievements. In contrast, corpus-based NLP tasks for unpopular languages (e.g. Vietnamese) are at a deadlock due to the absence of annotated training data for these languages. Furthermore, hand-annotation of even reasonably well-determined features such as part-of-speech (POS) tags has proved to be labor intensive and costly. In this paper, we suggest a solution to partially overcome the annotated resource shortage in Vietnamese by building a POS-tagger for an automatically word-aligned English-Vietnamese parallel Corpus (named EVC). This POS-tagger makes use of the Transformation-Based Learning (TBL) method to bootstrap the POS-annotation results of the English POS-tagger by exploiting the POS-information of the corresponding Vietnamese words via their word alignments in EVC. Then, we directly project POS-annotations from the English side to the Vietnamese side via the available word alignments. This POS-annotated Vietnamese corpus will be manually corrected to become annotated training data for Vietnamese NLP tasks such as POS-tagger, Phrase-Chunker, Parser, Word-Sense Disambiguator, etc.

1 Introduction

POS-tagging is assigning to each word of a text the proper POS tag in its context of appearance. Although each word can be classified into various POS-tags, in a defined context it can only be attributed with a definite POS. As an example in this sentence 2 can
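
The direct projection step mentioned in the abstract (copying POS annotations from English words to their aligned Vietnamese words) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name project_pos, the representation of alignments as (English index, Vietnamese index) pairs, the rule of keeping the first tag that reaches a word, and the sample tags are all assumptions made for illustration. The TBL-based correction of the English tags is not modeled here.

```python
from typing import List, Optional, Tuple


def project_pos(
    english_tags: List[str],
    vietnamese_len: int,
    alignments: List[Tuple[int, int]],
) -> List[Optional[str]]:
    """Copy English POS tags onto aligned Vietnamese words (hypothetical helper).

    alignments holds (english_index, vietnamese_index) pairs; this pair
    format is an assumption, not the format used in EVC.
    """
    # Vietnamese words with no alignment link stay untagged (None).
    projected: List[Optional[str]] = [None] * vietnamese_len
    for en_idx, vi_idx in alignments:
        # Assumed tie-break: keep the first tag that reaches a word.
        if projected[vi_idx] is None:
            projected[vi_idx] = english_tags[en_idx]
    return projected


# Illustrative input: "I read books" aligned word-for-word with "Tôi đọc sách".
english_tags = ["PRP", "VBP", "NNS"]
vietnamese_words = ["Tôi", "đọc", "sách"]
links = [(0, 0), (1, 1), (2, 2)]

tags = project_pos(english_tags, len(vietnamese_words), links)
for word, tag in zip(vietnamese_words, tags):
    print(word, tag)
```

Words left as None are the ones a later manual correction pass, as described in the abstract, would have to tag by hand.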
