An Experimental Investigation of Part-Of-Speech Taggers for Vietnamese
VNU Journal of Science: Comp. Science & Com. Eng., Vol. 32, No. 3 (2016) 11–25

Nguyen Tuan Phong1, Truong Quoc Tuan1, Nguyen Xuan Nam1, Le Anh Cuong2,∗

1 Faculty of Information Technology, VNU University of Engineering and Technology, No. 144 Xuan Thuy Street, Dich Vong Ward, Cau Giay District, Hanoi, Vietnam
2 Faculty of Information Technology, Ton Duc Thang University, No. 19 Nguyen Huu Tho Street, Tan Phong Ward, District 7, Ho Chi Minh City, Vietnam

Abstract

Part-of-speech (POS) tagging plays an important role in Natural Language Processing (NLP). It is used in many other NLP tasks, such as named entity recognition, syntactic parsing, dependency parsing and text chunking. In this paper, we apply the techniques of two widely used toolkits, ClearNLP and Stanford POS Tagger, to develop two new POS taggers for Vietnamese, and then compare them with three well-known Vietnamese taggers, namely JVnTagger, vnTagger and RDRPOSTagger. We make a systematic comparison to identify the tagger with the best performance. We also design a new feature set to measure the performance of the statistical taggers. Our new taggers, built from Stanford Tagger and ClearNLP with the new feature set, outperform all other current Vietnamese taggers in terms of tagging accuracy.
Moreover, we analyze the effect of some features on the performance of the statistical taggers. Lastly, the experimental results reveal that the transformation-based tagger, RDRPOSTagger, runs significantly faster than any of the statistical taggers.

Received March 2016, Revised May 2016, Accepted May 2016

Keywords: Part-of-speech tagger, Vietnamese.

1. Introduction

For languages such as English and French, studies in POS tagging have been very successful. Recent studies for these languages [1-5] report state-of-the-art overall accuracy of approximately 97-98%. However, for less ...