Machine Aided Error-Correction Environment for Korean Morphological Analysis and Part-of-Speech Tagging

Junsik Park, Jung-Goo Kang, Wook Hur and Key-Sun Choi
Center for Artificial Intelligence Research
Korea Advanced Institute of Science and Technology
Taejon 305-701, Korea
{jspark, jgkang, hook, kschoi}@world.kaist.ac.kr

Abstract

Statistical methods require a very large, high-quality corpus, but building a large, error-free annotated corpus is very difficult. This paper proposes an efficient method for constructing a part-of-speech tagged corpus. We propose a rule-based error correction method that finds and corrects errors semi-automatically using user-defined rules. We also use the user's correction log as feedback. Experiments were carried out to show the efficiency of the error correction process of this workbench. The results show that about 63.2% of tagging errors can be corrected.

1 Introduction

Corpus-based natural language processing systems need a large amount of corpus data (Choi et al., 1994), and that data must also be of high quality. The general process of building an annotated corpus is shown in Figure 1. This process involves several difficulties. First, the dictionary does not contain very many entries. Second, errors produced by automatic tagging are difficult to correct.
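The abstract describes finding and correcting tagging errors semi-automatically with user-defined rules, and feeding the user's correction log back into the process. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. The `Rule` format (word, wrong tag, right tag, optional previous tag), the functions `find_errors`, `correct` and `suggest_rules`, the `confirm` callback standing in for the user, and the tag names are all hypothetical. The tags are loosely modeled on Sejong-style Korean tags and are not the tagset used in the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A tagged token: (surface word, POS tag). Tag names are illustrative placeholders.
Token = Tuple[str, str]


@dataclass(frozen=True)
class Rule:
    """Hypothetical user-defined correction rule: retag `word` from `wrong_tag`
    to `right_tag`, optionally only when the previous token's tag is `prev_tag`."""
    word: str
    wrong_tag: str
    right_tag: str
    prev_tag: Optional[str] = None


def find_errors(sentence: List[Token], rules: List[Rule]) -> List[Tuple[int, Rule]]:
    """Return (position, rule) for every token that some rule flags as an error."""
    hits = []
    for i, (word, tag) in enumerate(sentence):
        prev = sentence[i - 1][1] if i > 0 else None
        for rule in rules:
            if (word == rule.word and tag == rule.wrong_tag
                    and (rule.prev_tag is None or rule.prev_tag == prev)):
                hits.append((i, rule))
                break  # the first matching rule wins for this token
    return hits


def correct(sentence: List[Token], rules: List[Rule],
            confirm: Callable[[int, Rule], bool] = lambda i, rule: True) -> List[Token]:
    """Semi-automatic correction: each flagged error is fixed only if `confirm`
    (standing in for the user) accepts the proposed fix."""
    fixed = list(sentence)
    for i, rule in find_errors(sentence, rules):
        if confirm(i, rule):
            fixed[i] = (fixed[i][0], rule.right_tag)
    return fixed


def suggest_rules(log: List[Tuple[str, str, str]], min_count: int = 2) -> List[Rule]:
    """Feedback step (assumed design): manual corrections (word, old_tag, new_tag)
    that recur at least `min_count` times become context-free rule candidates."""
    counts = Counter(log)
    return [Rule(w, old, new) for (w, old, new), n in counts.items() if n >= min_count]
```

For instance, with the rule `Rule("나는", "VV+ETM", "NP+JX")`, `correct()` retags the token `("나는", "VV+ETM")` as `NP+JX` unless `confirm` rejects the fix. Keeping the user in the loop through `confirm`, instead of applying every rule blindly, is one way to limit the side effects of fully automatic correction.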
Manual error correction is costly, and errors may still remain after the correction process. Automatic correction has also been studied, but it suffers from side effects introduced by the automatic corrections themselves (Lee and Lee, 1996; Lim et al., 1996). In this paper, we integrate morphological analysis and tagging and provide an interactive user interface through which the user gives feedback to resolve analysis ambiguities. To reduce cost and improve correctness, we have developed an environment that enables the user to find errors and correct them. Related work is described in the following section. In .