The State of the Art in Thai Language Processing

Virach Sornlertlamvanich, Tanapong Potipiti, Chai Wutiwiwatchai and Pradit Mittrapiyanuruk
National Electronics and Computer Technology Center (NECTEC)
National Science and Technology Development Agency
Ministry of Science, Technology and Environment
22nd Floor Gypsum Metropolitan Tower, 539/2 Sriayudhya Rd., Rajthevi, Bangkok 10400, Thailand
Email: virach tanapong chai @ pmittrap@

Abstract
This paper reviews the current state of technology and research progress in Thai language processing. It summarizes the characteristics of the Thai language and the approaches taken to overcome the difficulties in each processing task.

1 Some Problematic Issues in Thai Processing
The word is the most fundamental semantic unit in a language. In languages that mark word boundaries in writing, words are explicitly identified. Thai has no word boundaries: Thai words are recognized only implicitly, and in many cases the segmentation depends on individual judgement. This causes many difficulties in Thai language processing.

To illustrate the problem, we use a classic English example: the segmentation of "GODISNOWHERE".

No.  Segmentation        Meaning
1    God is now here.    God is here.
2    God is no where.    God doesn't exist.
3    God is nowhere.     God doesn't exist.

Segmentations 1 and 2 have opposite meanings, while 2 and 3 differ only in whether "nowhere" is treated as one word or two. The difficulty is greatly aggravated when unknown words are present.

Thai is also a tonal language, so the same phoneme carries a different meaning when pronounced with a different tone.
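Returning to the word-boundary problem, the GODISNOWHERE ambiguity can be reproduced mechanically. The following Python sketch is our own illustration, not a method described in this paper: given a small, hypothetical toy dictionary (`TOY_DICTIONARY`), the hypothetical function `all_segmentations` enumerates every way to split an unsegmented string into dictionary words, using recursion memoized on the start offset. A real Thai segmenter would need a Thai lexicon and some way to choose among the candidates; this sketch only shows why more than one candidate arises.

```python
from functools import lru_cache

# Hypothetical toy dictionary (lowercase) for the English example; not from the paper.
TOY_DICTIONARY = frozenset({"god", "is", "now", "here", "no", "where", "nowhere"})


def all_segmentations(text: str, dictionary: frozenset) -> list:
    """Return every way to split `text` into a sequence of dictionary words.

    Exhaustive recursive search, memoized on the start offset.
    An empty list means no segmentation exists, e.g. because the text
    contains a word that is missing from the dictionary (an unknown word).
    """
    text = text.lower()
    max_len = max((len(w) for w in dictionary), default=0)

    @lru_cache(maxsize=None)
    def segment_from(start: int) -> tuple:
        if start == len(text):
            return ((),)  # exactly one way to segment the empty suffix
        results = []
        # Try every dictionary word that begins at `start`.
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in dictionary:
                for rest in segment_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(seg) for seg in segment_from(0)]


if __name__ == "__main__":
    for seg in all_segmentations("GODISNOWHERE", TOY_DICTIONARY):
        print(" ".join(seg))
```

Run as a script, it prints the three readings from the table, one per line: `god is no where`, `god is now here`, `god is nowhere`. Removing `"is"` from the dictionary makes the function return an empty list, which mirrors how an unknown word can block dictionary-based segmentation altogether.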
Many unique approaches have been introduced, both for tone generation in speech synthesis research and for tone recognition in speech recognition research. These difficulties propagate to many levels of language processing, such as lexical acquisition, information retrieval, machine translation and speech processing. Furthermore, a similar problem also occurs at the levels of sentence
