Scientific report: "A New Statistical Approach to Chinese Pinyin Input"

Zheng Chen, Microsoft Research China, No. 49 Zhichun Road, Haidian District, 100080, China, zhengc@
Kai-Fu Lee, Microsoft Research China, No. 49 Zhichun Road, Haidian District, 100080, China, kfl@

Abstract

Chinese input is one of the key challenges for Chinese PC users. This paper proposes a statistical approach to Pinyin-based Chinese input. The approach uses a trigram-based language model and statistically based segmentation. To deal with real input, it also includes a typing model, which enables spelling correction in sentence-based Pinyin input, and a spelling model for English, which enables modeless Pinyin input.

1. Introduction

Chinese input is one of the most difficult problems for Chinese PC users. There are two main categories of Chinese input method. One is shape-based, such as wu bi zi xing; the other is Pinyin, or pronunciation-based, such as Chinese CStar, MSPY, etc. Because it is easy to learn and use, Pinyin is the most popular Chinese input method. Over 97% of the users in China use Pinyin for input (Chen Yuan 1997). Although the Pinyin input method has many advantages, it also suffers from several problems, including Pinyin-to-character conversion errors, user typing errors, and UI problems such as the need for two separate modes when typing Chinese and English.

A Pinyin-based method automatically converts Pinyin to Chinese characters. But there are only about 406 syllables, and they correspond to over 6,000 common Chinese characters, so it is very difficult for a system to select the correct characters automatically. A higher accuracy may be achieved using sentence-based input. A sentence-based input method chooses characters by using a language model based on context, so its accuracy is higher than that of a word-based input method. In this paper, all the technology is based on the sentence-based input method, but it can easily be adapted to word-based input.
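
To make the idea of context-based character selection concrete, here is a minimal sketch of a sentence-level decoder. It is not the system described in the paper: it uses a bigram model with a crude probability floor instead of the paper's trigram model, and the lexicon (CANDIDATES), the probabilities (BIGRAM), the constant FLOOR, and the function names are all hypothetical values invented for illustration.

```python
import math

# Hypothetical toy lexicon: each Pinyin syllable maps to candidate characters.
CANDIDATES = {
    "wo": ["我", "握"],
    "shi": ["是", "十", "事"],
    "ren": ["人", "认"],
}

# Hypothetical bigram probabilities P(current | previous).
# "<s>" marks the start of the sentence. These numbers are made up.
BIGRAM = {
    ("<s>", "我"): 0.6,
    ("<s>", "握"): 0.05,
    ("我", "是"): 0.5,
    ("我", "十"): 0.01,
    ("我", "事"): 0.02,
    ("握", "是"): 0.01,
    ("是", "人"): 0.3,
    ("是", "认"): 0.05,
    ("十", "人"): 0.4,
    ("事", "人"): 0.05,
}

FLOOR = 1e-6  # crude floor for unseen pairs, standing in for real smoothing


def log_p(prev, cur):
    """Log-probability of `cur` following `prev` under the toy bigram model."""
    return math.log(BIGRAM.get((prev, cur), FLOOR))


def decode(syllables):
    """Viterbi search for the most probable character sequence."""
    # best[c] = (log-probability, path) of the best sequence ending in c
    best = {"<s>": (0.0, [])}
    for syl in syllables:
        nxt = {}
        for cur in CANDIDATES[syl]:
            # Pick the best predecessor for this candidate character.
            score, path = max(
                ((s + log_p(prev, cur), p) for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            nxt[cur] = (score, path + [cur])
        best = nxt
    return "".join(max(best.values(), key=lambda t: t[0])[1])


print(decode(["wo", "shi", "ren"]))  # 我是人
print(decode(["shi", "ren"]))        # 十人
```

With these toy numbers, the syllable "shi" resolves to 是 when it follows "wo" but to 十 when it starts the input, which is the kind of context sensitivity that a sentence-based method relies on and a word-by-word lookup lacks.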
