Scientific report: "Large Scale Collocation Data and Their Application to Japanese Word Processor Technology"

Yasuo Koyama, Masako Yasutake, Kenji Yoshimura and Kosho Shudo
Institute for Information and Control Systems, Fukuoka University
Nanakuma, Fukuoka 814-0180, Japan
koyama@ yasutake@ yosimura@ shudo@

Abstract

Word processors and computers used in Japan employ a Japanese input method in which keyboard strokes are combined with Kana (phonetic) character to Kanji (ideographic, Chinese) character conversion technology. The key factor in Kana-to-Kanji conversion is how to raise the accuracy of the conversion through homophone processing, since Japanese has so many homophonic Kanji. In this paper, we report the results of our Kana-to-Kanji conversion experiments, which embody homophone processing based on large scale collocation data. We show that approximately 135,000 collocations raise the conversion accuracy compared with a prototype system that has no collocation data.

1. Introduction

Word processors and computers used in Japan ordinarily employ a Japanese input method in which keyboard strokes are combined with Kana (phonetic) to Kanji (ideographic, Chinese) character conversion technology. Kana-to-Kanji conversion is performed by morphological analysis of the input Kana string, which contains no spaces between words. This analysis carries out word or phrase segmentation to identify the substrings of the input that have to be converted from Kana to Kanji. The final result is a Kana-Kanji mixed string, which is the ordinary form of written Japanese text.

The major issue in this technology is raising the accuracy of the segmentation and of the homophone processing, which selects the correct Kanji among many homophonic candidates. The conventional method for processing homophones gives priority to the word that was used most recently or to the high-frequency word.
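The contrast between the conventional homophone heuristic and a collocation-based choice can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy dictionaries, the frequency counts, the two collocation pairs, and the function names `pick_conventional` and `pick_with_collocation` are all hypothetical, and the paper's actual system may work quite differently.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data for illustration only; not from the paper.
# One Kana reading maps to several homophonic Kanji candidates.
HOMOPHONES: Dict[str, List[str]] = {"はし": ["橋", "箸", "端"]}  # bridge, chopsticks, edge
FREQUENCY: Dict[str, int] = {"橋": 50, "箸": 30, "端": 20}       # made-up counts
COLLOCATIONS: Set[Tuple[str, str]] = {
    ("橋", "渡る"),  # "cross a bridge"
    ("箸", "使う"),  # "use chopsticks"
}


def pick_conventional(reading: str, last_used: Dict[str, str]) -> str:
    """Prefer the candidate used most recently, else the most frequent one."""
    candidates = HOMOPHONES[reading]
    recent = last_used.get(reading)
    if recent in candidates:
        return recent
    return max(candidates, key=lambda word: FREQUENCY.get(word, 0))


def pick_with_collocation(
    reading: str, neighbor: str, last_used: Dict[str, str]
) -> str:
    """Prefer a candidate that forms a known collocation with the neighbor word.

    Falls back to the conventional heuristic when no collocation matches.
    """
    for word in HOMOPHONES[reading]:
        if (word, neighbor) in COLLOCATIONS:
            return word
    return pick_conventional(reading, last_used)


if __name__ == "__main__":
    history = {"はし": "端"}  # the user last chose "edge" for this reading
    print(pick_conventional("はし", history))
    print(pick_with_collocation("はし", "使う", history))
```

In this sketch the recency heuristic keeps returning whatever the user chose last, while the collocation lookup can override it when a neighboring word is informative. Falling back to the conventional heuristic when no collocation matches is a design choice of the sketch, not something stated in the source.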
