Scientific report: "A Part of Speech Estimation Method for Japanese Unknown Words using a Statistical Model of Morphology and Context"

Masaaki NAGATA
NTT Cyber Space Laboratories
1-1 Hikari-no-oka, Yokosuka-Shi, Kanagawa, 239-0847 Japan

Abstract

We present a statistical model of Japanese unknown words consisting of a set of length and spelling models classified by the character types that constitute a word. The point is quite simple: different character sets should be treated differently, and the changes between character types are very important, because Japanese script has both ideograms like Chinese (kanji) and phonograms like English (katakana). Both word segmentation accuracy and part of speech tagging accuracy are improved by the proposed model. The model can achieve tagging accuracy if unknown words are correctly segmented.

1 Introduction

In Japanese, around 95% word segmentation accuracy is reported using a word-based language model and Viterbi-like dynamic programming procedures (Nagata, 1994; Yamamoto, 1996; Takeuchi and Matsumoto, 1997; Haruno and Matsumoto, 1997). About the same accuracy is reported in Chinese by statistical methods (Sproat et al., 1996). But there has been relatively little improvement in recent years, because most of the remaining errors are due to unknown words.

There are two approaches to solving this problem: increasing the coverage of the dictionary (Fung and Wu, 1994; Chang et al., 1995; Mori and Nagao, 1996) and designing a better model for unknown words (Nagata, 1996; Sproat et al., 1996). We take the latter approach. To improve word segmentation accuracy, Nagata (1996) used a single general-purpose unknown word model, while Sproat et al. (1996) used a set of specific word models, such as for plurals, personal names, and transliterated foreign words.

The goal of our research is to assign the correct part of speech to an unknown word as well as to identify it correctly. In this paper, we present a novel statistical model for Japanese unknown words. It ...
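The idea of treating character types differently and attending to the points where the type changes can be illustrated with a minimal Python sketch. This is not the paper's model: the category names, the Unicode ranges, and the helper names `char_type`, `type_runs`, and `type_signature` are all assumptions made for illustration, and the excerpt does not say which classes the author actually uses.

```python
from itertools import groupby


def char_type(ch: str) -> str:
    """Return a coarse character type for a single character.

    The categories and Unicode ranges are illustrative assumptions,
    not the classes defined in the paper.
    """
    code = ord(ch)
    if 0x4E00 <= code <= 0x9FFF:      # CJK Unified Ideographs block
        return "kanji"
    if 0x3041 <= code <= 0x309F:      # Hiragana block
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:      # Katakana block (includes the long-vowel mark)
        return "katakana"
    if ch.isascii() and ch.isalpha():
        return "alphabet"
    if ch.isdigit():
        return "digit"
    return "other"


def type_runs(text: str) -> list[tuple[str, str]]:
    """Split text into maximal runs of characters sharing one type.

    Each boundary between runs is a character-type change.
    """
    return [(t, "".join(chars)) for t, chars in groupby(text, key=char_type)]


def type_signature(word: str) -> str:
    """Describe a word by its sequence of character types, e.g. 'kanji-hiragana'."""
    return "-".join(t for t, _ in type_runs(word))


if __name__ == "__main__":
    for run in type_runs("東京でコーヒーを飲む"):
        print(run)
    print(type_signature("飲む"))
```

A signature such as `kanji-hiragana` could then be used to select which length and spelling model applies to a candidate unknown word. How the paper actually maps types to models is not shown in this excerpt.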




