Scientific report: "Mistake-Driven Mixture of Hierarchical Tag Context Trees"

Masahiko Haruno
NTT Communication Science Laboratories, 1-1 Hikari-No-Oka, Yokosuka-Shi, Kanagawa 239, Japan

Yuji Matsumoto
NAIST, 8916-5 Takayama-cho, Ikoma-Shi, Nara 630-01, Japan

Abstract

This paper proposes a mistake-driven mixture method for learning a tag model. The method iteratively performs two procedures: (1) constructing a tag model based on the current data distribution, and (2) updating the distribution by focusing on data that are not well predicted by the constructed model. The final tag model is constructed by mixing all the models according to their performance. To reflect the data distribution well, we represent each tag model as a hierarchical tag (e.g., proper noun < noun) context tree. By using the hierarchical tag context tree, the constituents of sequential tag models gradually change from broad-coverage tags (e.g., noun) to specific exceptional words that cannot be captured by general tags. In other words, the method incorporates not only frequent connections but also infrequent ones that are often considered to be collocational. We evaluate several tag models by implementing Japanese part-of-speech taggers that share all conditions (i.e., dictionary and word model) except their tag models. The experimental results show that the proposed method significantly outperforms both hand-crafted and conventional statistical methods.

1 Introduction

The last few years have seen the great success of stochastic part-of-speech (POS) taggers (Church, 1988; Kupiec, 1992; Charniak et al., 1993; Brill, 1992; Nagata, 1994).
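The two-step loop described in the abstract (fit a tag model to the current data distribution, then reweight toward the data it predicts badly, and finally mix all models by performance) can be illustrated with a small Python sketch. The abstract gives neither the reweighting rule nor the mixing weights, so this sketch assumes an AdaBoost.M1-style rule: correctly tagged examples are scaled by beta = err / (1 - err) and each model votes with weight log(1 / beta). It also swaps the paper's hierarchical tag context tree for a much simpler hypothetical base learner, a one-feature "stump" over (previous tag, word) contexts. The names `train_stump`, `mistake_driven_mixture`, `predict`, and `TOY_DATA` are illustrative only and do not come from the paper.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy corpus of ((previous_tag, word), tag) pairs. "run" is ambiguous
# between NOUN and VERB, and "big" is an exception to the DET -> NOUN pattern.
TOY_DATA = [
    (("DET", "dog"), "NOUN"),
    (("DET", "run"), "NOUN"),
    (("DET", "run"), "NOUN"),
    (("DET", "big"), "ADJ"),
    (("PRON", "run"), "VERB"),
    (("PRON", "run"), "VERB"),
    (("PRON", "sleeps"), "VERB"),
    (("DET", "cat"), "NOUN"),
]


def train_stump(data, weights):
    """Fit a one-feature tag model to the current weighted data distribution.

    For each feature position, map every value to its weighted-majority tag and
    keep the position with the lowest weighted error. Unseen values fall back to
    the overall weighted-majority tag. (Stand-in for a context tree; assumption.)
    """
    overall = Counter()
    for (_, tag), w in zip(data, weights):
        overall[tag] += w
    default = overall.most_common(1)[0][0]

    best = None
    for i in range(len(data[0][0])):
        counts = defaultdict(Counter)
        for (feats, tag), w in zip(data, weights):
            counts[feats[i]][tag] += w
        table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        err = sum(w for (feats, tag), w in zip(data, weights) if table[feats[i]] != tag)
        if best is None or err < best[0]:
            best = (err, i, table)

    _, index, table = best
    return lambda feats: table.get(feats[index], default)


def mistake_driven_mixture(data, rounds=3):
    """Repeatedly fit a model, then shift weight toward the examples it mistags.

    Assumed AdaBoost.M1-style rule: correctly tagged examples are scaled by
    beta = err / (1 - err), and each model's vote weight is log(1 / beta).
    """
    weights = [1.0 / len(data)] * len(data)
    models = []
    for _ in range(rounds):
        model = train_stump(data, weights)
        wrong = [model(feats) != tag for feats, tag in data]
        err = sum(w for w, bad in zip(weights, wrong) if bad)
        if err >= 0.5:
            if not models:  # keep at least one model so predict() has a voter
                models.append((model, 1.0))
            break
        # Clamp so that a perfect model gets a large but finite vote.
        beta = max(err, 1e-10) / (1.0 - err)
        models.append((model, math.log(1.0 / beta)))
        if err == 0:
            break  # nothing left to correct
        # Shrink the weights of correctly tagged examples, then renormalize.
        weights = [w if bad else w * beta for w, bad in zip(weights, wrong)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return models


def predict(models, feats):
    """Mix the models by a performance-weighted vote over their predicted tags."""
    votes = Counter()
    for model, alpha in models:
        votes[model(feats)] += alpha
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    mixture = mistake_driven_mixture(TOY_DATA, rounds=3)
    for feats, gold in TOY_DATA:
        print(feats, gold, predict(mixture, feats))
```

On this toy data the first round selects the broad previous-tag feature and mistags the exceptional word "big"; the reweighted later rounds select the word feature instead, so the three-model mixture tags all eight examples correctly, which neither one-feature model does on its own. This loosely mirrors the abstract's claim that the constituent models move from broad-coverage tags toward exceptional words, but it is a sketch under the stated assumptions, not the authors' algorithm.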
The stochastic approach generally attains 94 to 96% accuracy and replaces the labor-intensive compilation of linguistic rules with an automated learning algorithm. However, practical systems require higher accuracy, because POS tagging is an inevitable pre-processing step for all practical

¹ NTT is an abbreviation of Nippon Telegraph and Telephone Corporation.