TAILIEUCHUNG - Scientific report: "Part-of-Speech Tagging Considering Surface Form for an Agglutinative Language"

Previous probabilistic part-of-speech tagging models for agglutinative languages have considered only the lexical forms of morphemes, not the surface forms of words, which makes the probability calculation inaccurate. The proposed model is based on the observation that when several words (surface forms) share the same lexical form, their probabilities of occurrence differ. It is also designed to consider the lexical forms of words. Experiments show that the proposed model outperforms the bigram hidden Markov model (HMM)-based tagging model.

Part-of-Speech Tagging Considering Surface Form for an Agglutinative Language

Do-Gil Lee and Hae-Chang Rim
Dept. of Computer Science & Engineering, Korea University
1, 5-ka, Anam-dong, Seongbuk-ku, Seoul 136-701, Korea
dglee rim @

1 Introduction

Part-of-speech (POS) tagging is the task of assigning a proper POS tag to each linguistic unit, such as a word, in a given sentence. In English POS tagging, the word is used as the linguistic unit. However, the number of possible words in agglutinative languages such as Korean is almost infinite, because words can be freely formed by gluing morphemes together. Morpheme-unit tagging is therefore preferred in such languages and is more suitable than word-unit tagging. Figure 1 shows an example of the morpheme structure of a sentence, where the bold lines indicate the most likely morpheme-POS sequence.
A solid line represents a transition between two morphemes across a word boundary, and a dotted line represents a transition between two morphemes within a word. Previous probabilistic POS models for agglutinative languages have considered only the lexical forms of morphemes, not the surface forms of words, which makes the probability calculation inaccurate. The proposed model is based on the observation that when several words (surface forms) share the same lexical form, their probabilities of occurrence differ. Also, it is designed to consider lexical form of
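
The observation about surface forms can be illustrated with a small sketch. It is not the model proposed in the paper: it only estimates, by relative frequency, how often each surface form occurs given a shared lexical form. The corpus, the surface forms (`surf_a`, `surf_b`, `surf_c`), the morphemes (`m1`, `m2`, `m3`), the tags (`T1`, `T2`), and the function name `surface_given_lexical` are all hypothetical placeholders, and the counts are invented for illustration.

```python
from collections import Counter, defaultdict

# Invented toy observations: (surface form, lexical form) pairs.
# A lexical form is a tuple of (morpheme, POS tag) pairs.
# Every name and count here is a placeholder, not data from the paper.
SHARED = (("m1", "T1"), ("m2", "T2"))
OTHER = (("m3", "T1"),)

observations = [
    ("surf_a", SHARED),
    ("surf_a", SHARED),
    ("surf_a", SHARED),
    ("surf_b", SHARED),  # different surface form, same lexical form
    ("surf_c", OTHER),
]


def surface_given_lexical(pairs):
    """Relative-frequency estimate of P(surface form | lexical form)."""
    pair_counts = Counter(pairs)
    lexical_counts = Counter(lexical for _, lexical in pairs)
    probs = defaultdict(dict)
    for (surface, lexical), count in pair_counts.items():
        probs[lexical][surface] = count / lexical_counts[lexical]
    return probs


probs = surface_given_lexical(observations)
for surface, p in sorted(probs[SHARED].items()):
    print(surface, p)
```

In this toy data, `surf_a` and `surf_b` share one lexical form but receive different estimates. A model that looks only at lexical forms cannot distinguish them, which is the inaccuracy the text describes.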
