Scientific report: "Named Entity Transliteration with Comparable Corpora"

Richard Sproat, Tao Tao, ChengXiang Zhai
University of Illinois at Urbana-Champaign, Urbana, IL 61801
rws@ taotao czhai @

Abstract

In this paper we investigate Chinese-English name transliteration using comparable corpora: corpora in which texts in the two languages deal with some of the same topics, and therefore share references to named entities, but are not translations of each other. We present two distinct methods for transliteration, one using phonetic transliteration and the other using the temporal distribution of candidate pairs. Each of these approaches works quite well, but combining them achieves even better results. We then propose a novel score propagation method that utilizes the co-occurrence of transliteration pairs within document pairs. This propagation method achieves further improvement over the best results from the previous step.

1 Introduction

As part of a more general project on multilingual named entity identification, we are interested in the problem of name transliteration across languages that use different scripts.
One particular issue is the discovery of named entities in comparable texts in multiple languages, where by comparable we mean texts that are about the same topic but are not in general translations of each other. For example, if one were to go through an English, a Chinese and an Arabic newspaper on the same day, it is likely that the more important international events in various topics such as politics, business, science and sports would be covered in each of the newspapers. Names of the same persons, locations and so forth, which are often transliterated rather than translated, would be found in comparable stories across the three newspapers. We wish to use this expectation to leverage transliteration, and thus the identification of named entities, across languages. Our idea is that the occurrence of a cluster of names in, say, an English text should
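
The temporal-distribution idea mentioned in the abstract can be illustrated with a small sketch. The excerpt does not say how the temporal distributions of candidate pairs are compared, so the code below assumes Pearson correlation over per-day mention counts; this is an illustrative choice, not necessarily the measure used by the authors. The names `pearson`, `rank_candidates`, `candidate_a` and `candidate_b`, and all counts, are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length count series.

    Returns 0.0 when either series is constant (correlation undefined).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_candidates(source_counts, candidate_counts):
    """Rank candidate terms by how closely their daily counts track the source name.

    source_counts: daily mention counts of a name in one language.
    candidate_counts: dict mapping each candidate term in the other
    language to its daily mention counts over the same days.
    """
    scored = [
        (candidate, pearson(source_counts, counts))
        for candidate, counts in candidate_counts.items()
    ]
    # Highest correlation first: names mentioned on the same days rank on top.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up daily counts over five days, for illustration only.
english_name_counts = [0, 3, 5, 1, 0]
candidates = {
    "candidate_a": [0, 2, 6, 1, 0],  # peaks on the same days as the English name
    "candidate_b": [4, 0, 0, 1, 5],  # peaks on different days
}
ranking = rank_candidates(english_name_counts, candidates)
```

In this toy data, `candidate_a` rises and falls with the English name and so ranks above `candidate_b`. A temporal score of this kind could then be combined with a phonetic score, as the abstract describes, although the excerpt does not specify how the two are combined.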


