Scientific report: "Combining Source and Target Language Information for Name Tagging of Machine Translation Output"

Shasha Liao
New York University, 715 Broadway, 7th floor, New York, NY 10003, USA
liaoss@

Abstract

A Named Entity Recognizer (NER) generally performs worse on machine-translated text because of the poor syntax of the MT output and other errors in the translation. As some tagging distinctions are clearer in the source and some in the target, we tried to integrate the tag information from both source and target to improve target-language tagging performance, especially recall. In our experiments with Chinese-to-English MT output, we first used a simple merge of the outputs from an ET (Entity Translation) system and an English NER system, which gave an absolute gain in F-measure. We then trained an MEMM module to integrate them more discriminatively and obtained a further average gain in F-measure.

1 Introduction

Because of the growing multilingual environment for NLP, there is an increasing need to be able to annotate and analyze the output of machine translation (MT) systems. But treating this task as one of processing ordinary text can lead to poor results. We examine this problem with respect to the name tagging of English text.

A Named Entity Recognizer (NER) trained on an English corpus does not have the same performance when applied to machine-translated text. In our experiments on NIST 05 Chinese-to-English MT evaluation data, when we used the same English NER to tag the reference translation and the MT output, the F-measure was lower for the MT output than for the reference.
There are two primary reasons for this. First, the performance of current translation systems is not very good, so the output is quite different from standard English text: the translated text is often disfluent, and the context of a named entity may be unnatural. Second, the translated text contains some foreign names, which are hard ...
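As a rough illustration of the "simple merge" of Entity Translation and English NER outputs mentioned in the abstract, the sketch below combines two sets of name spans over the same English token sequence and scores the result with precision, recall, and F-measure. It is not the paper's actual procedure: the `Name` type, the `simple_merge` and `prf` functions, the rule of keeping the first system's spans and adding only non-overlapping spans from the second, and the toy data are all assumptions made for illustration.

```python
from typing import NamedTuple


class Name(NamedTuple):
    """A tagged name span over English tokens (hypothetical representation)."""
    start: int   # index of the first token of the name (inclusive)
    end: int     # index one past the last token of the name (exclusive)
    label: str   # entity type, e.g. "PER", "ORG", "GPE"


def overlaps(a: Name, b: Name) -> bool:
    """True if the two spans share at least one token."""
    return a.start < b.end and b.start < a.end


def simple_merge(primary: list[Name], secondary: list[Name]) -> list[Name]:
    """Keep every primary span; add a secondary span only if it does not
    overlap a span already kept. This preference rule is an assumption."""
    merged = list(primary)
    for cand in secondary:
        if not any(overlaps(cand, kept) for kept in merged):
            merged.append(cand)
    return sorted(merged)


def prf(predicted: list[Name], gold: list[Name]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F-measure over name spans."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)
    p = correct / len(pred_set) if pred_set else 0.0
    r = correct / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    gold = [Name(0, 2, "PER"), Name(5, 6, "GPE"), Name(8, 10, "ORG")]
    english_ner = [Name(0, 2, "PER"), Name(8, 10, "ORG")]   # misses the GPE
    entity_translation = [Name(0, 2, "PER"), Name(5, 6, "GPE")]

    merged = simple_merge(english_ner, entity_translation)
    print("NER only:", prf(english_ner, gold))
    print("Merged:  ", prf(merged, gold))
```

In this toy case the merged output recovers the name that the English tagger missed, so recall rises without losing precision. On real MT output a merge of this kind can also add spurious names, which is one motivation for the more discriminative integration the abstract describes.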
