Scientific report: "Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora"

ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora

Mārcis Pinnis (1), Radu Ion (2), Dan Ştefănescu (2), Fangzhong Su (3), Inguna Skadiņa (1), Andrejs Vasiljevs (1), Bogdan Babych (3)

(1) Tilde, Vienības gatve 75a, Riga, Latvia; andrejs @
(2) Research Institute for Artificial Intelligence, Romanian Academy; radu, danstef @
(3) Centre for Translation Studies, University of Leeds; @

Abstract

The lack of parallel corpora and linguistic resources for many languages and domains is one of the major obstacles for the further advancement of automated translation. A possible solution is to exploit comparable corpora (non-parallel bi- or multi-lingual text resources), which are much more widely available than parallel translation data. Our presented toolkit deals with parallel content extraction from comparable corpora. It consists of tools bundled in two workflows: (1) alignment of comparable documents and extraction of parallel sentences, and (2) extraction and bilingual mapping of terms and named entities. The toolkit pairs similar bilingual comparable documents and extracts parallel sentences and bilingual terminological and named entity dictionaries from comparable corpora. This demonstration focuses on the English, Latvian, Lithuanian, and Romanian languages.

Introduction

In recent decades, data-driven approaches have significantly advanced the development of machine translation (MT). However, lack of sufficient bilingual linguistic resources for many languages and domains is still one of the major obstacles for further advancement of automated translation. At the same time, comparable corpora, i.e., non-parallel bi- or multilingual text resources such as daily news articles and large knowledge bases like Wikipedia, are much more widely available than parallel translation data. While methods for the use of parallel corpora in machine translation are well studied (Koehn, 2010), similar techniques for comparable corpora have
