TAILIEUCHUNG - Báo cáo khoa học: "Detecting Highly Conﬁdent Word Translations from Comparable Corpora without Any Prior Knowledge"

In this paper, we extend the work on using latent cross-language topic models for identifying word translations across comparable corpora. We present a novel precisionoriented algorithm that relies on per-topic word distributions obtained by the bilingual LDA (BiLDA) latent topic model. The algorithm aims at harvesting only the most probable word translations across languages in a greedy fashion, without any prior knowledge about the language pair, relying on a symmetrization process and the one-to-one constraint. We report our results for Italian-English and Dutch-English language pairs that outperform the current state-of-the-art results by a signiﬁcant margin. In addition, we show. | Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge Ivan Vulic and Marie-Francine Moens Department of Computer Science KU Leuven Celestijnenlaan 200A Leuven Belgium @ Abstract In this paper we extend the work on using latent cross-language topic models for identifying word translations across comparable corpora. We present a novel precision-oriented algorithm that relies on per-topic word distributions obtained by the bilingual LDA BiLDA latent topic model. The algorithm aims at harvesting only the most probable word translations across languages in a greedy fashion without any prior knowledge about the language pair relying on a symmetrization process and the one-to-one constraint. We report our results for Italian-English and Dutch-English language pairs that outperform the current state-of-the-art results by a significant margin. In addition we show how to use the algorithm for the construction of high-quality initial seed lexicons of translations. 1 Introduction Bilingual lexicons serve as an invaluable resource of knowledge in various natural language processing tasks such as dictionary-based crosslanguage information retrieval Carbonell et al. 1997 Levow et al. 2005 and statistical machine translation SMT Och and Ney 2003 . In order to construct high quality bilingual lexicons for different domains one usually needs to possess parallel corpora or build such lexicons by hand. Compiling such lexicons manually is often an expensive and time-consuming task whereas the methods for mining the lexicons from parallel corpora are not applicable for language pairs and domains where such corpora is unavailable or missing. Therefore the focus of researchers turned to comparable corpora which consist of documents with partially overlapping content usually available in abundance. Thus it is much easier to build a high-volume comparable corpus. A representative example of such a .

Xuân Linh 68 11 pdf

Upload

Bấm vào đây để xem trước nội dung

Tải xuống

TÀI LIỆU LIÊN QUAN

Báo cáo khoa học: "Detecting Highly Conﬁdent Word Translations from Comparable Corpora without Any Prior Knowledge"

11 57 0

Assessment of two highly polymorphic barley microsatellite markers for detecting polymorphism in wheat

4 49 0

TÀI LIỆU XEM NHIỀU

Một Case Về Hematology (1)

8 462343 61

Giới thiệu :Lập trình mã nguồn mở

14 26098 79

Tiểu luận: Tư tưởng Hồ Chí Minh về xây dựng nhà nước trong sạch vững mạnh

13 11349 542

Câu hỏi và đáp án bài tập tình huống Quản trị học

14 10553 466

Phân tích và làm rõ ý kiến sau: “Bài thơ Tự tình II vừa nói lên bi kịch duyên phận vừa cho thấy khát vọng sống, khát vọng hạnh phúc của Hồ Xuân Hương”

3 9844 108

Ebook Facts and Figures – Basic reading practice: Phần 1 – Đặng Tuấn Anh (Dịch)

249 8891 1161

Tiểu luận: Nội dung tư tưởng Hồ Chí Minh về đạo đức

16 8507 426

Mẫu đơn thông tin ứng viên ngân hàng VIB

8 8101 2279

Giáo trình Tư tưởng Hồ Chí Minh - Mạch Quang Thắng (Dành cho bậc ĐH - Không chuyên ngành Lý luận chính trị)

152 7758 1792

Đề tài: Dự án kinh doanh thời trang quần áo nữ

17 7273 268

TỪ KHÓA LIÊN QUAN

TÀI LIỆU MỚI ĐĂNG

B2B Content Marketing: 2012 Benchmarks, Budgets & Trends

17 229 3 28-12-2024

Báo cáo nghiên cứu nông nghiệp " Field control of pest fruit flies in Vietnam "

14 191 4 28-12-2024

Hướng dẫn chế độ dinh dưỡng cho người bệnh viêm khớp

5 168 2 28-12-2024

Báo cáo y học: "The Factors Influencing Depression Endpoints Research (FINDER) study: final results of Italian patients with depressio"

9 151 1 28-12-2024

ETHICAL CODE HANDBOOK: Demonstrate your commitment to high standards

7 148 1 28-12-2024

ĐỀ TÀI " ĐÁNH GIÁ HIỆU QUẢ HOẠT ĐỘNG KINH DOANH NGOẠI HỐI CỦA NGÂN HÀNG THƯƠNG MẠI CỔ PHẦN XUẤT NHẬP KHẨU VIỆT NAM "

51 153 3 28-12-2024

Word Games with English 1

65 142 1 28-12-2024

IT Audit: EMC’s Journey to the Private Cloud

13 158 1 28-12-2024

Sáng kiến kinh nghiệm môn mỹ thuật

5 175 1 28-12-2024

Lập trình Java cơ bản : Luồng và xử lý file part 8

5 141 1 28-12-2024

TÀI LIỆU HOT

Mẫu đơn thông tin ứng viên ngân hàng VIB

8 8101 2279

Giáo trình Tư tưởng Hồ Chí Minh - Mạch Quang Thắng (Dành cho bậc ĐH - Không chuyên ngành Lý luận chính trị)

152 7758 1792

Ebook Chào con ba mẹ đã sẵn sàng

112 4409 1371

Ebook Tuyển tập đề bài và bài văn nghị luận xã hội: Phần 1

62 6293 1266

Ebook Facts and Figures – Basic reading practice: Phần 1 – Đặng Tuấn Anh (Dịch)

249 8891 1161

Giáo trình Văn hóa kinh doanh - PGS.TS. Dương Thị Liễu

561 3843 680

Giáo trình Sinh lí học trẻ em: Phần 1 - TS Lê Thanh Vân

122 3920 609

Giáo trình Pháp luật đại cương: Phần 1 - NXB ĐH Sư Phạm

274 4718 565

Tiểu luận: Tư tưởng Hồ Chí Minh về xây dựng nhà nước trong sạch vững mạnh

13 11349 542

Bài tập nhóm quản lý dự án: Dự án xây dựng quán cafe

35 4510 490