TAILIEUCHUNG - Scientific report: "Bilingual Bootstrapping for WSD"

Together We Can: Bilingual Bootstrapping for WSD

Mitesh M. Khapra, Salil Joshi, Arindam Chatterjee, Pushpak Bhattacharyya
Department of Computer Science and Engineering, IIT Bombay, Powai, Mumbai 400076
miteshk salilj arindam pb @

Abstract

Recent work on bilingual Word Sense Disambiguation (WSD) has shown that a resource deprived language (L1) can benefit from the annotation work done in a resource rich language (L2) via parameter projection. However, this method assumes the presence of sufficient annotated data in one resource rich language, which may not always be possible. Instead, we focus on the situation where there are two resource deprived languages, both having a very small amount of seed annotated data and a large amount of untagged data. We then use bilingual bootstrapping, wherein a model trained using the seed annotated data of L1 is used to annotate the untagged data of L2 and vice versa using parameter projection. The untagged instances of L1 and L2 which get annotated with high confidence are then added to the seed data of the respective languages, and the above process is repeated. Our experiments show that such a bilingual bootstrapping algorithm, when evaluated on two different domains with small seed sizes using Hindi (L1) and Marathi (L2) as the language pair, performs better than monolingual bootstrapping and significantly reduces annotation cost.
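The bootstrapping loop described in the abstract can be sketched roughly as follows. This is a minimal toy illustration, not the authors' implementation: the function names, the add-one smoothed sense prior, the context-free labeling, the `threshold` value, and the placeholder words (`h_w`, `m_a`, ...) and sense IDs (`S1`, `S2`, ...) are all assumptions made for the example. It assumes the two languages share sense IDs and that a cross-link table maps each (word, sense) pair to the corresponding word in the other language.

```python
from collections import Counter

def train(seed):
    """Count (word, sense) pairs in a list of seed annotations."""
    return Counter(seed)

def project(other_counts, word, senses, crosslink):
    """Estimate P(sense | word) from the OTHER language's counts.

    crosslink[(word, sense)] is the hypothetical cross-linked word in the
    other language. Add-one smoothing is an assumption of this sketch.
    """
    scores = {s: other_counts[(crosslink[(word, s)], s)] + 1 for s in senses}
    total = sum(scores.values())
    return {s: c / total for s, c in scores.items()}

def label(untagged, other_model, senses, crosslink, threshold):
    """Split untagged words into confidently labeled pairs and the rest."""
    confident, rest = [], []
    for word in untagged:
        probs = project(other_model, word, senses[word], crosslink)
        best = max(probs, key=probs.get)
        if probs[best] >= threshold:
            confident.append((word, best))
        else:
            rest.append(word)
    return confident, rest

def bilingual_bootstrap(seed1, seed2, untagged1, untagged2,
                        senses1, senses2, link12, link21,
                        threshold=0.6, max_iters=10):
    """Grow both seed sets by labeling each language with the other's model."""
    seed1, seed2 = list(seed1), list(seed2)
    untagged1, untagged2 = list(untagged1), list(untagged2)
    for _ in range(max_iters):
        model1, model2 = train(seed1), train(seed2)
        # The L2 model annotates L1 data, and the L1 model annotates L2 data.
        new1, untagged1 = label(untagged1, model2, senses1, link12, threshold)
        new2, untagged2 = label(untagged2, model1, senses2, link21, threshold)
        if not new1 and not new2:
            break  # nothing was labeled with high confidence; stop
        seed1 += new1
        seed2 += new2
    return seed1, seed2

# Toy usage with placeholder words and sense IDs (not real language data).
out1, out2 = bilingual_bootstrap(
    seed1=[("h_u", "S3")],
    seed2=[("m_a", "S1")] * 3,
    untagged1=["h_w"],
    untagged2=["m_x"],
    senses1={"h_w": ["S1", "S2"]},
    senses2={"m_x": ["S1", "S2"]},
    link12={("h_w", "S1"): "m_a", ("h_w", "S2"): "m_b"},
    link21={("m_x", "S1"): "h_w", ("m_x", "S2"): "h_v"},
)
print(out1)
print(out2)
```

In this toy run the L2 word can only be labeled in the second iteration, after the L1 seed set has grown, which is the kind of mutual benefit the repeated process is meant to provide. A real system would score each instance using its context rather than a word-level prior alone.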
1 Introduction

The high cost of collecting sense annotated data for supervised approaches (Ng and Lee, 1996; Lee et al., 2004) has always remained a matter of concern for some of the resource deprived languages of the world. The problem is even more hard-hitting for multilingual regions (e.g., India, which has more than 20 constitutionally recognized languages). To circumvent this problem, unsupervised and knowledge based approaches (Lesk, 1986; Walker and Amsler, 1986; Agirre and Rigau, 1996; McCarthy et al., 2004; Mihalcea, 2005) have been proposed as an alternative, but they have failed to deliver good accuracies.
