Confidence-Weighted Learning of Factored Discriminative Language Models

Viet Ha-Thuc, Computer Science Department, The University of Iowa, Iowa City, IA 52241, USA, hviet@
Nicola Cancedda, Xerox Research Centre Europe, 6 chemin de Maupertuis, 38240 Meylan, France

Abstract

Language models based on word surface forms only are unable to benefit from available linguistic knowledge, and tend to suffer from poor estimates for rare features. We propose an approach to overcome these two limitations. We use factored features that can flexibly capture linguistic regularities, and we adopt confidence-weighted learning, a form of discriminative online learning that can better take advantage of a heavy tail of rare features. Finally, we extend confidence-weighted learning to deal with label noise in training data, a common case with discriminative language modeling.

1 Introduction

Language models (LMs) are key components in most statistical machine translation systems, where they play a crucial role in promoting output fluency. Standard n-gram generative language models have been extended in several ways. Generative factored language models (Bilmes and Kirchhoff, 2003) represent each token by multiple factors, such as part-of-speech, lemma, and surface form, and capture linguistic patterns in the target language at the appropriate level of abstraction. Instead of estimating likelihood, discriminative language models (Roark et al., 2004; Roark et al., 2007; Li and Khudanpur, 2008) directly model fluency by casting the task as a binary classification or a ranking problem.

The method we propose combines advantages of both directions mentioned above. We use factored features to capture linguistic patterns and discriminative learning for directly modeling fluency. We define highly overlapping and correlated factored features, and extend a robust learning algorithm to handle them and cope with a high rate of label noise. For discriminatively learning language models, we
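To make the factored representation concrete, the sketch below generates factored bigram features from a sentence annotated with surface form, lemma, and part-of-speech. The `Token` structure and the bigram template set are illustrative assumptions for this sketch, not the paper's exact feature definitions.

```python
from itertools import product
from typing import NamedTuple

class Token(NamedTuple):
    # Hypothetical factored token: surface form, lemma, part-of-speech.
    surface: str
    lemma: str
    pos: str

FACTORS = ("surface", "lemma", "pos")

def factored_bigrams(tokens):
    """Emit one bigram feature per pair of factor choices.

    Each adjacent token pair yields |FACTORS|^2 = 9 overlapping,
    correlated features (e.g. 'pos:DET_lemma:cat'), letting the
    learner back off from a rare surface bigram to more abstract
    lemma or part-of-speech patterns.
    """
    feats = []
    for left, right in zip(tokens, tokens[1:]):
        for f1, f2 in product(FACTORS, FACTORS):
            feats.append(f"{f1}:{getattr(left, f1)}_{f2}:{getattr(right, f2)}")
    return feats

sent = [Token("The", "the", "DET"),
        Token("cats", "cat", "NNS"),
        Token("sleep", "sleep", "VBP")]
features = factored_bigrams(sent)
```

The overlap is the point: a surface bigram seen once in training is a poor estimator on its own, but its POS-level counterpart is typically well supported.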
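Confidence-weighted learning, as referenced above, maintains a per-feature weight mean and variance, so rarely seen features (which dominate the heavy tail) keep large variance and receive aggressive updates, while frequent features stabilize. The sketch below is a simplified diagonal-variance version: the step size is an illustrative surrogate enforcing the margin constraint, not the exact closed-form update of the full algorithm.

```python
import math

def cw_update(mu, var, feats, y, phi=1.0, init_var=1.0):
    """One confidence-weighted-style update on a binary example.

    mu, var: sparse dicts of per-feature weight mean and variance.
    feats:   active binary features of the example.
    y:       gold label in {-1, +1}.

    Simplified rule: if the signed margin y * mu.x falls below
    phi * sqrt(V), where V is the margin variance, take a step
    scaled per-feature by its variance, then shrink the variance
    of the features that fired (they are now better estimated).
    """
    for f in feats:
        var.setdefault(f, init_var)
    M = y * sum(mu.get(f, 0.0) for f in feats)   # signed margin
    V = sum(var[f] for f in feats)               # margin variance
    if M >= phi * math.sqrt(V):
        return                                   # already confident enough
    alpha = (phi * math.sqrt(V) - M) / V         # simplified step size
    for f in feats:
        mu[f] = mu.get(f, 0.0) + alpha * y * var[f]
        var[f] = 1.0 / (1.0 / var[f] + 2.0 * alpha * phi)

# Tiny illustration with two factored features (labels are hypothetical).
mu, var = {}, {}
data = [(["pos:DET_pos:NNS"], +1), (["pos:NNS_pos:DET"], -1)]
for _ in range(3):
    for feats, y in data:
        cw_update(mu, var, feats, y)
```

Because the step is scaled by `var[f]`, a feature seen for the first time moves its weight far more than one seen thousands of times, which is exactly the behavior the abstract credits for handling rare factored features.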
