Confidence-Weighted Learning of Factored Discriminative Language Models

Viet Ha-Thuc, Computer Science Department, The University of Iowa, Iowa City, IA 52241, USA, hviet@
Nicola Cancedda, Xerox Research Centre Europe, 6 chemin de Maupertuis, 38240 Meylan, France

Abstract

Language models based on word surface forms only are unable to benefit from available linguistic knowledge, and tend to suffer from poor estimates for rare features. We propose an approach to overcome these two limitations. We use factored features that can flexibly capture linguistic regularities, and we adopt confidence-weighted learning, a form of discriminative online learning that can better take advantage of a heavy tail of rare features. Finally, we extend confidence-weighted learning to deal with label noise in training data, a common case with discriminative language modeling.

1 Introduction

Language models (LMs) are key components in most statistical machine translation systems, where they play a crucial role in promoting output fluency. Standard n-gram generative language models have been extended in several ways. Generative factored language models (Bilmes and Kirchhoff, 2003) represent each token by multiple factors, such as part-of-speech, lemma, and surface form, and capture linguistic patterns in the target language at the appropriate level of abstraction. Instead of estimating likelihood, discriminative language models (Roark et al., 2004; Roark et al., 2007; Li and Khudanpur, 2008) directly model fluency by casting the task as a binary classification or a ranking problem.

The method we propose combines advantages of both directions mentioned above. We use factored features to capture linguistic patterns and discriminative learning for directly modeling fluency. We define highly overlapping and correlated factored features, and extend a robust learning algorithm to handle them and cope with a high rate of label noise. For discriminatively learning language models, we
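To make the factored representation concrete, the sketch below generates factored bigram features from a sentence annotated with surface form, lemma, and part-of-speech. The `Token` structure and the bigram template set are illustrative assumptions for this sketch, not the paper's exact feature definitions.

```python
from itertools import product
from typing import NamedTuple

class Token(NamedTuple):
    # Hypothetical factored token: surface form, lemma, part-of-speech.
    surface: str
    lemma: str
    pos: str

FACTORS = ("surface", "lemma", "pos")

def factored_bigrams(tokens):
    """Emit one bigram feature per pair of factor choices.

    Each adjacent token pair yields |FACTORS|^2 = 9 overlapping,
    correlated features (e.g. 'pos:DET_lemma:cat'), letting the
    learner back off from a rare surface bigram to more abstract
    lemma or part-of-speech patterns.
    """
    feats = []
    for left, right in zip(tokens, tokens[1:]):
        for f1, f2 in product(FACTORS, FACTORS):
            feats.append(f"{f1}:{getattr(left, f1)}_{f2}:{getattr(right, f2)}")
    return feats

sent = [Token("The", "the", "DET"),
        Token("cats", "cat", "NNS"),
        Token("sleep", "sleep", "VBP")]
features = factored_bigrams(sent)
```

The overlap is the point: a surface bigram seen once in training is a poor estimator on its own, but its POS-level counterpart is typically well supported.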
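Confidence-weighted learning, as referenced above, maintains a per-feature weight mean and variance, so rarely seen features (which dominate the heavy tail) keep large variance and receive aggressive updates, while frequent features stabilize. The sketch below is a simplified diagonal-variance version: the step size is an illustrative surrogate enforcing the margin constraint, not the exact closed-form update of the full algorithm.

```python
import math

def cw_update(mu, var, feats, y, phi=1.0, init_var=1.0):
    """One confidence-weighted-style update on a binary example.

    mu, var: sparse dicts of per-feature weight mean and variance.
    feats:   active binary features of the example.
    y:       gold label in {-1, +1}.

    Simplified rule: if the signed margin y * mu.x falls below
    phi * sqrt(V), where V is the margin variance, take a step
    scaled per-feature by its variance, then shrink the variance
    of the features that fired (they are now better estimated).
    """
    for f in feats:
        var.setdefault(f, init_var)
    M = y * sum(mu.get(f, 0.0) for f in feats)   # signed margin
    V = sum(var[f] for f in feats)               # margin variance
    if M >= phi * math.sqrt(V):
        return                                   # already confident enough
    alpha = (phi * math.sqrt(V) - M) / V         # simplified step size
    for f in feats:
        mu[f] = mu.get(f, 0.0) + alpha * y * var[f]
        var[f] = 1.0 / (1.0 / var[f] + 2.0 * alpha * phi)

# Tiny illustration with two factored features (labels are hypothetical).
mu, var = {}, {}
data = [(["pos:DET_pos:NNS"], +1), (["pos:NNS_pos:DET"], -1)]
for _ in range(3):
    for feats, y in data:
        cw_update(mu, var, feats, y)
```

Because the step is scaled by `var[f]`, a feature seen for the first time moves its weight far more than one seen thousands of times, which is exactly the behavior the abstract credits for handling rare factored features.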
