TAILIEUCHUNG - Scientific report: "Domain Adaptation of Maximum Entropy Language Models"

Tanel Alumae*
Adaptive Informatics Research Centre, School of Science and Technology, Aalto University, Helsinki, Finland
tanel@

Mikko Kurimo
Adaptive Informatics Research Centre, School of Science and Technology, Aalto University, Helsinki, Finland

* Currently with Tallinn University of Technology, Estonia

Abstract

We investigate a recently proposed Bayesian adaptation method for building style-adapted maximum entropy language models for speech recognition, given a large corpus of written language data and a small corpus of speech transcripts. Experiments show that the method consistently outperforms linear interpolation, which is typically used in such cases.

1 Introduction

In large vocabulary speech recognition, a language model (LM) is typically estimated from large amounts of written text data. However, recognition is typically applied to speech that is stylistically different from written language. For example, in an often-tried setting, speech recognition is applied to broadcast news that includes introductory segments, conversations and spontaneous interviews. To decrease the mismatch between training and test data, a small amount of speech data is often human-transcribed. An LM is then built by interpolating the models estimated from the large corpus of written language and the small corpus of transcribed data. However, in practice, different models might be of different importance depending on the word context. Global interpolation doesn't take such variability into account: all predictions are weighted across models identically, regardless of the context.

In this paper we investigate a recently proposed Bayesian adaptation approach (Daume III, 2007; Finkel and Manning, 2009) for adapting a conditional maximum entropy (ME) LM (Rosenfeld, 1996) to a new domain, given a large corpus of out-of-domain training data and a small corpus of in-domain data. The main contribution of this paper is that we show how the ...
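The global linear interpolation baseline described in the introduction can be illustrated with a minimal sketch. It is not taken from the paper: the function name `interpolate`, the weight `LAMBDA`, and the toy probabilities are all hypothetical, and real systems would interpolate full n-gram models rather than two hand-written next-word distributions. The point it shows is that a single weight is applied to every prediction, whatever the word context.

```python
from typing import Dict


def interpolate(p_out: Dict[str, float], p_in: Dict[str, float], lam: float) -> Dict[str, float]:
    """Mix two next-word distributions with one global weight.

    lam is the weight of the in-domain (speech) model; the out-of-domain
    (written text) model gets 1 - lam. The same lam is used for all contexts.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    vocab = set(p_out) | set(p_in)
    return {w: lam * p_in.get(w, 0.0) + (1.0 - lam) * p_out.get(w, 0.0) for w in vocab}


# Toy next-word distributions for one context (hypothetical numbers).
p_written = {"the": 0.6, "uh": 0.0, "news": 0.4}
p_speech = {"the": 0.3, "uh": 0.5, "news": 0.2}

LAMBDA = 0.25  # hypothetical global weight, identical for every context
p_mix = interpolate(p_written, p_speech, LAMBDA)
```

Because `LAMBDA` is fixed, a context where the speech transcripts are far more informative than the written text (for example, before a filler word) is weighted exactly like any other context, which is the limitation the introduction points out.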
