Scientific report: "The impact of language models and loss functions on repair disfluency detection"

Simon Zwarts and Mark Johnson
Centre for Language Technology, Macquarie University

Abstract

Unrehearsed spoken language often contains disfluencies. In order to correctly interpret a spoken utterance, any such disfluencies must be identified and removed or otherwise dealt with. Operating on transcripts of speech which contain disfluencies, we study the effect of language model and loss function on the performance of a linear reranker that rescores the 25-best output of a noisy-channel model. We show that language models trained on large amounts of non-speech data improve performance more than a language model trained on a more modest amount of speech data, and that optimising f-score rather than log loss improves disfluency detection performance. Our approach uses a log-linear reranker operating on the top n analyses of a noisy channel model. We use large language models, introduce new features into this reranker, and examine different optimisation strategies. We obtain a disfluency detection f-score of [value missing in source], which improves upon the current state of the art.

1 Introduction

Most spontaneous speech contains disfluencies such as partial words, filled pauses (e.g. "uh", "um", "huh"), explicit editing terms (e.g. "I mean"), parenthetical asides and repairs. Of these, repairs pose particularly difficult problems for parsing and related Natural Language Processing (NLP) tasks. This paper presents a model of disfluency detection based on the noisy channel framework which specifically targets the repair disfluencies. By combining language models and using an appropriate loss function in a log-linear reranker, we are able to achieve f-scores which are higher than previously reported.

Often in natural language processing, more data is more important than better algorithms (Brill and Banko, 2001). It is this insight that drives the first part of the work described here.
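
The reranking setup described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the feature names, weights, toy utterance and candidate scores are hypothetical and do not come from the paper, and no training is shown. It only shows how a linear model might pick one analysis from an n-best list, and how an f-score over words tagged as disfluent could be computed.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Candidate:
    """One hypothetical analysis from the n-best list."""
    tags: List[bool]            # True = word is marked as part of a repair
    features: Dict[str, float]  # illustrative feature values, not the paper's


def linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; features without a weight count as 0."""
    return sum(value * weights.get(name, 0.0) for name, value in features.items())


def rerank(candidates: Sequence[Candidate], weights: Dict[str, float]) -> Candidate:
    """Return the candidate with the highest linear score."""
    return max(candidates, key=lambda c: linear_score(c.features, weights))


def f_score(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall over words tagged as disfluent."""
    true_pos = sum(p and g for p, g in zip(predicted, gold))
    n_pred, n_gold = sum(predicted), sum(gold)
    if true_pos == 0 or n_pred == 0 or n_gold == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy utterance (assumed): "I want a flight to Boston uh I mean to Denver"
# Gold annotation marks "to Boston" (positions 4 and 5) as the repaired words.
gold = [False] * 4 + [True, True] + [False] * 5

candidates = [
    Candidate(tags=list(gold),
              features={"noisy_channel_logprob": -12.0, "lm_logprob": -30.0}),
    Candidate(tags=[False] * 11,
              features={"noisy_channel_logprob": -11.0, "lm_logprob": -35.0}),
]

# Hypothetical weights; in practice these would be learned from data.
weights = {"noisy_channel_logprob": 1.0, "lm_logprob": 0.5}

best = rerank(candidates, weights)
print(best.tags == gold, f_score(best.tags, gold))
```

In this toy case the second candidate has the better noisy-channel score, but the language model feature tips the linear score towards the first. That is the kind of interaction a reranker with added language model features is meant to capture.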
