Scientific report: "N-Best Rescoring Based on Pitch-accent Patterns"

Je Hun Jeon1, Wen Wang2, Yang Liu1
1Department of Computer Science, The University of Texas at Dallas, USA
2Speech Technology and Research Laboratory, SRI International, USA
jhjeon yangl @ wwang@

Abstract

In this paper, we adopt an n-best rescoring scheme using pitch-accent patterns to improve automatic speech recognition (ASR) performance. The pitch-accent model is decoupled from the main ASR system, thus allowing us to develop it independently. N-best hypotheses from recognizers are rescored by additional scores that measure the correlation of the pitch-accent patterns between the acoustic signal and lexical cues. To test the robustness of our algorithm, we use two different data sets and recognition setups: the first one is English radio news data that has pitch accent labels, but the recognizer is trained from a small amount of data and has a high error rate; the second one is English broadcast news data using a state-of-the-art SRI recognizer. Our experimental results demonstrate that our approach is able to reduce word error rate by about 3% relative. This gain is consistent across the two different tests, showing promising future directions of incorporating prosodic information to improve speech recognition.
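The rescoring idea described in the abstract can be illustrated with a minimal sketch. It assumes each hypothesis already carries a recognizer score and a separate pitch-accent agreement score, and that the two are combined by a simple weighted sum. The names `Hypothesis`, `asr_score`, `prosody_score`, `rescore`, and `weight`, as well as the example scores, are hypothetical and are not taken from the paper, whose actual scoring formula is not given in this excerpt.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: list[str]
    asr_score: float      # assumed: log score from the recognizer (acoustic + language model)
    prosody_score: float  # assumed: log score for pitch-accent agreement, computed elsewhere


def rescore(nbest: list[Hypothesis], weight: float) -> list[Hypothesis]:
    """Return the hypotheses sorted best-first by asr_score + weight * prosody_score."""
    return sorted(
        nbest,
        key=lambda h: h.asr_score + weight * h.prosody_score,
        reverse=True,
    )


# Made-up scores, for illustration only.
nbest = [
    Hypothesis(["the", "cat", "sat"], asr_score=-10.0, prosody_score=-3.0),
    Hypothesis(["the", "cat", "sad"], asr_score=-9.5, prosody_score=-6.0),
]

best_without_prosody = rescore(nbest, weight=0.0)[0]
best_with_prosody = rescore(nbest, weight=0.5)[0]
```

Because the prosody score is only added on top of an existing n-best list, the pitch-accent model can be developed and tuned without changing the recognizer itself, which matches the decoupling the abstract describes. In this toy example the additional score changes which hypothesis is ranked first.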
1 Introduction

Prosody refers to the suprasegmental features of natural speech, such as rhythm and intonation, since it normally extends over more than one phoneme segment. Speakers use prosody to convey paralinguistic information such as emphasis, intention, attitude, and emotion. Humans listening to speech with natural prosody are able to understand the content with low cognitive load and high accuracy. However, most modern ASR systems only use an acoustic model and a language model. Acoustic information in ASR is represented by spectral features that are usually extracted over a window length of a few tens of milliseconds. They miss useful information contained in the prosody of the speech
