Scientific report: "Correcting errors in speech recognition with articulatory dynamics"

Frank Rudzicz
University of Toronto, Department of Computer Science
Toronto, Ontario, Canada
frank@

Abstract

We introduce a novel mechanism for incorporating articulatory dynamics into speech recognition with the theory of task dynamics. This system reranks sentence-level hypotheses by the likelihoods of their hypothetical articulatory realizations, which are derived from relationships learned with aligned acoustic/articulatory data. Experiments compare this with two baseline systems, namely an acoustic hidden Markov model and a dynamic Bayes network augmented with discretized representations of the vocal tract. Our system based on task dynamics reduces word-error rates significantly relative to the best baseline models.

1 Introduction

Although modern automatic speech recognition (ASR) takes several cues from the biological perception of speech, it rarely models its biological production. The result is that speech is treated as a surface acoustic phenomenon with lexical or phonetic hidden dynamics but without any physical constraints in between. This omission leads to some untenable assumptions. For example, speech is often treated, out of convenience, as a sequence of discrete, non-overlapping packets such as phonemes, despite the fact that some major difficulties in ASR, such as co-articulation, are by definition the result of concurrent physiological phenomena (Hardcastle and Hewlett, 1999).

Many acoustic ambiguities can be resolved with knowledge of the vocal tract's configuration (O'Shaughnessy, 2000). For example, the three nasal sonorants /m/, /n/, and /ng/ are acoustically similar (i.e., they have large concentrations of energy at the same frequencies) but uniquely and reliably involve bilabial closure, tongue-tip elevation, and tongue-dorsum elevation, respectively. Having access to the articulatory goals of the speaker would, in theory, make the identification of linguistic intent almost trivial.
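The reranking idea stated in the abstract can be illustrated with a minimal sketch. This is not the author's task-dynamics model: it assumes a generic N-best list, a hypothetical `articulatory_logprob` scoring function, and a simple weighted sum of log-scores, none of which are specified in the text above. The toy score table is invented purely so that the example runs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    words: List[str]
    acoustic_logprob: float  # first-pass recognizer log-score (made-up values below)


def rerank(
    hypotheses: List[Hypothesis],
    articulatory_logprob: Callable[[List[str]], float],
    weight: float = 0.5,
) -> List[Hypothesis]:
    """Sort hypotheses, best first, by a weighted sum of two log-scores.

    The linear interpolation is an assumption made for illustration only.
    """
    def combined(h: Hypothesis) -> float:
        return (1.0 - weight) * h.acoustic_logprob + weight * articulatory_logprob(h.words)

    return sorted(hypotheses, key=combined, reverse=True)


def toy_articulatory_logprob(words: List[str]) -> float:
    # Hypothetical stand-in for a learned articulatory likelihood:
    # pretend the observed gestures favor a bilabial closure ("mine").
    table = {"mine": -1.0, "nine": -3.0}
    return sum(table.get(w, -2.0) for w in words)


# Two acoustically similar candidates; the acoustic model slightly prefers "nine".
n_best = [
    Hypothesis(["nine"], acoustic_logprob=-4.0),
    Hypothesis(["mine"], acoustic_logprob=-4.5),
]

best = rerank(n_best, toy_articulatory_logprob)[0]
print(best.words)
```

With `weight=0.0` the ordering falls back to the acoustic scores alone, which makes it easy to see how much the articulatory term changes the ranking.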
