Scientific report: "Statistical Modeling for Unit Selection in Speech Synthesis"

Cyril Allauzen, Mehryar Mohri, and Michael Riley
AT&T Labs - Research, 180 Park Avenue, Florham Park, NJ 07932, USA
allauzen mohri riley @

Abstract

Traditional concatenative speech synthesis systems use a number of heuristics to define the target and concatenation costs, which are essential for the design of the unit selection component. In contrast to these approaches, we introduce a general statistical modeling framework for unit selection inspired by automatic speech recognition. Given appropriate data, techniques based on that framework can result in more accurate unit selection, thereby improving the general quality of a speech synthesizer. They can also lead to a more modular and substantially more efficient system.

We present a new unit selection system based on statistical modeling. To overcome the original absence of data, we use an existing high-quality unit selection system to generate a corpus of unit sequences. We show that the concatenation cost can be accurately estimated from this corpus using a statistical n-gram language model over units. We used weighted automata and transducers to represent the components of the system, and designed a new and more efficient composition algorithm that makes use of string potentials to combine them. The resulting statistical unit selection is shown to be about times faster than the last release of the AT&T Natural Voices Product while preserving the same quality, and it offers much flexibility for the use and integration of new and more complex components.

1 Motivation

A concatenative speech synthesis system (Hunt and Black, 1996; Beutnagel et al., 1999a) consists of three components. The first component, the text-analysis frontend, takes text as input and outputs a sequence of feature vectors that characterize the acoustic signal to synthesize. The first element of each of these vectors is the predicted phone or half-phone; other elements are …
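The abstract's idea of estimating the concatenation cost with an n-gram language model over units can be illustrated with a minimal sketch. It assumes a bigram model with add-one smoothing, which is a simplification chosen here for brevity and not necessarily the smoothing used by the authors. The function names (`train_bigram`, `concat_cost`, `sequence_cost`) and the toy unit identifiers are hypothetical, and the sketch does not use weighted automata or transducers.

```python
import math
from collections import Counter


def train_bigram(sequences):
    """Collect bigram statistics from a corpus of unit sequences.

    Returns (context_counts, bigram_counts, vocab_size). All names here
    are illustrative assumptions, not taken from the paper.
    """
    context_counts = Counter()  # how often a unit appears as a left context
    bigram_counts = Counter()   # how often (previous unit, unit) occurs
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, len(vocab)


def concat_cost(prev, cur, model):
    """Concatenation cost as -log P(cur | prev), with add-one smoothing
    (an assumption made for this sketch)."""
    context_counts, bigram_counts, vocab_size = model
    prob = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
    return -math.log(prob)


def sequence_cost(seq, model):
    """Total concatenation cost of a unit sequence: sum over adjacent pairs."""
    return sum(concat_cost(a, b, model) for a, b in zip(seq, seq[1:]))


# Toy corpus standing in for unit sequences produced by an existing system.
corpus = [["u1", "u2", "u3"], ["u1", "u2", "u4"], ["u5", "u2", "u3"]]
model = train_bigram(corpus)
```

In this sketch, a pair of units that was frequently adjacent in the corpus receives a lower cost than a pair that was never observed together, so a search over candidate unit sequences would favor joins the reference system tended to produce.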
