Scientific report: "Sinhala Grapheme-to-Phoneme Conversion and Rules for Schwa Epenthesis"

Asanka Wasala, Ruvan Weerasinghe and Kumudu Gamage
Language Technology Research Laboratory
University of Colombo School of Computing
35 Reid Avenue, Colombo 07, Sri Lanka
awasala kgamage @ arw@

Abstract

This paper describes an architecture to convert Sinhala Unicode text into a phonemic specification of pronunciation. The study focused mainly on disambiguating schwa /ə/ and /a/ vowel epenthesis for consonants, which is one of the significant problems found in Sinhala. This problem has been addressed by formulating a set of rules. The proposed set of rules was tested using 30,000 distinct words obtained from a corpus and compared with the same words manually transcribed to phonemes by an expert. The Grapheme-to-Phoneme (G2P) conversion model achieves 98% accuracy.

1 Introduction

Text-to-Speech (TTS) conversion involves many important processes. These processes can be divided mainly into three parts: text analysis, linguistic analysis, and waveform generation (Black and Lenzo, 2003).

The text analysis process is responsible for converting non-textual content into text. This process also involves tokenization and normalization of the text. The identification of words or chunks of text is called text tokenization. Text normalization establishes the correct interpretation of the input text by expanding abbreviations and acronyms. This is done by replacing non-alphabetic characters, numbers, and punctuation with appropriate text strings, depending on the context.

The linguistic analysis process involves finding the correct pronunciation of words and assigning prosodic features (e.g., phrasing, intonation, stress) to the phonemic string to be spoken.

The final process of a TTS system is waveform generation, which involves the production of an acoustic digital signal using a particular synthesis approach such as formant synthesis, articulatory synthesis, or waveform concatenation.
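The text analysis stage described above (tokenization followed by normalization) can be illustrated with a minimal Python sketch. It is not the system described in this paper: the abbreviation table, the digit-by-digit number reading, and the function names tokenize, normalize_token and normalize are all assumptions made for illustration, and English is used only to keep the example readable. A real normalizer would also need the context sensitivity mentioned above, which this sketch leaves out.

```python
import re

# Hypothetical lookup tables, chosen only for illustration.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "etc.": "et cetera"}
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def tokenize(text):
    """Text tokenization: split raw text into whitespace-delimited chunks."""
    return text.split()


def normalize_token(token):
    """Text normalization for one token: return a list of plain words."""
    lowered = token.lower()
    # Expand known abbreviations (no context is consulted in this sketch).
    if lowered in ABBREVIATIONS:
        return [ABBREVIATIONS[lowered]]
    # Drop punctuation attached to either end of the token.
    core = lowered.strip(".,;:!?\"'()")
    # Read ASCII digit strings digit by digit, a deliberate simplification.
    if re.fullmatch(r"[0-9]+", core):
        return [DIGIT_NAMES[int(ch)] for ch in core]
    return [core] if core else []


def normalize(text):
    """Run tokenization and normalization over a whole input string."""
    words = []
    for token in tokenize(text):
        words.extend(normalize_token(token))
    return words


if __name__ == "__main__":
    print(normalize("Dr. Silva lives at 35 Reid St."))
```

The resulting word list is what the linguistic analysis stage would receive as input for pronunciation lookup or grapheme-to-phoneme conversion. Because the sketch always expands "St." to "street", it shows why context matters: the same abbreviation can call for a different expansion elsewhere.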
