Scientific report: "Data-Oriented Methods for Grapheme-to-Phoneme Conversion"

Antal van den Bosch and Walter Daelemans
ITK (Institute for Language Technology and AI)
Tilburg University
Box 90153, NL-5000 LE Tilburg
Tel 31 13 663070
Email antalb@ walter@

Abstract

It is traditionally assumed that various sources of linguistic knowledge and their interaction should be formalised in order to be able to convert words into their phonemic representations with reasonable accuracy. We show that using supervised learning techniques, based on a corpus of transcribed words, the same and even better performance can be achieved, without explicit modeling of linguistic knowledge. In this paper we present two instances of this approach. A first model implements a variant of instance-based learning, in which a weighted similarity metric and a database of prototypical exemplars are used to predict new mappings. In the second model, grapheme-to-phoneme mappings are looked up in a compressed text-to-speech lexicon (table lookup) enriched with default mappings. We compare performance and accuracy of these approaches to a connectionist (backpropagation) approach and to the linguistic knowledge-based approach.

1 Introduction

Grapheme-to-phoneme conversion is a central task in any text-to-speech (reading aloud) system. Given an alphabet of spelling symbols (graphemes) and an alphabet of phonetic symbols, a mapping should be achieved transliterating strings of graphemes into strings of phonetic symbols.
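To make the instance-based model from the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tiny hypothetical corpus in which each grapheme is aligned with exactly one phoneme symbol ("-" marks a grapheme that is not realised), a context window of one letter on each side, and hand-set feature weights; the excerpt does not say how the weights of the similarity metric are actually obtained, and all names below are illustrative.

```python
from collections import Counter

# Hypothetical toy corpus: spelling aligned one-to-one with phoneme symbols.
# "-" marks a grapheme that is not realised in the transcription.
CORPUS = [
    ("cat", "k&t"),
    ("city", "sIti"),
    ("cot", "kQt"),
    ("cell", "sEl-"),
    ("tick", "tIk-"),
]

PAD = "_"  # padding symbol for positions beyond the word boundary

# Assumed, hand-set weights for (left neighbour, focus letter, right neighbour).
WEIGHTS = (1.0, 4.0, 2.0)


def windows(word, width=1):
    """Return one fixed-width letter window per grapheme in the word."""
    padded = PAD * width + word + PAD * width
    return [tuple(padded[i:i + 2 * width + 1]) for i in range(len(word))]


def build_memory(corpus, width=1):
    """Store every (window, phoneme) pair from the corpus as an exemplar."""
    memory = []
    for spelling, phonemes in corpus:
        assert len(spelling) == len(phonemes), "corpus must be aligned"
        memory.extend(zip(windows(spelling, width), phonemes))
    return memory


def similarity(a, b, weights=WEIGHTS):
    """Weighted overlap: sum the weights of the positions that match."""
    return sum(w for x, y, w in zip(a, b, weights) if x == y)


def predict(word, memory, width=1):
    """Map each grapheme to the majority phoneme among its most similar exemplars."""
    output = []
    for window in windows(word, width):
        best = max(similarity(window, stored) for stored, _ in memory)
        votes = Counter(
            phoneme for stored, phoneme in memory
            if similarity(window, stored) == best
        )
        output.append(votes.most_common(1)[0][0])
    return "".join(output)


MEMORY = build_memory(CORPUS)
print(predict("cit", MEMORY))  # -> sIt
```

The unseen string "cit" is transcribed with "s" for its first letter because the closest stored window comes from "city", which shows how the same grapheme can receive different phonemes depending on context without any explicit rule being written down.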
It is well known that this mapping is difficult because, in general, not all graphemes are realised in the phonetic transcription, and the same grapheme may correspond to different phonetic symbols depending on context. It is traditionally assumed that various sources of linguistic knowledge and their interaction should be formalised in order to be able to convert words into their phonemic representations with reasonable accuracy. Although different researchers propose different knowledge structures, consensus seems
