Scientific report: "Reducing the Annotation Effort for Letter-to-Phoneme Conversion"

Letter-to-phoneme (L2P) conversion is the process of producing a correct phoneme sequence for a word, given its letters. It is often desirable to reduce the quantity of training data (and hence human annotation) needed to train an L2P classifier for a new language. In this paper, we confront the challenge of building an accurate L2P classifier from a minimal amount of training data by combining several diverse techniques: context ordering, letter clustering, active learning, and phonetic L2P alignment. Experiments on six languages show up to a 75% reduction in annotation effort.

Reducing the Annotation Effort for Letter-to-Phoneme Conversion
Kenneth Dwyer and Grzegorz Kondrak
Department of Computing Science, University of Alberta, Edmonton, AB, Canada T6G 2E8
{dwyer,kondrak}@

1 Introduction

The task of letter-to-phoneme (L2P) conversion is to produce a correct sequence of phonemes given the letters that comprise a word. An accurate L2P converter is an important component of a text-to-speech system. In general, a lookup table does not suffice for L2P conversion, since out-of-vocabulary words (e.g., proper names) are inevitably encountered. This motivates the need for classification techniques that can predict the phonemes of an unseen word. Numerous studies have contributed to the development of increasingly accurate L2P systems (Black et al., 1998; Kienappel and Kneser, 2001; Bisani and Ney, 2002; Demberg et al., 2007; Jiampojamarn et al., 2008). A common assumption made in these works is that ample amounts of labelled data are available for training a classifier. Yet in practice, this is the case for only a small number of languages. In order to train an L2P classifier for a new language, we must first annotate words in that language with their correct phoneme sequences. As annotation is expensive, we would like to minimize the amount of effort required to build an adequate training set. The objective of this work is not necessarily to achieve state-of-the-art …
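The point that a lookup table cannot cover unseen words, while a classifier can, is easy to see in a toy setting. The sketch below is a generic illustration, not the classifier used by Dwyer and Kondrak. It assumes words that are already aligned one-to-one with their phonemes (a hypothetical toy lexicon `TRAIN`) and predicts each letter's phoneme from its surrounding letters, backing off from wider to narrower context windows. The names `TRAIN`, `LEXICON`, `window`, `train`, and `predict` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: each word is already aligned one-to-one with its
# phonemes (letter i produces phoneme i). Real L2P data must first be aligned.
TRAIN = [
    ("cat", ["k", "ae", "t"]),
    ("cab", ["k", "ae", "b"]),
    ("bat", ["b", "ae", "t"]),
    ("tab", ["t", "ae", "b"]),
]

# A plain lookup table: it can only answer for words it has already seen.
LEXICON = dict(TRAIN)

PAD = "#"  # marks positions beyond the word boundary


def window(word, i, k):
    """Return the 2k+1 letters centred on position i, padded at the edges."""
    padded = PAD * k + word + PAD * k
    return padded[i : i + 2 * k + 1]


def train(data, max_k=2):
    """Count which phoneme each letter context (every width up to max_k) produced."""
    tables = [defaultdict(Counter) for _ in range(max_k + 1)]
    for word, phones in data:
        assert len(word) == len(phones), "toy data must be one-to-one aligned"
        for i, phone in enumerate(phones):
            for k in range(max_k + 1):
                tables[k][window(word, i, k)][phone] += 1
    return tables


def predict(tables, word):
    """Predict one phoneme per letter, backing off from the widest context to the letter alone."""
    result = []
    for i in range(len(word)):
        for k in range(len(tables) - 1, -1, -1):
            counts = tables[k].get(window(word, i, k))
            if counts:
                # Most frequent phoneme for this context (ties: first seen wins).
                result.append(counts.most_common(1)[0][0])
                break
        else:
            result.append("?")  # letter never seen in training
    return result


if __name__ == "__main__":
    tables = train(TRAIN)
    print(LEXICON.get("tac"))      # None: "tac" is out of vocabulary
    print(predict(tables, "tac"))  # ['t', 'ae', 'k']
```

Backing off from wide to narrow contexts lets the model use specific evidence when it exists (the `#ta` context settles the first `t`) and fall back to the letter alone otherwise (the `c` in `tac` is read as `k` because every training `c` was).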
