Scientific report: "Reducing the Annotation Effort for Letter-to-Phoneme Conversion"


Kenneth Dwyer and Grzegorz Kondrak
Department of Computing Science, University of Alberta, Edmonton, AB, Canada, T6G 2E8
{dwyer,kondrak}@cs.ualberta.ca

Abstract

Letter-to-phoneme (L2P) conversion is the process of producing a correct phoneme sequence for a word, given its letters. It is often desirable to reduce the quantity of training data, and hence human annotation, that is needed to train an L2P classifier for a new language. In this paper, we confront the challenge of building an accurate L2P classifier with a minimal amount of training data by combining several diverse techniques: context ordering, letter clustering, active learning, and phonetic L2P alignment. Experiments on six languages show up to 75% reduction in annotation effort.

1 Introduction

The task of letter-to-phoneme (L2P) conversion is to produce a correct sequence of phonemes, given the letters that comprise a word. An accurate L2P converter is an important component of a text-to-speech system. In general, a lookup table does not suffice for L2P conversion, since out-of-vocabulary words (e.g., proper names) are inevitably encountered. This motivates the need for classification techniques that can predict the phonemes for an unseen word.

Numerous studies have contributed to the development of increasingly accurate L2P systems (Black et al., 1998; Kienappel and Kneser, 2001; Bisani and Ney, 2002; Demberg et al., 2007; Jiampojamarn et al., 2008). A common assumption made in these works is that ample amounts of labelled data are available for training a classifier. Yet, in practice, this is the case for only a small number of languages. In order to train an L2P classifier for a new language, we must first annotate words in that language with their correct phoneme sequences. As annotation is expensive, we would like to minimize the amount of effort that is required to build an adequate training set. The objective of this work is not necessarily to achieve state-of-the-art ...
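The lookup-table limitation noted in the introduction can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy lexicon, its phoneme symbols, and the helper names `lookup_phonemes` and `oov_words` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Optional

# Toy pronunciation lexicon. The entries and the phoneme symbols are
# illustrative assumptions, not data from the paper.
LEXICON: Dict[str, List[str]] = {
    "cat": ["k", "ae", "t"],
    "phone": ["f", "ow", "n"],
}


def lookup_phonemes(word: str) -> Optional[List[str]]:
    """Return the stored phoneme sequence, or None if the word is out of vocabulary."""
    return LEXICON.get(word.lower())


def oov_words(words: List[str]) -> List[str]:
    """Return the words that the lookup table cannot convert."""
    return [w for w in words if lookup_phonemes(w) is None]
```

Calling `oov_words(["cat", "Kondrak"])` returns only the proper name, which has no entry in the table. A trained L2P classifier is meant to cover exactly such unseen words by predicting phonemes from the letters themselves.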
