Scientific report: "Reducing the Annotation Effort for Letter-to-Phoneme Conversion"

Kenneth Dwyer and Grzegorz Kondrak
Department of Computing Science, University of Alberta
Edmonton, AB, Canada, T6G 2E8
dwyer kondrak @

Abstract

Letter-to-phoneme (L2P) conversion is the process of producing a correct phoneme sequence for a word, given its letters. It is often desirable to reduce the quantity of training data, and hence human annotation, that is needed to train an L2P classifier for a new language. In this paper, we confront the challenge of building an accurate L2P classifier with a minimal amount of training data by combining several diverse techniques: context ordering, letter clustering, active learning, and phonetic L2P alignment. Experiments on six languages show up to 75% reduction in annotation effort.

1 Introduction

The task of letter-to-phoneme (L2P) conversion is to produce a correct sequence of phonemes, given the letters that comprise a word. An accurate L2P converter is an important component of a text-to-speech system. In general, a lookup table does not suffice for L2P conversion, since out-of-vocabulary words (e.g., proper names) are inevitably encountered. This motivates the need for classification techniques that can predict the phonemes for an unseen word.

Numerous studies have contributed to the development of increasingly accurate L2P systems (Black et al., 1998; Kienappel and Kneser, 2001; Bisani and Ney, 2002; Demberg et al., 2007; Jiampojamarn et al., 2008). A common assumption made in these works is that ample amounts of labelled data are available for training a classifier. Yet, in practice, this is the case for only a small number of languages. In order to train an L2P classifier for a new language, we must first annotate words in that language with their correct phoneme sequences. As annotation is expensive, we would like to minimize the amount of effort that is required to build an adequate training set. The objective of this work is not necessarily to achieve state-of-the-art .
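The introduction's point that a lookup table cannot cover out-of-vocabulary words can be illustrated with a minimal sketch. The toy lexicon, its phoneme symbols, and the function names `lookup_phonemes` and `coverage` are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical toy pronunciation dictionary (illustrative entries and symbols only).
LEXICON: Dict[str, List[str]] = {
    "cat": ["k", "ae", "t"],
    "phone": ["f", "ow", "n"],
}


def lookup_phonemes(
    word: str, lexicon: Dict[str, List[str]] = LEXICON
) -> Optional[List[str]]:
    """Return the stored phoneme sequence, or None if the word is out of vocabulary."""
    return lexicon.get(word.lower())


def coverage(words: Sequence[str], lexicon: Dict[str, List[str]] = LEXICON) -> float:
    """Return the fraction of the given words that the lookup table can convert."""
    if not words:
        return 0.0
    found = sum(1 for w in words if lookup_phonemes(w, lexicon) is not None)
    return found / len(words)


if __name__ == "__main__":
    # Proper names are typical out-of-vocabulary items for a fixed table.
    for w in ["cat", "phone", "Edmonton", "Kondrak"]:
        print(w, lookup_phonemes(w))
```

Any word missing from the table yields no pronunciation at all, which is why a trained classifier that generalizes from letters to phonemes is needed for unseen words.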




