TAILIEUCHUNG - Scientific report: "Speech Recognition of Czech - Inclusion of Rare Words Helps"

Petr Podvesky and Pavel Machek
Institute of Formal and Applied Linguistics, Charles University, Prague, Czech Republic
podvesky machek @

Abstract

Large vocabulary continuous speech recognition of inflective languages, such as Czech, Russian or Serbo-Croatian, is heavily degraded by an excessive out-of-vocabulary rate. In this paper, we tackle the problem of vocabulary selection, language modeling and pruning for inflective languages. We show that by explicitly reducing the out-of-vocabulary rate we can achieve significant improvements in recognition accuracy while almost preserving the model size. Reported results are on Czech speech corpora.

1 Introduction

Large vocabulary continuous speech recognition of inflective languages is a challenging task for two main reasons. First, rich morphology generates a huge number of word forms that limited-size dictionaries do not capture, which leads to worse recognition results. Second, relatively free word order admits an enormous number of word sequences and thus impoverishes n-gram language models. In this paper we are concerned with the former issue.

Previous work on excessive vocabulary growth follows two main lines. Authors have either broken words into sub-word units or adapted dictionaries in a multi-pass scenario. On Czech data, Byrne et al. (2001) suggest using linguistically motivated recognition units: words are broken down into stems and endings, which serve as the recognition units in the first recognition phase, and in the second phase stems and endings are concatenated. On Serbo-Croatian, Geutner et al. (1998) also tested morphemes as the recognition units. Both groups of authors agreed that this approach is not beneficial for speech recognition of inflective languages.

Vocabulary adaptation, however, brought considerable improvement. Both Ircing and Psutka (2001) on Czech and Geutner et al. (1998) on Serbo-Croatian reported substantial reduction of
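To make the out-of-vocabulary rate discussed above concrete, the following minimal sketch computes it for a toy example. It assumes a simple frequency cutoff for choosing the dictionary. The function names `build_vocabulary` and `oov_rate` and the toy word lists are illustrative assumptions; this is not the vocabulary selection procedure of the paper.

```python
from collections import Counter


def build_vocabulary(train_tokens, size):
    """Keep the `size` most frequent training tokens (assumed selection rule)."""
    counts = Counter(train_tokens)
    return {word for word, _ in counts.most_common(size)}


def oov_rate(test_tokens, vocabulary):
    """Fraction of test tokens that are missing from the vocabulary."""
    if not test_tokens:
        return 0.0
    misses = sum(1 for token in test_tokens if token not in vocabulary)
    return misses / len(test_tokens)


# Toy data: inflected forms of the Czech noun "žena" (woman).
train = "žena ženy ženě ženu žena ženy žena".split()
test = "žena ženou ženy ženám".split()

vocab = build_vocabulary(train, size=2)   # keeps the two most frequent forms
rate = oov_rate(test, vocab)              # share of test tokens outside vocab
print(f"OOV rate: {rate:.2f}")
```

In this toy case the forms `ženou` and `ženám` never occur in the training text, so no frequency cutoff can cover them. This mirrors the point made above that rich morphology produces forms which limited-size dictionaries miss.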
