TAILIEUCHUNG - Scientific report: "Vocabulary Decomposition for Estonian Open Vocabulary Speech Recognition"

Antti Puurula and Mikko Kurimo
Adaptive Informatics Research Centre, Helsinki University of Technology
5400, FIN-02015 HUT, Finland
puurula mikkok @

Abstract

Speech recognition in many morphologically rich languages suffers from a very high out-of-vocabulary (OOV) ratio. Earlier work has shown that vocabulary decomposition methods can practically solve this problem for a subset of these languages. This paper compares various vocabulary decomposition approaches to open vocabulary speech recognition, using Estonian speech recognition as a benchmark. Comparisons are performed using large models of 60,000 lexical items and smaller vocabularies of 5,000 items. A large vocabulary model based on a manually constructed morphological tagger is shown to give the lowest word error rate, while the unsupervised morphology discovery method Morfessor Baseline gives marginally weaker results. Only the Morfessor-based approach is shown to scale adequately to smaller vocabulary sizes.

1 Introduction

OOV problem

Open vocabulary speech recognition refers to automatic speech recognition (ASR) of continuous speech, or speech-to-text of spoken language, where the recognizer is expected to recognize any word spoken in that language. This capability is a recent development in ASR and is required or beneficial in many current applications of ASR technology. Moreover, large vocabulary speech recognition is not possible in most languages of the world without first developing the tools needed for open vocabulary speech recognition. This is due to a fundamental obstacle in current ASR called the out-of-vocabulary (OOV) problem.

The OOV problem refers to the existence of words that a speech recognizer is unable to recognize because they are not covered by its vocabulary. The OOV problem is caused by three intertwined issues. Firstly, the language model training data and the test data always come ...
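As a rough illustration of the OOV ratio discussed above, the following minimal sketch builds a fixed-size word vocabulary from training text and measures the fraction of running test tokens it fails to cover. The function names `build_vocabulary` and `oov_rate` and the toy Estonian sentences are assumptions made for this example; they are not taken from the paper, and the paper's own experimental setup may differ.

```python
from collections import Counter


def build_vocabulary(training_tokens, max_size):
    """Keep the max_size most frequent training tokens as the vocabulary."""
    counts = Counter(training_tokens)
    return {token for token, _ in counts.most_common(max_size)}


def oov_rate(test_tokens, vocabulary):
    """Fraction of running test tokens that are missing from the vocabulary."""
    if not test_tokens:
        return 0.0
    missing = sum(1 for token in test_tokens if token not in vocabulary)
    return missing / len(test_tokens)


# Toy data invented for illustration (hypothetical, not from the paper).
train = "maja on suur ja maja on vana".split()
test = "majas on suur koer".split()

# The training text has five distinct words, so max_size=5 keeps all of them.
vocab = build_vocabulary(train, max_size=5)
rate = oov_rate(test, vocab)
print(f"OOV rate: {rate:.2f}")
```

In this toy case the inflected form "majas" is counted as OOV even though its stem "maja" is in the vocabulary, which hints at why word-level vocabularies cover morphologically rich languages poorly.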




