TAILIEUCHUNG - Scientific report: "The impact of language models and loss functions on repair disfluency detection"

Simon Zwarts and Mark Johnson
Centre for Language Technology, Macquarie University

Abstract

Unrehearsed spoken language often contains disfluencies. In order to correctly interpret a spoken utterance, any such disfluencies must be identified and removed or otherwise dealt with. Operating on transcripts of speech which contain disfluencies, we study the effect of language model and loss function on the performance of a linear reranker that rescores the 25-best output of a noisy-channel model. We show that language models trained on large amounts of non-speech data improve performance more than a language model trained on a more modest amount of speech data, and that optimising f-score rather than log loss improves disfluency detection performance. Our approach uses a log-linear reranker operating on the top n analyses of a noisy channel model. We use large language models, introduce new features into this reranker and examine different optimisation strategies. We obtain disfluency detection f-scores which improve upon the current state of the art.

1 Introduction

Most spontaneous speech contains disfluencies such as partial words, filled pauses (e.g. "uh", "um", "huh"), explicit editing terms (e.g. "I mean"), parenthetical asides and repairs. Of these, repairs pose particularly difficult problems for parsing and related Natural Language Processing (NLP) tasks. This paper presents a model of disfluency detection based on the noisy channel framework, which specifically targets the repair disfluencies. By combining language models and using an appropriate loss function in a log-linear reranker, we are able to achieve f-scores which are higher than previously reported.

Often in natural language processing algorithms, more data is more important than better algorithms (Brill and Banko, 2001). It is this insight that drives the first part of the work described in this paper.
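To make the reranking setup concrete, here is a minimal sketch of a linear reranker choosing among n-best analyses, together with an f-score computed over words tagged as part of a repair. It is an illustration under stated assumptions, not the authors' implementation: the `Candidate` structure, the feature names (`channel`, `lm_speech`, `lm_web`), the weights and the toy utterance are all hypothetical, and the training step that would set the weights (whether by log loss or by optimising f-score) is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Candidate:
    """One analysis from an n-best list (hypothetical structure)."""
    edited: List[bool]          # True where a word is tagged as part of a repair
    features: Dict[str, float]  # e.g. channel score, language-model log-probabilities


def score(candidate: Candidate, weights: Dict[str, float]) -> float:
    """Linear score: dot product of feature values and feature weights."""
    return sum(weights.get(name, 0.0) * value
               for name, value in candidate.features.items())


def rerank(nbest: Sequence[Candidate], weights: Dict[str, float]) -> Candidate:
    """Return the highest-scoring candidate from the n-best list."""
    return max(nbest, key=lambda c: score(c, weights))


def f_score(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall over words tagged as edited."""
    true_pos = sum(p and g for p, g in zip(predicted, gold))
    n_pred, n_gold = sum(predicted), sum(gold)
    if true_pos == 0 or n_pred == 0 or n_gold == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy utterance (invented): "a flight to Boston uh to Denver",
# where "to Boston" is the repaired material.
gold = [False, False, True, True, False, False, False]

# Three invented analyses with invented feature values.
nbest = [
    Candidate([False] * 7,
              {"channel": -2.0, "lm_speech": -14.0, "lm_web": -15.0}),
    Candidate([False, False, True, True, False, False, False],
              {"channel": -3.0, "lm_speech": -11.0, "lm_web": -9.0}),
    Candidate([False, False, False, True, False, False, False],
              {"channel": -2.5, "lm_speech": -12.0, "lm_web": -12.5}),
]

# Hypothetical weights; in practice these would be learned from training data.
weights = {"channel": 1.0, "lm_speech": 0.5, "lm_web": 0.5}

best = rerank(nbest, weights)
print("selected analysis:", best.edited)
print("f-score of selected analysis:", f_score(best.edited, gold))
```

In this sketch each language model contributes one feature, so adding a model trained on different data only adds an entry to `features` and `weights`. The choice of loss function affects only how the weights are estimated, not how the reranker is applied.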
