Lexicalized phonotactic word segmentation

Margaret M. Fleck
Department of Computer Science
University of Illinois
Urbana, IL 61801, USA
mfleck@

Abstract

This paper presents a new unsupervised algorithm (WordEnds) for inferring word boundaries from transcribed adult conversations. Phone n-grams before and after observed pauses are used to bootstrap a simple discriminative model of boundary marking. This fast algorithm delivers high performance even on morphologically complex words in English and Arabic, and promising results on accurate phonetic transcriptions with extensive pronunciation variation. Expanding training data beyond the traditional miniature datasets pushes performance numbers well above those previously reported. This suggests that WordEnds is a viable model of child language acquisition and might be useful in speech understanding.

1 Introduction

Words are essential to most models of language and speech understanding. Word boundaries define the places at which speakers can fluently pause, and they limit the application of most phonological rules. Words are a key constituent in structural analyses: the output of morphological rules and the constituents in syntactic parsing. Most speech recognizers are word-based. And words are entrenched in the writing systems of many languages. Therefore, it is generally accepted that children learning their first language must learn how to segment speech into a sequence of words. Similar but more limited learning occurs when adults hear speech containing unfamiliar words.
These words must be accurately delimited so that they can be added to the lexicon and so that nearby familiar words can be recognized correctly. Current speech recognizers typically misinterpret such speech.

This paper will consider algorithms that segment phonetically transcribed speech into words. For example, Figure 1 shows a transcribed phrase from the Buckeye corpus (Pitt et al., 2005; Pitt et al., 2007) and the automatically segmented output. Like almost all …
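The pause-bootstrapping idea in the abstract can be illustrated with a small sketch. It is not the WordEnds model. It assumes the transcript has already been split at observed pauses into chunks of phone symbols (letters stand in for phones here), and it replaces the paper's discriminative model with a hypothetical fixed-threshold rule. The helper names `edge_ngram_stats`, `edge_rate` and `segment`, the n-gram length `n`, and the `threshold` value are illustrative assumptions, not taken from the paper. For each candidate position inside a chunk, the sketch asks what fraction of the occurrences of the phone n-gram on its left were seen just before a pause, and what fraction of the occurrences of the n-gram on its right were seen just after one.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# One pause-delimited stretch of transcribed speech, as a list of phone symbols.
Chunk = List[str]
# (n-grams seen just before a pause, n-grams seen just after a pause, all n-grams)
EdgeStats = Tuple[Counter, Counter, Counter]


def edge_ngram_stats(chunks: Sequence[Chunk], n: int = 2) -> EdgeStats:
    """Hypothetical helper: count phone n-grams at pause edges and overall."""
    before_pause: Counter = Counter()
    after_pause: Counter = Counter()
    overall: Counter = Counter()
    for chunk in chunks:
        for i in range(len(chunk) - n + 1):
            overall[tuple(chunk[i:i + n])] += 1
        if len(chunk) >= n:
            before_pause[tuple(chunk[-n:])] += 1  # last n phones precede a pause
            after_pause[tuple(chunk[:n])] += 1  # first n phones follow a pause
    return before_pause, after_pause, overall


def edge_rate(edge: Counter, overall: Counter, gram: Tuple[str, ...]) -> float:
    """Fraction of this n-gram's occurrences that sit at a pause edge (0 if unseen)."""
    return edge[gram] / overall[gram] if overall[gram] else 0.0


def segment(chunk: Chunk, stats: EdgeStats, n: int = 2,
            threshold: float = 0.3) -> List[Chunk]:
    """Hypothetical fixed-threshold rule standing in for a learned boundary model.

    A boundary is placed before position k when the n phones to its left often
    end a pause-delimited chunk AND the n phones to its right often start one.
    Positions too close to the chunk edges for full n-grams are never split.
    """
    before_pause, after_pause, overall = stats
    words: List[Chunk] = []
    start = 0
    for k in range(n, len(chunk) - n + 1):
        left = tuple(chunk[k - n:k])
        right = tuple(chunk[k:k + n])
        if (edge_rate(before_pause, overall, left) >= threshold
                and edge_rate(after_pause, overall, right) >= threshold):
            words.append(chunk[start:k])
            start = k
    words.append(chunk[start:])
    return words


if __name__ == "__main__":
    # Toy "transcript": "dog" and "kat" each heard alone, then once without a pause.
    toy_chunks = [list("dog"), list("kat"), list("dogkat")]
    stats = edge_ngram_stats(toy_chunks)
    print(segment(list("katdog"), stats))  # [['k', 'a', 't'], ['d', 'o', 'g']]
```

Because "dog" and "kat" are each heard in isolation, the toy statistics record that `o g` and `a t` tend to end pause-delimited stretches while `d o` and `k a` tend to start them, so the sketch splits the order `k a t d o g`, which never occurred in the input, as `kat | dog`. A practical system would likely need smoothing for unseen n-grams and a learned boundary model in place of the hand-set threshold.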