Unsupervised Topic Modelling for Multi-Party Spoken Discourse

Matthew Purver
CSLI, Stanford University, Stanford CA 94305, USA
mpurver@

Thomas L. Griffiths
Dept. of Cognitive & Linguistic Sciences, Brown University, Providence RI 02912, USA
tomgriffiths@

Konrad P. Kording
Dept. of Brain & Cognitive Sciences, Massachusetts Institute of Technology, Cambridge MA 02139, USA
kording@

Joshua B. Tenenbaum
Dept. of Brain & Cognitive Sciences, Massachusetts Institute of ...

Abstract

We present a method for unsupervised topic modelling which adapts methods used in document classification (Blei et al., 2003; Griffiths and Steyvers, 2004) to unsegmented multi-party discourse transcripts. We show how Bayesian inference in this generative model can be used to simultaneously address the problems of topic segmentation and topic identification: automatically segmenting multi-party meetings into topically coherent segments, with performance which compares well with previous unsupervised segmentation-only methods (Galley et al., 2003), while simultaneously extracting topics which rate highly when assessed for coherence by human judges. We also show that this method appears robust in the face of off-topic dialogue and speech recognition errors.

1 Introduction

Topic segmentation (the division of a text or discourse into topically coherent segments) and topic identification (the classification of those segments by subject matter) are joint problems. Both are necessary steps in automatic indexing, retrieval and summarization from large datasets, whether spoken or written. Both have received significant attention in the past (see Section 2), but most approaches have been targeted at either text or monologue, and most address only one of the two issues, usually for the very good reason that the dataset itself provides the other (for example, by the explicit separation of individual documents or news stories in a collection). Spoken multi-party meetings pose a difficult problem: firstly, neither the ...
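The abstract refers to a generative model without spelling it out. Purely as an illustration, the sketch below assumes an LDA-style process (Blei et al., 2003) extended with segment boundaries: each utterance either opens a new segment with probability `boundary_prob`, drawing a fresh topic mixture from a symmetric Dirichlet(`alpha`), or inherits the previous utterance's mixture; each word is then drawn from a topic sampled from that mixture, and each topic's word distribution comes from a symmetric Dirichlet(`beta`). The function names, parameters and defaults are hypothetical and not taken from the paper. Bayesian inference would run this process in reverse, recovering boundaries and topics from the observed words; that step is not shown.

```python
import random
from typing import List, Tuple


def sample_dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Draw a symmetric Dirichlet(alpha) vector of length k via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:
        # Guard against numerical underflow: fall back to a uniform vector.
        return [1.0 / k] * k
    return [d / total for d in draws]


def generate_transcript(
    num_utterances: int,
    words_per_utterance: int,
    num_topics: int,
    vocab: List[str],
    alpha: float = 0.5,          # hypothetical prior on per-segment topic mixtures
    beta: float = 0.1,           # hypothetical prior on per-topic word distributions
    boundary_prob: float = 0.1,  # hypothetical chance that an utterance opens a new segment
    seed: int = 0,
) -> Tuple[List[List[str]], List[int]]:
    """Hypothetical sketch: sample (utterances, boundary indicators) from a segmented topic model."""
    rng = random.Random(seed)
    # One word distribution per topic, shared across the whole transcript.
    topic_word = [sample_dirichlet(beta, len(vocab), rng) for _ in range(num_topics)]
    utterances: List[List[str]] = []
    boundaries: List[int] = []
    mixture: List[float] = []
    for u in range(num_utterances):
        # The first utterance always opens a segment; later ones do so with prob. boundary_prob.
        is_boundary = u == 0 or rng.random() < boundary_prob
        if is_boundary:
            # A new segment gets a fresh topic mixture; otherwise the previous one is reused.
            mixture = sample_dirichlet(alpha, num_topics, rng)
        boundaries.append(int(is_boundary))
        words = []
        for _ in range(words_per_utterance):
            z = rng.choices(range(num_topics), weights=mixture)[0]      # pick a topic
            words.append(rng.choices(vocab, weights=topic_word[z])[0])  # pick a word from it
        utterances.append(words)
    return utterances, boundaries
```

For example, `generate_transcript(20, 5, 3, ["a", "b", "c", "d"], seed=1)` returns twenty five-word utterances together with a boundary list whose first entry is 1; segments are the runs of utterances between consecutive 1s.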
