Scientific report: "Structural Topic Model for Latent Topical Structure Analysis"

Hongning Wang, Duo Zhang, ChengXiang Zhai
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, IL 61801, USA
wang296, dzhang22, czhai @

Abstract

Topic models have been successfully applied to many document analysis tasks to discover topics embedded in text. However, existing topic models generally cannot capture the latent topical structures in documents. Since languages are intrinsically cohesive and coherent, modeling and discovering latent topical transition structures within documents would be beneficial for many text analysis tasks. In this work, we propose a new topic model, Structural Topic Model, which simultaneously discovers topics and reveals the latent topical structures in text through explicitly modeling topical transitions with a latent first-order Markov chain. Experiment results show that the proposed Structural Topic Model can effectively discover topical structures in text, and the identified structures significantly improve the performance of tasks such as sentence annotation and sentence ordering.

1 Introduction

A great amount of effort has recently been made in applying statistical topic models (Hofmann, 1999; Blei et al., 2003) to explore word co-occurrence patterns, i.e., topics, embedded in documents. Topic models have become important building blocks of many interesting applications (see, e.g., Blei and Jordan, 2003; Blei and Lafferty, 2007; Mei et al., 2007; Lu and Zhai, 2008).
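
The abstract's central idea, modeling topical transitions with a latent first-order Markov chain, can be illustrated with a toy sketch. The code below is not the authors' Structural Topic Model: it assumes, purely for illustration, that each sentence carries one latent topic, and all names and probabilities (`INITIAL`, `TRANSITION`, `EMISSION`, `MIXTURE`) are hypothetical. It compares an order-free bag-of-words mixture with an HMM-style model scored by the forward algorithm, to show that only the latter is sensitive to sentence order.

```python
import math

# Hypothetical toy parameters, chosen only for illustration (not from the paper).
TOPICS = ["intro", "detail"]
INITIAL = {"intro": 0.9, "detail": 0.1}
TRANSITION = {
    "intro": {"intro": 0.2, "detail": 0.8},
    "detail": {"intro": 0.1, "detail": 0.9},
}
EMISSION = {
    "intro": {"we": 0.5, "propose": 0.4, "results": 0.1},
    "detail": {"we": 0.2, "propose": 0.1, "results": 0.7},
}
MIXTURE = {"intro": 0.5, "detail": 0.5}  # topic weights for the order-free baseline


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def sentence_log_prob(words, topic):
    """Log-probability of a sentence's words given its single (assumed) topic."""
    return sum(math.log(EMISSION[topic][w]) for w in words)


def bag_log_likelihood(sentences):
    """Order-free baseline: each word is drawn independently from a topic mixture."""
    total = 0.0
    for words in sentences:
        for w in words:
            total += math.log(sum(MIXTURE[z] * EMISSION[z][w] for z in TOPICS))
    return total


def markov_log_likelihood(sentences):
    """Forward algorithm over a first-order Markov chain of sentence topics."""
    # alpha[z] = log p(sentences seen so far, current sentence topic = z)
    alpha = {
        z: math.log(INITIAL[z]) + sentence_log_prob(sentences[0], z)
        for z in TOPICS
    }
    for words in sentences[1:]:
        alpha = {
            z: logsumexp(
                [alpha[prev] + math.log(TRANSITION[prev][z]) for prev in TOPICS]
            )
            + sentence_log_prob(words, z)
            for z in TOPICS
        }
    return logsumexp(list(alpha.values()))


doc = [["we", "propose"], ["results", "results"]]
shuffled = doc[::-1]  # same sentences, reversed order
print("bag-of-words:", bag_log_likelihood(doc), bag_log_likelihood(shuffled))
print("markov chain:", markov_log_likelihood(doc), markov_log_likelihood(shuffled))
```

Under these toy parameters the bag-of-words score is the same for both orders (up to floating-point rounding), while the Markov chain scores the original order higher than the reversed one. That order sensitivity is what makes a transition model usable for tasks such as sentence ordering.
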
In general, topic models can discover word clustering patterns in documents and project each document to a latent topic space formed by such word clusters. However, the topical structure in a document, i.e., the internal dependency between the topics, is generally not captured due to the exchangeability assumption (Blei et al., 2003), i.e., the document generation probabilities are invariant to content permutation. In reality, natural language text rarely consists of isolated, unrelated sentences, but
