Scientific report: "Interactive Topic Modeling"

Yuening Hu, Department of Computer Science, University of Maryland, ynhu@
Jordan Boyd-Graber, iSchool, University of Maryland, jbg@
Brianna Satinoff, Department of Computer Science, University of Maryland, bsonrisa@

Abstract

Topic models have been used extensively as a tool for corpus exploration, and a cottage industry has developed to tweak topic models to better encode human intuitions or to better model data. However, creating such extensions requires expertise in machine learning unavailable to potential end-users of topic modeling software. In this work, we develop a framework for allowing users to iteratively refine the topics discovered by models such as latent Dirichlet allocation (LDA) by adding constraints that enforce that sets of words must appear together in the same topic. We incorporate these constraints interactively by selectively removing elements in the state of a Markov Chain used for inference; we investigate a variety of methods for incorporating this information and demonstrate that these interactively added constraints improve topic usefulness for simulated and actual user sessions.

1 Introduction

Probabilistic topic models, as exemplified by probabilistic latent semantic indexing (Hofmann, 1999) and latent Dirichlet allocation (LDA) (Blei et al., 2003), are unsupervised statistical techniques to discover the thematic topics that permeate a large corpus of text documents.
Topic models have had considerable application beyond natural language processing in computer vision (Rob et al., 2005), biology (Shringarpure and Xing, 2008), and psychology (Landauer et al., 2006), in addition to their canonical application to text. For text, one of the few real-world applications of topic models is corpus exploration. Unannotated, noisy, and ever-growing corpora are the norm rather than the exception, and topic models offer a way to quickly get the gist of a large corpus.1

1 For examples, see Rexa http JSTOR

Contrary to the
