Efficient Tree-Based Topic Modeling

Yuening Hu
Department of Computer Science, University of Maryland, College Park
ynhu@

Jordan Boyd-Graber
iSchool and UMIACS, University of Maryland, College Park
jbg@

Abstract

Topic modeling with a tree-based prior has been used for a variety of applications because it can encode correlations between words that traditional topic modeling cannot. However, its expressive power comes at the cost of more complicated inference. We extend the SparseLDA (Yao et al., 2009) inference scheme for latent Dirichlet allocation (LDA) to tree-based topic models. This sampling scheme computes the exact conditional distribution for Gibbs sampling much more quickly than enumerating all possible latent variable assignments. We further improve performance by iteratively refining the sampling distribution only when needed. Experiments show that the proposed techniques dramatically reduce computation time.

1 Introduction

Topic models, exemplified by latent Dirichlet allocation (LDA) (Blei et al., 2003), discover latent themes present in text collections. Topics discovered by topic models are multinomial probability distributions over words that evince thematic coherence. Topic models are used in computational biology, computer vision, music, and, of course, text analysis.

One of LDA's virtues is that it is a simple model that assumes a symmetric Dirichlet prior over its word distributions. Recent work argues for structured distributions that constrain clusters (Andrzejewski et al., 2009), span languages (Jagarlamudi and Daumé III, 2010), or incorporate human feedback (Hu et al., 2011) to improve the quality and flexibility of topic modeling. These models all use different tree-based prior distributions (Section 2). These approaches are appealing because they preserve conjugacy, making inference using Gibbs sampling (Heinrich, 2004) straightforward. While straightforward, inference isn't cheap. Particularly for interactive settings (Hu et al., 2011), efficient …
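For readers unfamiliar with the SparseLDA scheme (Yao et al., 2009) that this work builds on, the following is a minimal Python sketch for plain, flat LDA only; it does not implement the tree-based extension or the iterative refinement described above. It assumes the standard collapsed Gibbs conditional p(z = k | rest) ∝ (α_k + n_dk)(β + n_kw) / (βV + n_k) and splits that mass into three buckets: a smoothing term α_k β / (βV + n_k), a document term n_dk β / (βV + n_k), and a topic-word term (α_k + n_dk) n_kw / (βV + n_k). The function names (`dense_conditional`, `sparse_buckets`, `sample_topic`) and the count layout are hypothetical choices made for illustration.

```python
import random
from typing import Iterable, List, Tuple

# Hypothetical layout: counts for ONE token being resampled, with that token
# already removed from the counts (standard collapsed Gibbs bookkeeping).
#   n_dk[k]: tokens in the current document assigned to topic k
#   n_kw[k]: times the current word type is assigned to topic k
#   n_k[k]:  total tokens assigned to topic k
#   alpha[k], beta: Dirichlet hyperparameters; V: vocabulary size


def dense_conditional(n_dk, n_kw, n_k, alpha, beta, V) -> List[float]:
    """Unnormalized p(z = k | rest), computed by enumerating every topic."""
    return [
        (alpha[k] + n_dk[k]) * (beta + n_kw[k]) / (beta * V + n_k[k])
        for k in range(len(n_k))
    ]


def sparse_buckets(n_dk, n_kw, n_k, alpha, beta, V):
    """Split the same unnormalized mass into SparseLDA-style buckets.

    (alpha + n_dk)(beta + n_kw) = alpha*beta + n_dk*beta + (alpha + n_dk)*n_kw,
    so each topic's mass is s[k] + r[k] + q[k] after dividing by (beta*V + n_k).
    """
    K = len(n_k)
    denom = [beta * V + n_k[k] for k in range(K)]
    # Smoothing bucket: dense, but changes slowly (a real sampler caches it).
    s = [alpha[k] * beta / denom[k] for k in range(K)]
    # Document bucket: only topics that appear in this document.
    r = {k: n_dk[k] * beta / denom[k] for k in range(K) if n_dk[k] > 0}
    # Topic-word bucket: only topics in which this word type appears.
    q = {k: (alpha[k] + n_dk[k]) * n_kw[k] / denom[k] for k in range(K) if n_kw[k] > 0}
    return s, r, q


def _walk(items: Iterable[Tuple[int, float]], u: float) -> int:
    """Return the key whose cumulative mass first exceeds u."""
    last = -1
    for k, mass in items:
        last = k
        u -= mass
        if u < 0:
            return k
    return last  # guard against floating-point round-off


def sample_topic(s, r, q, rng: random.Random) -> int:
    """Draw one topic from the combined s + r + q mass."""
    S, R, Q = sum(s), sum(r.values()), sum(q.values())
    u = rng.random() * (S + R + Q)
    # Check the topic-word bucket first; it typically holds most of the mass.
    if u < Q:
        return _walk(q.items(), u)
    u -= Q
    if u < R:
        return _walk(r.items(), u)
    u -= R
    return _walk(enumerate(s), u)


if __name__ == "__main__":
    counts = dict(n_dk=[2, 0, 1], n_kw=[0, 3, 0], n_k=[10, 20, 5],
                  alpha=[0.1, 0.1, 0.1], beta=0.01, V=5)
    print(dense_conditional(**counts))
    s, r, q = sparse_buckets(**counts)
    print(s, r, q)
    print(sample_topic(s, r, q, random.Random(0)))
```

The bucket split gives exactly the same distribution as enumerating all topics; the savings come from sparsity. Because a document usually uses few topics and a word type usually appears in few topics, the `r` and `q` dictionaries are small, so most draws touch only a handful of entries. A practical sampler would also cache the smoothing bucket and its sum and update them incrementally, rather than rebuilding them for every token as this sketch does.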
