Scientific report: "Modeling Topic Dependencies in Hierarchical Text Categorization"

Alessandro Moschitti and Qi Ju
University of Trento, 38123 Povo (TN), Italy

Richard Johansson
University of Gothenburg, SE-405 30 Gothenburg, Sweden

Abstract

In this paper, we encode topic dependencies in hierarchical multi-label Text Categorization (TC) by means of rerankers. We represent reranking hypotheses with several innovative kernels considering both the structure of the hierarchy and the probability of nodes. Additionally, to better investigate the role of category relationships, we consider two interesting cases: (i) traditional schemes in which node-fathers include all the documents of their child-categories; and (ii) more general schemes, in which children can include documents not belonging to their fathers. The extensive experimentation on Reuters Corpus Volume 1 shows that our rerankers inject effective structural semantic dependencies in multi-classifiers and significantly outperform the state-of-the-art.

1 Introduction

Automated Text Categorization (TC) algorithms for hierarchical taxonomies are typically based on flat schemes, which do not take topic relationships into account. This is due to two major problems: (i) the complexity of introducing them into the learning algorithm and (ii) the small or no advantage that they seem to provide (Rifkin and Klautau, 2004).
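The two labeling schemes distinguished in the abstract can be made concrete with a small sketch. The toy hierarchy, the category names, and the helper functions below are hypothetical and introduced only for illustration; they are not taken from the paper or from the Reuters corpus. Under case (i), a document's label set contains every ancestor of each of its categories, while under case (ii) this closure is not required.

```python
# Hypothetical toy hierarchy mapping each category to its father (None = top level).
# The category names are invented for illustration only.
PARENT = {"C1": None, "C2": None, "C11": "C1", "C12": "C1"}


def ancestors(category, parent=PARENT):
    """Return all ancestors of a category, nearest father first."""
    result = []
    node = parent[category]
    while node is not None:
        result.append(node)
        node = parent[node]
    return result


def close_under_fathers(labels, parent=PARENT):
    """Case (i): add every ancestor, so fathers include their children's documents."""
    closed = set(labels)
    for label in labels:
        closed.update(ancestors(label, parent))
    return closed


def follows_case_i(labels, parent=PARENT):
    """True when the label set already contains all ancestors of its members."""
    return close_under_fathers(labels, parent) == set(labels)


if __name__ == "__main__":
    # A document labeled with a child and its father fits case (i).
    print(follows_case_i({"C11", "C1"}))
    # A document labeled only with the child is allowed only under case (ii).
    print(follows_case_i({"C11"}))
```

A check of this kind only tells the two schemes apart on a given label set; it says nothing about how the rerankers themselves use the hierarchy.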
We speculate that the failure of hierarchical approaches is caused by the inherent complexity of modeling all possible topic dependencies rather than by the uselessness of such relationships. More precisely, although hierarchical multi-label classifiers can exploit machine learning algorithms for structural output (Tsochantaridis et al., 2005; Riezler and Vasserman, 2010; Lavergne et al., 2010), they often impose a number of simplifying restrictions on some category assignments. Typically, the probability of a document d to belong to a subcategory Ci of a category C is assumed to depend
