Scientific report: "Credibility Improves Topical Blog Post Retrieval"

Wouter Weerkamp, ISLA, University of Amsterdam, weerkamp@
Maarten de Rijke, ISLA, University of Amsterdam, mdr@

Abstract

Topical blog post retrieval is the task of ranking blog posts with respect to their relevance for a given topic. To improve topical blog post retrieval we incorporate textual credibility indicators in the retrieval process. We consider two groups of indicators: post level (determined using information about individual blog posts only) and blog level (determined using information from the underlying blogs). We describe how to estimate these indicators and how to integrate them into a retrieval approach based on language models. Experiments on the TREC Blog track test set show that both groups of credibility indicators significantly improve retrieval effectiveness; the best performance is achieved when combining them.

1 Introduction

The growing amount of user generated content available online creates new challenges for the information retrieval (IR) community, in terms of search and analysis tasks for this type of content. The introduction of a blog retrieval track at TREC (Ounis et al., 2007) has created a platform where we can begin to address these challenges. During the 2006 edition of the track, two types of blog post retrieval were considered: topical (retrieve posts about a topic) and opinionated (retrieve opinionated posts about a topic). Here, we consider the former task.
Blogs and blog posts offer unique features that may be exploited for retrieval purposes. For example, Mishne (2007b) incorporates time in a blog post retrieval model to account for the fact that many blog queries and posts are a response to a news event (Mishne and de Rijke, 2006). Data quality is an issue with blogs: the quality of posts ranges from low to edited news article-like. Some approaches to post retrieval use indirect quality measures, e.g., elaborate spam filtering (Java et al., 2007) or counting inlinks (Mishne, 2007a). Few
