Scientific report: "A Generative Blog Post Retrieval Model that Uses Query Expansion based on External Collections"

Wouter Weerkamp, Krisztian Balog, Maarten de Rijke
ISLA, University of Amsterdam
mdr@

Abstract

User generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage. To bridge the vocabulary gap between the user's information need and documents in a specific user generated content environment, the blogosphere, we apply a form of query expansion, i.e., adding and reweighing query terms. Since the blogosphere is noisy, query expansion on the collection itself is rarely effective, but external, edited collections are more suitable. We propose a generative model for expanding queries using external collections in which dependencies between queries, documents, and expansion documents are explicitly modeled. Different instantiations of our model are discussed and make different (in)dependence assumptions. Results using two external collections (news and Wikipedia) show that external expansion for retrieval of user generated content is effective; besides, conditioning the external collection on the query is very beneficial, and making candidate expansion terms dependent on just the document seems sufficient.
1 Introduction

One of the grand challenges in information retrieval is to bridge the vocabulary gap between a user and her information need on the one hand and the relevant documents on the other (Baeza-Yates and Ribeiro-Neto, 1999). In the setting of blogs or other types of user generated content, bridging this gap becomes even more challenging. This has several causes: (i) the spelling errors and the unusual, creative, or unfocused language usage resulting from the lack of top-down rules and editors in the content creation process, and (ii) the often limited length of user generated documents. Query expansion, i.e., modifying the query by adding and reweighing terms, is an often used technique to bridge the ...
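To make "adding and reweighing terms" concrete, the following is a minimal sketch of query expansion against an external, edited collection. It is not the generative model proposed in the paper: it uses a generic relevance-model-style estimate, in which each external document is weighted by a Dirichlet-smoothed query likelihood and the top-scoring terms are mixed back into the original query. The function names, the toy documents, and the parameters `k`, `lam`, and `mu` are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def query_likelihood(query_terms, doc_terms, coll_tf, coll_len, mu):
    """Dirichlet-smoothed P(Q | D) for a single external document."""
    tf = Counter(doc_terms)
    score = 1.0
    for term in query_terms:
        p_coll = coll_tf.get(term, 0) / coll_len
        score *= (tf.get(term, 0) + mu * p_coll) / (len(doc_terms) + mu)
    return score


def expand_query(query, external_docs, k=3, lam=0.5, mu=10.0):
    """Return a reweighted query model {term: weight}.

    Hypothetical helper: k expansion terms from the external collection
    are interpolated (weight lam) with the original query (weight 1 - lam).
    """
    q_terms = tokenize(query)
    if not q_terms:
        return {}
    original = {t: c / len(q_terms) for t, c in Counter(q_terms).items()}

    docs = [d for d in (tokenize(text) for text in external_docs) if d]
    coll_tf = Counter(t for d in docs for t in d)
    coll_len = sum(coll_tf.values())
    if coll_len == 0:
        return original

    # Weight each external document by how well it explains the query.
    weights = [query_likelihood(q_terms, d, coll_tf, coll_len, mu) for d in docs]
    total = sum(weights)
    if total == 0.0:
        return original  # no external evidence: keep the query unchanged

    # Score candidate terms by their probability under the weighted documents.
    term_scores = Counter()
    for doc, weight in zip(docs, weights):
        for term, count in Counter(doc).items():
            term_scores[term] += (weight / total) * count / len(doc)

    # Keep the k best terms and renormalize them into a distribution.
    top = term_scores.most_common(k)
    norm = sum(score for _, score in top)
    expanded = {term: lam * score / norm for term, score in top}

    # Add the original query terms back with the remaining probability mass.
    for term, p in original.items():
        expanded[term] = expanded.get(term, 0.0) + (1.0 - lam) * p
    return expanded


if __name__ == "__main__":
    news = [
        "apple unveils new iphone at keynote event",
        "iphone sales lift apple quarterly revenue",
        "local football team wins championship final",
    ]
    for term, weight in sorted(expand_query("apple iphone", news).items()):
        print(f"{term}\t{weight:.3f}")
```

In this toy setup the external documents that match the query dominate the term scores, so terms from the unrelated document are unlikely to be selected. The interpolation weight `lam` controls how far the expanded query may drift from the original one.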
