Distributed Search over the Hidden Web: Hierarchical Database Sampling and Selection

Panagiotis G. Ipeirotis, Columbia University, pirot@
Luis Gravano, Columbia University, gravano@

Abstract

Many valuable text databases on the web have non-crawlable contents that are hidden behind search interfaces. Metasearchers are helpful tools for searching over many such databases at once through a unified query interface. A critical task for a metasearcher to process a query efficiently and effectively is the selection of the most promising databases for the query, a task that typically relies on statistical summaries of the database contents. Unfortunately, web-accessible text databases do not generally export content summaries. In this paper, we present an algorithm to derive content summaries from uncooperative databases by using focused query probes, which adaptively zoom in on and extract documents that are representative of the topic coverage of the databases. Our content summaries are the first to include absolute document frequency estimates for the database words. We also present a novel database selection algorithm that exploits both the extracted content summaries and a hierarchical classification of the databases, automatically derived during probing, to compensate for potentially incomplete content summaries.
Finally, we evaluate our techniques thoroughly using a variety of databases, including 50 real web-accessible text databases. Our experiments indicate that our new content-summary construction technique is efficient and produces more accurate summaries than those from previously proposed strategies. Also, our hierarchical database selection algorithm exhibits significantly higher precision than its flat counterparts.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date.
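
To make the abstract's terminology concrete, the following minimal Python sketch shows what a sample-based content summary and a flat database selection step might look like. It is an illustration under stated assumptions, not the algorithm of the paper: the function names (`build_summary`, `score`, `rank_databases`), the toy databases, the linear scaling of sample frequencies to absolute document frequencies, and the product-of-fractions score are all hypothetical choices made here. It also covers only the flat case, that is, the kind of baseline the abstract contrasts with its hierarchical algorithm.

```python
from collections import Counter


def build_summary(sample_docs, estimated_db_size):
    """Approximate content summary: word -> estimated absolute document frequency.

    Assumption: sample frequencies are scaled linearly to the (assumed known)
    database size. This is an illustrative stand-in, not the paper's estimator.
    """
    sample_df = Counter()
    for doc in sample_docs:
        # Count each word once per document (document frequency, not term frequency).
        sample_df.update(set(doc.lower().split()))
    scale = estimated_db_size / len(sample_docs)
    return {word: df * scale for word, df in sample_df.items()}


def score(query, summary, db_size):
    """Flat score: product of the estimated fraction of documents containing each query word."""
    result = 1.0
    for word in query.lower().split():
        result *= summary.get(word, 0.0) / db_size
    return result


def rank_databases(query, summaries, sizes):
    """Return database names ordered from most to least promising for the query."""
    return sorted(
        summaries,
        key=lambda name: score(query, summaries[name], sizes[name]),
        reverse=True,
    )


# Hypothetical documents retrieved by query probes from two toy databases.
samples = {
    "health_db": ["blood pressure and hypertension", "hypertension treatment guidelines"],
    "sports_db": ["soccer match results", "blood sport controversy"],
}
sizes = {"health_db": 1000, "sports_db": 500}
summaries = {name: build_summary(docs, sizes[name]) for name, docs in samples.items()}

print(rank_databases("blood hypertension", summaries, sizes))
```

In this sketch a query word missing from a sampled summary drives the score to zero, which illustrates the incompleteness problem that the abstract says the hierarchical classification is meant to compensate for.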
