Classification-Aware Hidden-Web Text Database Selection

PANAGIOTIS G. IPEIROTIS, New York University, and LUIS GRAVANO, Columbia University

Many valuable text databases on the web have noncrawlable contents that are hidden behind search interfaces. Metasearchers are helpful tools for searching over multiple such hidden-web text databases at once through a unified query interface. An important step in the metasearching process is database selection, or determining which databases are the most relevant for a given user query. The state-of-the-art database selection techniques rely on statistical summaries of the database contents, generally including the database vocabulary and associated word frequencies. Unfortunately, hidden-web text databases typically do not export such summaries, so previous research has developed algorithms for constructing approximate content summaries from document samples extracted from the databases via querying. We present a novel focused-probing sampling algorithm that detects the topics covered in a database and adaptively extracts documents that are representative of the topic coverage of the database. Our algorithm is the first to construct content summaries that include the frequencies of the words in the database.
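To make the role of content summaries concrete, here is a minimal Python sketch of sample-based database selection. It is an illustration under stated assumptions, not the algorithm presented in the paper: the function names, the summary layout, and the independence-based scoring rule (in the style of bGlOSS) are all chosen for this example, and the database size is assumed to be known or estimated separately.

```python
from collections import Counter


def build_summary(sample_docs, db_size):
    """Build an approximate content summary from a document sample.

    sample_docs: list of documents, each a list of word tokens.
    db_size: assumed (externally estimated) number of documents in the database.
    """
    doc_freq = Counter()
    for doc in sample_docs:
        doc_freq.update(set(doc))  # count each word once per document
    n = len(sample_docs)
    # Fraction of sampled documents that contain each word.
    return {"size": db_size, "p": {w: c / n for w, c in doc_freq.items()}}


def score(summary, query):
    """Estimate how many documents contain every query word,
    assuming (hypothetically) that words occur independently."""
    estimate = summary["size"]
    for word in query:
        estimate *= summary["p"].get(word, 0.0)  # unseen word -> estimate of 0
    return estimate


def select_databases(summaries, query, k=1):
    """Return the names of the k databases with the highest scores."""
    ranked = sorted(summaries, key=lambda name: score(summaries[name], query),
                    reverse=True)
    return ranked[:k]


summaries = {
    "medical": build_summary([["blood", "pressure"], ["blood", "cell"]], 1000),
    "sports": build_summary([["soccer", "goal"], ["blood", "sport"]], 1000),
}
best = select_databases(summaries, ["blood", "pressure"])
```

Note that a query word missing from a sample drives the score to zero, which is why the completeness of the summaries matters for selection quality.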
Unfortunately, Zipf's law practically guarantees that for any relatively large database, content summaries built from moderately sized document samples will fail to cover many low-frequency words; in turn, incomplete content summaries might negatively affect the database selection process, especially for short queries with infrequent words. To enhance the sparse document samples and improve the database selection decisions, we exploit the fact that topically similar databases tend to have similar vocabularies, so samples extracted from databases with a similar topical focus can complement each other. We have developed two database selection algorithms that exploit this observation. The first algorithm proceeds hierarchically and ...
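The two ideas in this paragraph can be illustrated with a short Python sketch. The first function computes the expected vocabulary coverage of a sample under a simplified model, in which individual words are drawn independently from a Zipf distribution; this token-level model is an assumption made for illustration. The other two functions show one simple way in which summaries from topically similar databases could complement each other: a linear mix with a fixed weight `lam`. The function names, the dictionary layout, and the fixed weight are hypothetical and are not claimed to be the authors' method.

```python
def zipf_expected_coverage(vocab_size, sample_tokens):
    """Expected fraction of the vocabulary seen at least once when
    drawing sample_tokens words independently from a Zipf distribution
    (probability of the rank-r word proportional to 1/r)."""
    norm = sum(1.0 / r for r in range(1, vocab_size + 1))
    seen = 0.0
    for r in range(1, vocab_size + 1):
        p = (1.0 / r) / norm
        seen += 1.0 - (1.0 - p) ** sample_tokens  # P(word r appears)
    return seen / vocab_size


def category_summary(summaries):
    """Average word probabilities over the databases of one topic category.

    summaries: list of dicts mapping word -> estimated probability.
    """
    words = set().union(*(s.keys() for s in summaries))
    return {w: sum(s.get(w, 0.0) for s in summaries) / len(summaries)
            for w in words}


def complement(db_summary, cat_summary, lam=0.5):
    """Mix a database's own estimates with its category's estimates.

    lam is an assumed fixed weight on the database's own sample.
    """
    words = set(db_summary) | set(cat_summary)
    return {w: lam * db_summary.get(w, 0.0)
               + (1.0 - lam) * cat_summary.get(w, 0.0)
            for w in words}


coverage = zipf_expected_coverage(10000, 1000)

db = {"blood": 0.5}                         # sparse sample: misses "hemoglobin"
similar = {"blood": 0.5, "hemoglobin": 0.2}  # topically similar database
enhanced = complement(db, category_summary([db, similar]))
```

Under this simplified model, a sample of 1,000 words from a 10,000-word vocabulary is expected to cover well under a tenth of the vocabulary, and the mixed summary assigns a nonzero estimate to "hemoglobin" even though the database's own sample never saw the word.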




