Automating the Construction of Internet Portals with Machine Learning

Andrew Kachites McCallum mccallum@ Just Research and Carnegie Mellon University
Kamal Nigam knigam@ Carnegie Mellon University
Jason Rennie jrennie@ Massachusetts Institute of Technology
Kristie Seymore kseymore@ Carnegie Mellon University

Abstract. Domain-specific Internet portals are growing in popularity because they gather content from the Web and organize it for easy access, retrieval and search. For example, one such portal allows complex queries by age, location, cost and specialty over summer camps. This functionality is not possible with general, Web-wide search engines. Unfortunately, these portals are difficult and time-consuming to maintain. This paper advocates the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific Internet portals. We describe new research in reinforcement learning, information extraction and text classification that enables efficient spidering, the identification of informative text segments, and the population of topic hierarchies. Using these techniques, we have built a demonstration system, a portal for computer science research papers. It already contains over 50,000 papers and is publicly available. These techniques are widely applicable to portal creation in other domains.

Keywords: spidering, crawling, reinforcement learning, information extraction, hidden Markov models, text classification, naive Bayes, Expectation-Maximization, unlabeled data

1. Introduction

As the amount of information on the World Wide Web grows, it becomes increasingly difficult to find just what we want. While general-purpose search engines such as AltaVista and Google offer quite useful coverage, it is often difficult to get high precision, even for detailed queries. When we know that we want information of a certain type or on a certain topic, a domain-specific Internet portal can be a




