Scientific report: "Weakly-Supervised Acquisition of Open-Domain Classes and Class Attributes from Web Documents and Query Logs"

Marius Paşca
Google Inc.
Mountain View, California 94043
mars@

Benjamin Van Durme
University of Rochester
Rochester, New York 14627
vandurme@

Contributions made during an internship at Google.

Abstract

A new approach to large-scale information extraction exploits both Web documents and query logs to acquire thousands of open-domain classes of instances, along with relevant sets of open-domain class attributes, at precision levels previously obtained only on small-scale, manually-assembled classes.

1 Introduction

Current methods for large-scale information extraction take advantage of unstructured text available from either Web documents (Banko et al., 2007; Snow et al., 2006) or, more recently, logs of Web search queries (Paşca, 2007) to acquire useful knowledge with minimal supervision. Given a manually-specified target attribute (e.g., birth years for people) and starting from as few as 10 seed facts such as (John Lennon, 1941), as many as a million facts of the same type can be derived from unstructured text within Web documents (Paşca et al., 2006). Similarly, given a manually-specified target class (e.g., Drug) with its instances (e.g., Vicodin and Xanax) and starting from as few as 5 seed attributes (e.g., side effects and maximum dose for Drug), other relevant attributes can be extracted for the same class from query logs (Paşca, 2007). These and other previous methods require the manual specification of the input classes of instances before any knowledge (e.g., facts or attributes) can be acquired for those classes.
The extraction method introduced in this paper mines a collection of Web search queries and a collection of Web documents to acquire open-domain classes in the form of instance sets (e.g., whales, seals, dolphins, sea lions, ...) associated with class labels (e.g., marine animals), as well as large sets of open-domain attributes for each class (e.g., circulatory ...)
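To make the shape of the acquired knowledge concrete, the following minimal sketch shows one way a labeled class, its instance set, and its attributes could be represented. It is an illustration only, not the authors' implementation: the names `LabeledClass` and `labels_for_instance` are hypothetical, and the example values are simply the ones mentioned in the text above.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class LabeledClass:
    """Hypothetical container: a class label, its instances, and its attributes."""
    label: str
    instances: Set[str] = field(default_factory=set)
    attributes: List[str] = field(default_factory=list)


def labels_for_instance(instance: str, classes: List[LabeledClass]) -> List[str]:
    """Return the labels of all classes whose instance set contains `instance`."""
    return [c.label for c in classes if instance in c.instances]


# Example values taken from the text; the attribute list for "marine animals"
# is left empty because the source does not give a complete example for it.
classes = [
    LabeledClass(
        label="marine animals",
        instances={"whales", "seals", "dolphins", "sea lions"},
    ),
    LabeledClass(
        label="Drug",
        instances={"Vicodin", "Xanax"},
        attributes=["side effects", "maximum dose"],
    ),
]

drug_labels = labels_for_instance("Xanax", classes)
whale_labels = labels_for_instance("whales", classes)
```

Keeping instances in a set reflects that a class is described as an instance set, while attributes are kept in a list so that an extraction method could return them in ranked order; both choices are assumptions made for this sketch.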
