Scientific report: "Big Data versus the Crowd: Looking for Relationships in All the Right Places"

Ce Zhang, Feng Niu, Christopher Ré, Jude Shavlik
Department of Computer Sciences, University of Wisconsin-Madison, USA
{czhang, leonn, chrisre, shavlik}@

Abstract

Classically, training relation extractors relies on high-quality, manually annotated training data, which can be expensive to obtain. To mitigate this cost, NLU researchers have considered two newly available sources of less expensive (but potentially lower quality) labeled data from distant supervision and crowd sourcing. There is, however, no study comparing the relative impact of these two sources on the precision and recall of post-learning answers. To fill this gap, we empirically study how state-of-the-art techniques are affected by scaling these two sources. We use corpus sizes of up to 100 million documents and tens of thousands of crowd-source labeled examples. Our experiments show that increasing the corpus size for distant supervision has a statistically significant, positive impact on quality (F1 score). In contrast, human feedback has a positive and statistically significant, but lower, impact on precision and recall.

1 Introduction

Relation extraction is the problem of populating a target relation (representing an entity-level relationship or attribute) with facts extracted from natural-language text. Sample relations include people's titles, birth places, and marriage relationships. Traditional relation-extraction systems rely on manual annotations or domain-specific rules provided by experts, both of which are scarce resources that are not portable across domains.
To remedy these problems, recent years have seen interest in the distant supervision approach for relation extraction (Wu and Weld, 2007; Mintz et al., 2009). The input to distant supervision is a set of seed facts for the target relation together with an unlabeled text corpus, and the output is a set of noisy annotations that can be used by any machine learning technique.
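
The input/output contract of distant supervision described above can be illustrated with a minimal sketch. It is not the authors' system: the `Sentence` type, the `distant_labels` function, the seed fact, and the example sentences are all hypothetical, and entity mentions are assumed to have been detected beforehand. The sketch labels every sentence that mentions both entities of a seed fact as a positive example, which is exactly why the resulting annotations are noisy.

```python
from typing import List, NamedTuple, Tuple


class Sentence(NamedTuple):
    """A corpus sentence with its entity mentions (assumed to be pre-detected)."""
    text: str
    entities: Tuple[str, ...]


def distant_labels(
    seed_facts: List[Tuple[str, str]], corpus: List[Sentence]
) -> List[Tuple[str, str, str]]:
    """Return (sentence text, entity 1, entity 2) for every sentence that
    mentions both entities of some seed fact. No check is made that the
    sentence actually expresses the relation, so labels can be wrong."""
    labels = []
    for sentence in corpus:
        mentioned = set(sentence.entities)
        for e1, e2 in seed_facts:
            if e1 in mentioned and e2 in mentioned:
                labels.append((sentence.text, e1, e2))
    return labels


# Hypothetical seed fact for a birthplace relation.
seed_facts = [("Barack Obama", "Honolulu")]

# Hypothetical unlabeled corpus.
corpus = [
    Sentence("Barack Obama was born in Honolulu.", ("Barack Obama", "Honolulu")),
    Sentence("Barack Obama visited Honolulu last week.", ("Barack Obama", "Honolulu")),
    Sentence("Barack Obama spoke in Chicago.", ("Barack Obama", "Chicago")),
]

noisy = distant_labels(seed_facts, corpus)
for text, e1, e2 in noisy:
    print(f"positive example for ({e1}, {e2}): {text}")
```

The second sentence is matched even though it says nothing about a birthplace. This kind of false positive is the noise that a downstream learner has to tolerate.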
