Big Data versus the Crowd: Looking for Relationships in All the Right Places

Ce Zhang, Feng Niu, Christopher Ré, Jude Shavlik
Department of Computer Sciences, University of Wisconsin-Madison, USA
{czhang, leonn, chrisre, shavlik}@

Abstract

Classically, training relation extractors relies on high-quality, manually annotated training data, which can be expensive to obtain. To mitigate this cost, NLU researchers have considered two newly available sources of less expensive (but potentially lower-quality) labeled data: distant supervision and crowdsourcing. There is, however, no study comparing the relative impact of these two sources on the precision and recall of post-learning answers. To fill this gap, we empirically study how state-of-the-art techniques are affected by scaling these two sources. We use corpus sizes of up to 100 million documents and tens of thousands of crowd-sourced labeled examples. Our experiments show that increasing the corpus size for distant supervision has a statistically significant, positive impact on quality (F1 score). In contrast, human feedback has a positive and statistically significant, but lower, impact on precision and recall.

1 Introduction

Relation extraction is the problem of populating a target relation (representing an entity-level relationship or attribute) with facts extracted from natural-language text. Sample relations include people's titles, birth places, and marriage relationships. Traditional relation-extraction systems rely on manual annotations or domain-specific rules provided by experts, both of which are scarce resources that are not portable across domains.
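To make "populating a target relation" and "domain-specific rules" concrete, here is a minimal sketch of the traditional rule-based approach for a single relation. It is an illustration under stated assumptions, not the paper's system: the Spouse relation, the `MARRIED_RULE` pattern, the `extract_spouse` helper, and the toy sentences are all hypothetical, and person names are crudely approximated as runs of capitalized words.

```python
import re

# Hypothetical target relation Spouse(person1, person2), stored as a set of tuples.
# Assumption: a name is one or more capitalized words separated by single spaces.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# A single hand-written, expert-style rule: "<Name> married <Name>".
MARRIED_RULE = re.compile(rf"({NAME}) married ({NAME})")


def extract_spouse(sentences):
    """Populate the Spouse relation by applying the one hand-written rule."""
    relation = set()
    for sentence in sentences:
        for match in MARRIED_RULE.finditer(sentence):
            relation.add((match.group(1), match.group(2)))
    return relation


corpus = [
    "Barack Obama married Michelle Robinson in 1992.",
    "Michelle Obama is the wife of Barack Obama.",  # same fact, different wording
]

print(extract_spouse(corpus))  # → {('Barack Obama', 'Michelle Robinson')}
```

The second sentence states the same fact in different words and is missed, and the rule says nothing about other relations such as birth places. Each new phrasing, relation, or domain needs another expert-written rule, which is the scarcity and portability problem described in the paragraph above.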
To remedy these problems, recent years have seen interest in the distant supervision approach for relation extraction (Wu and Weld, 2007; Mintz et al., 2009). The input to distant supervision is a set of seed facts for the target relation together with an unlabeled text corpus, and the output is a set of noisy annotations that can be used by any machine learning technique.
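The labeling step of distant supervision can be sketched as follows. This is a minimal sketch, assuming a hypothetical directed relation BirthPlace(person, place), a toy corpus, and a stub `find_mentions` that matches a fixed entity list by substring (a real system would use named-entity recognition and entity linking). It also assumes a common simplification: any co-occurring pair that is not a seed fact is labeled negative. The names `SEED_FACTS`, `find_mentions`, and `distant_supervision` are illustrative, not taken from the paper or from Mintz et al. (2009).

```python
from itertools import permutations

# Hypothetical seed facts for a directed relation BirthPlace(person, place).
SEED_FACTS = {("Barack Obama", "Honolulu")}

# Stub entity recognizer (assumption): matches a fixed entity list by substring.
KNOWN_ENTITIES = ["Barack Obama", "Honolulu", "Chicago"]


def find_mentions(sentence):
    """Return the known entities mentioned in the sentence, in list order."""
    return [e for e in KNOWN_ENTITIES if e in sentence]


def distant_supervision(seed_facts, corpus, mention_finder):
    """Produce noisy (sentence, arg1, arg2, label) annotations.

    Every ordered pair of entities that co-occur in a sentence becomes a
    candidate; it is labeled True if the pair is a seed fact, else False.
    """
    annotations = []
    for sentence in corpus:
        for arg1, arg2 in permutations(mention_finder(sentence), 2):
            annotations.append((sentence, arg1, arg2, (arg1, arg2) in seed_facts))
    return annotations


corpus = [
    "Barack Obama was born in Honolulu.",
    "Barack Obama gave a speech in Honolulu.",  # co-occurrence, wrong relation
    "Barack Obama moved to Chicago.",
]

annotations = distant_supervision(SEED_FACTS, corpus, find_mentions)
for sentence, arg1, arg2, label in annotations:
    if label:
        print(f"POSITIVE: BirthPlace({arg1}, {arg2}) <- {sentence!r}")
```

The second sentence is labeled positive even though it does not express BirthPlace. This is the kind of noise in the resulting annotations that any downstream learner has to tolerate.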
