Using Corpus Statistics on Entities to Improve Semi-supervised Relation Extraction from the Web

Benjamin Rosenfeld
Information Systems, HU School of Business, Hebrew University, Jerusalem, Israel
grurgrur@

Ronen Feldman
Information Systems, HU School of Business, Hebrew University, Jerusalem, Israel

Abstract

Many errors produced by unsupervised and semi-supervised relation extraction (RE) systems occur because the entities that participate in the relations are recognized incorrectly. This is especially true for systems that do not use separate named-entity recognition (NER) components, relying instead on general-purpose shallow parsing. Such systems have greater applicability because they can extract relations whose attributes are of unknown types. However, this generality comes at a cost in accuracy. In this paper we show how to use corpus statistics to validate and correct the arguments of extracted relation instances, improving overall RE performance. We test the methods on SRES, a self-supervised Web relation extraction system. We also compare the performance of the corpus-based methods to that of validation and correction methods based on supervised NER components.

1 Introduction

Information Extraction (IE) is the task of extracting factual assertions from text.
Most IE systems rely on knowledge engineering or on machine learning to generate the task model that is subsequently used to extract instances of entities and relations from new text. In the knowledge engineering approach, the model (usually in the form of extraction rules) is created manually; in the machine learning approach, the model is learned automatically from a manually labeled training set of documents. Both approaches require substantial human effort, particularly when applied to the broad range of documents, entities, and relations on the Web. In order to minimize the manual effort necessary to build Web IE systems, semi-supervised and completely unsupervised ...
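This excerpt does not describe the paper's actual validation procedure. As a rough illustration of the general idea in the abstract only (using corpus statistics to validate and correct the boundaries of an extracted argument), the Python sketch below assumes a tokenized corpus, counts its n-grams, accepts an argument that occurs at least `min_count` times, and otherwise falls back to its longest sufficiently frequent sub-span. This is not the SRES method: the names `ngram_counts` and `validate_argument`, the threshold, and the toy corpus are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence, Tuple

Span = Tuple[str, ...]


def ngram_counts(sentences: Iterable[Sequence[str]], max_n: int = 4) -> Counter:
    """Count every contiguous token n-gram (1 <= n <= max_n) in a tokenized corpus."""
    counts: Counter = Counter()
    for tokens in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def validate_argument(arg: Sequence[str], counts: Counter, min_count: int = 3) -> Optional[Span]:
    """Hypothetical check: keep a frequent argument, else fall back to its
    longest sub-span seen at least min_count times, else reject it (None)."""
    span = tuple(arg)
    if counts[span] >= min_count:
        return span  # validated as-is
    # Try shorter contiguous sub-spans, longest first; among spans of equal
    # length, prefer the most frequent one.
    for n in range(len(span) - 1, 0, -1):
        best: Optional[Span] = None
        for i in range(len(span) - n + 1):
            sub = span[i:i + n]
            if counts[sub] >= min_count and (best is None or counts[sub] > counts[best]):
                best = sub
        if best is not None:
            return best  # corrected boundaries
    return None  # no sufficiently frequent sub-span: reject the argument


# Toy corpus, invented for illustration only.
corpus = [
    "Google Inc. announced a new product".split(),
    "shares of Google Inc. rose".split(),
    "Google Inc. hired engineers".split(),
    "the CEO of Google Inc. spoke".split(),
]
counts = ngram_counts(corpus)
print(validate_argument(["Google", "Inc.", "CEO", "Eric"], counts))  # ('Google', 'Inc.')
```

The search prefers length before frequency because a sub-span always occurs at least as often as any longer span containing it; picking the most frequent candidate outright would collapse every argument to a single token. A realistic variant would also normalize case and punctuation and draw counts from a much larger (e.g. Web-scale) corpus.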


Related documents


Scientific report: "Using an Annotated Corpus as a Stochastic Grammar"

Scientific report: "Solving Relational Similarity Problems Using the Web as a Corpus"

Scientific report: "Word Alignment in English-Hindi Parallel Corpus Using Recency-Vector Approach: Some Studies"

Scientific report: "Evaluating Centering-based metrics of coherence for text structuring using a reliably annotated corpus"

Scientific report: "Specifying the Parameters of Centering Theory: a Corpus-Based Evaluation using Text from Application-Oriented Domains"

Scientific report: "Unsupervised Learning of Arabic Stemming using a Parallel Corpus"

Scientific report: "Identifying Syntactic Role of Antecedent in Korean Relative Clause Using Corpus and Thesaurus Information"

Scientific report: "Correcting a PoS-tagged corpus using three complementary methods"

Scientific report: "Generalised PP-Attachment Disambiguation using Corpus-based Linguistic Diagnostics"
