TAILIEUCHUNG - Scientific report: "Distributional Similarity vs. PU Learning for Entity Set Expansion"

Xiao-Li Li, Institute for Infocomm Research, 1 Fusionopolis Way 21-01 Connexis, Singapore 138632, xlli@
Bing Liu, University of Illinois at Chicago, 851 South Morgan Street, Chicago, IL 60607-7053, USA, liub@
Lei Zhang, University of Illinois at Chicago, 851 South Morgan Street, Chicago, IL 60607-7053, USA, zhang3@
See-Kiong Ng, Institute for Infocomm Research, 1 Fusionopolis Way 21-01 Connexis, Singapore 138632, skng@

Abstract

Distributional similarity is a classic technique for entity set expansion, where the system is given a set of seed entities of a particular class and is asked to expand the set using a corpus to obtain more entities of the same class as represented by the seeds. This paper shows that a machine learning model called positive and unlabeled learning (PU learning) can model the set expansion problem better. Based on test results on 10 corpora, we show that a PU learning technique outperformed distributional similarity significantly.

1 Introduction

The entity set expansion problem is defined as follows: given a set S of seed entities of a particular class and a set D of candidate entities (e.g., extracted from a text corpus), we wish to determine which of the entities in D belong to S. In other words, we expand the set S based on the given seeds. This is clearly a classification problem, which requires arriving at a binary decision for each entity in D (belonging to S or not). However, in practice the problem is often solved as a ranking problem, i.e., ranking the entities in D based on their likelihoods of belonging to S.

The classic method for solving this problem is based on distributional similarity (Pantel et al., 2009; Lee, 1998). The approach works by comparing the similarity of the surrounding word distributions of each candidate entity with the seed entities, and then ranking the candidate entities using their similarity scores.
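The ranking idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the window size, the raw co-occurrence counts, the cosine measure, and the choice to merge all seed contexts into a single profile are assumptions made only for illustration, and the function names (context_vector, cosine, rank_candidates) are hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vector(entity, sentences, window=2):
    """Count the words appearing within `window` tokens of each mention of `entity`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == entity:
                lo, hi = max(0, i - window), i + window + 1
                # Collect neighbours, skipping the entity mention itself.
                counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(seeds, candidates, sentences, window=2):
    """Rank candidates by similarity to the merged context profile of the seeds.

    Merging the seed contexts into one profile is an assumption of this sketch.
    """
    seed_profile = Counter()
    for seed in seeds:
        seed_profile.update(context_vector(seed, sentences, window))
    scored = [
        (cand, cosine(context_vector(cand, sentences, window), seed_profile))
        for cand in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy corpus (invented for illustration only).
sentences = [
    "i bought a nikon camera yesterday".split(),
    "she bought a canon camera today".split(),
    "he bought a sony camera online".split(),
    "we ate a ripe banana yesterday".split(),
]
ranked = rank_candidates(["nikon", "canon"], ["sony", "banana"], sentences)
for entity, score in ranked:
    print(f"{entity}\t{score:.3f}")
```

In this toy setting the candidate that shares more context words with the seeds is ranked higher. Note that the output is only an ordering: turning it into the binary decision described in the problem definition would still require choosing a cut-off.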


