Scientific report: "Bootstrapping Coreference Resolution Using Word Associations"

Hamidreza Kobdani, Hinrich Schütze, Michael Schiehlen and Hans Kamp
Institute for Natural Language Processing, University of Stuttgart
kobdani@

Abstract

In this paper, we present an unsupervised framework that bootstraps a complete coreference resolution (CoRe) system from word associations mined from a large unlabeled corpus. We show that word associations are useful for CoRe: e.g., the strong association between Obama and President is an indicator of likely coreference. Association information has so far not been used in CoRe because it is sparse and difficult to learn from small labeled corpora. Since unlabeled text is readily available, our unsupervised approach addresses the sparseness problem. In a self-training framework, we train a decision tree on a corpus that is automatically labeled using word associations. We show that this unsupervised system has better CoRe performance than other learning approaches that do not use manually labeled data.

1 Introduction

Coreference resolution (CoRe) is the process of finding markables (noun phrases) referring to the same real-world entity or concept. Until recently, most approaches tried to solve the problem by binary classification, where the probability of a pair of markables being coreferent is estimated from labeled data. Alternatively, a model that determines whether a markable is coreferent with a preceding cluster can be used.
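As a rough illustration of how word associations could stand in for manual labels in a pair-based setup, the sketch below scores word pairs by pointwise mutual information (PMI) over document-level co-occurrence and marks a pair of markable head words as likely coreferent when the score exceeds a threshold. The choice of PMI, the document-level co-occurrence window, the threshold, and the function names `pmi_table` and `auto_label` are all assumptions made for this example; the excerpt does not say which association measure or labeling rule the authors use.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """Map each co-occurring word pair to its PMI score.

    Assumption: two words co-occur when they appear in the same document.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for doc in docs:
        words = set(doc)  # count each word once per document
        word_counts.update(words)
        pair_counts.update(
            frozenset(pair) for pair in combinations(sorted(words), 2)
        )
    n_docs = len(docs)
    table = {}
    for pair, count in pair_counts.items():
        a, b = tuple(pair)
        # PMI = log( P(a, b) / (P(a) * P(b)) ) with document-level estimates
        table[pair] = math.log(
            (count * n_docs) / (word_counts[a] * word_counts[b])
        )
    return table


def auto_label(pairs, table, threshold=0.0):
    """Label each (a, b) pair True when its PMI exceeds the threshold.

    Pairs never seen together get a score of -inf and are labeled False.
    """
    labeled = []
    for a, b in pairs:
        score = table.get(frozenset((a, b)), float("-inf"))
        labeled.append((a, b, score > threshold))
    return labeled


if __name__ == "__main__":
    # Toy corpus; each inner list is one "document" of head words.
    corpus = [
        ["Obama", "President"],
        ["Obama", "President"],
        ["bank", "river"],
        ["Obama", "river"],
    ]
    table = pmi_table(corpus)
    for row in auto_label([("Obama", "President"), ("Obama", "river")], table):
        print(row)
```

Pairs labeled this way could then serve as noisy training data for a classifier such as the decision tree mentioned in the abstract; the self-training step itself is not shown here.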
For both pair-based and cluster-based models, a well-established feature model plays an important role. Typical systems use a rich feature space based on lexical, syntactic and semantic knowledge. Most commonly used features are described by Soon et al. (2001).

Most existing systems are supervised systems, trained on human-labeled benchmark data sets for English. These systems use linguistic features based on number, gender, person, etc. It is a challenge to adapt these systems to new domains, genres and languages because a ...
