Scientific paper: "Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron"

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 489-496.

Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron

Michael Collins
AT&T Labs-Research, Florham Park, New Jersey
mcollins@

Abstract

This paper describes algorithms which rerank the top N hypotheses from a maximum-entropy tagger, the application being the recovery of named-entity boundaries in a corpus of web data. The first approach uses a boosting algorithm for ranking problems. The second approach uses the voted perceptron algorithm. Both algorithms give comparable, significant improvements over the maximum-entropy baseline. The voted perceptron algorithm can be considerably more efficient to train, at some cost in computation on test examples.

1 Introduction

Recent work in statistical approaches to parsing and tagging has begun to consider methods which incorporate global features of candidate structures. Examples of such techniques are Markov Random Fields (Abney 1997; Della Pietra et al. 1997; Johnson et al. 1999) and boosting algorithms (Freund et al. 1998; Collins 2000; Walker et al. 2001). One appeal of these methods is their flexibility in incorporating features into a model: essentially any features which might be useful in discriminating good from bad structures can be included. A second appeal of these methods is that their training criterion is often discriminative, attempting to explicitly push the score or probability of the correct structure for each training sentence above the score of competing structures. This discriminative property is shared by the methods of Johnson et al. (1999) and Collins (2000), and also by the Conditional Random Field methods of Lafferty et al. (2001).

In a previous paper (Collins 2000), a boosting algorithm was used to rerank the output from an existing statistical parser, giving significant improvements in parsing accuracy on Wall Street Journal data. Similar boosting algorithms have been applied to natural language generation,
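The reranking idea in the abstract can be made concrete with a small sketch. The Python below is illustrative rather than the paper's implementation: it assumes each sentence's top-N tagger hypotheses arrive as sparse feature dictionaries with the gold (or best) candidate stored first, and it uses the averaged perceptron, a common practical approximation to the voted perceptron discussed in the paper. The function and variable names are hypothetical.

    # Minimal sketch of a perceptron-style reranker (not the paper's code).
    # Assumes: candidates[i] is the N-best list for sentence i, each hypothesis
    # a sparse feature dict {feature_name: value}, with the gold candidate at
    # index 0.  Averaging the weight vectors stands in for the full voted scheme.
    from collections import defaultdict

    def train_reranker(candidates, epochs=5):
        w = defaultdict(float)      # current weight vector
        w_sum = defaultdict(float)  # running sum of weights, for averaging
        t = 0
        for _ in range(epochs):
            for cands in candidates:
                t += 1
                gold = cands[0]
                # hypothesis the current model prefers
                best = max(cands, key=lambda f: sum(w.get(k, 0.0) * v
                                                    for k, v in f.items()))
                if best is not gold:
                    # push the gold candidate's score above the competitor's
                    for k, v in gold.items():
                        w[k] += v
                    for k, v in best.items():
                        w[k] -= v
                for k, v in w.items():
                    w_sum[k] += v
        return {k: v / t for k, v in w_sum.items()}

    def rerank(w, cands):
        # pick the hypothesis with the highest score under the learned weights
        return max(cands, key=lambda f: sum(w.get(k, 0.0) * v for k, v in f.items()))

At test time, rerank simply returns the hypothesis from the tagger's N-best list that scores highest under the learned weights, which is the reranking step the paper's methods perform over the maximum-entropy baseline.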
