Scientific paper: "Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron"

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 489-496.

Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron

Michael Collins
AT&T Labs-Research, Florham Park, New Jersey
mcollins@

Abstract

This paper describes algorithms which rerank the top N hypotheses from a maximum-entropy tagger, the application being the recovery of named-entity boundaries in a corpus of web data. The first approach uses a boosting algorithm for ranking problems. The second approach uses the voted perceptron algorithm. Both algorithms give comparable, significant improvements over the maximum-entropy baseline. The voted perceptron algorithm can be considerably more efficient to train, at some cost in computation on test examples.

1 Introduction

Recent work in statistical approaches to parsing and tagging has begun to consider methods which incorporate global features of candidate structures. Examples of such techniques are Markov Random Fields (Abney 1997; Della Pietra et al. 1997; Johnson et al. 1999) and boosting algorithms (Freund et al. 1998; Collins 2000; Walker et al. 2001). One appeal of these methods is their flexibility in incorporating features into a model: essentially any features which might be useful in discriminating good from bad structures can be included. A second appeal of these methods is that their training criterion is often discriminative, attempting to explicitly push the score or probability of the correct structure for each training sentence above the score of competing structures. This discriminative property is shared by the methods of Johnson et al. (1999) and Collins (2000), and also by the Conditional Random Field methods of Lafferty et al. (2001).

In a previous paper (Collins 2000), a boosting algorithm was used to rerank the output from an existing statistical parser, giving significant improvements in parsing accuracy on Wall Street Journal data. Similar boosting algorithms have been applied to natural language generation,
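The reranking idea in the abstract can be made concrete with a small sketch. The Python below is illustrative rather than the paper's implementation: it assumes each sentence's top-N tagger hypotheses arrive as sparse feature dictionaries with the gold (or best) candidate stored first, and it uses the averaged perceptron, a common practical approximation to the voted perceptron discussed in the paper. The function and variable names are hypothetical.

    # Minimal sketch of a perceptron-style reranker (not the paper's code).
    # Assumes: candidates[i] is the N-best list for sentence i, each hypothesis
    # a sparse feature dict {feature_name: value}, with the gold candidate at
    # index 0.  Averaging the weight vectors stands in for the full voted scheme.
    from collections import defaultdict

    def train_reranker(candidates, epochs=5):
        w = defaultdict(float)      # current weight vector
        w_sum = defaultdict(float)  # running sum of weights, for averaging
        t = 0
        for _ in range(epochs):
            for cands in candidates:
                t += 1
                gold = cands[0]
                # hypothesis the current model prefers
                best = max(cands, key=lambda f: sum(w.get(k, 0.0) * v
                                                    for k, v in f.items()))
                if best is not gold:
                    # push the gold candidate's score above the competitor's
                    for k, v in gold.items():
                        w[k] += v
                    for k, v in best.items():
                        w[k] -= v
                for k, v in w.items():
                    w_sum[k] += v
        return {k: v / t for k, v in w_sum.items()}

    def rerank(w, cands):
        # pick the hypothesis with the highest score under the learned weights
        return max(cands, key=lambda f: sum(w.get(k, 0.0) * v for k, v in f.items()))

At test time, rerank simply returns the hypothesis from the tagger's N-best list that scores highest under the learned weights, which is the reranking step the paper's methods perform over the maximum-entropy baseline.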
