Research paper: "Bootstrapping"

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 360-367.

Bootstrapping

Steven Abney
AT&T Laboratories - Research
180 Park Avenue
Florham Park, NJ, USA 07932

Abstract

This paper refines the analysis of co-training, defines and evaluates a new co-training algorithm that has theoretical justification, gives a theoretical justification for the Yarowsky algorithm, and shows that co-training and the Yarowsky algorithm are based on different independence assumptions.

1 Overview

The term bootstrapping here refers to a problem setting in which one is given a small set of labeled data and a large set of unlabeled data, and the task is to induce a classifier. The plenitude of unlabeled natural language data and the paucity of labeled data have made bootstrapping a topic of interest in computational linguistics. Current work has been spurred by two papers, Yarowsky (1995) and Blum and Mitchell (1998).

Blum and Mitchell propose a conditional independence assumption to account for the efficacy of their algorithm, called co-training, and they give a proof based on that conditional independence assumption. They also give an intuitive explanation of why co-training works, in terms of maximizing agreement on unlabeled data between classifiers based on different views of the data. Finally, they suggest that the Yarowsky algorithm is a special case of the co-training algorithm.

The Blum and Mitchell paper has been very influential, but it has some shortcomings. The proof they give does not actually apply directly to the co-training algorithm, nor does it directly justify the intuitive account in terms of classifier agreement on unlabeled data; nor, for that matter, does the co-training algorithm directly seek to find classifiers that agree on unlabeled data. Moreover, the suggestion that the Yarowsky algorithm is a special case of co-training is based on an incidental detail of the particular application that Yarowsky considers, not on the properties of the core algorithm.
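To make the setting concrete, the sketch below shows the kind of co-training loop Blum and Mitchell describe: two classifiers are trained on separate feature "views" of the same examples, and in each round each classifier labels the unlabeled examples it is most confident about, which are then added to the shared labeled pool. This is only an illustrative reconstruction; the choice of naive-Bayes classifiers, the confidence-based selection, and all parameter names are assumptions made here, not details taken from the paper.

# Illustrative co-training loop (not the paper's own code).
import numpy as np
from sklearn.naive_bayes import GaussianNB


def cotrain(X1_lab, X2_lab, y_lab, X1_unlab, X2_unlab,
            rounds=10, grow_per_round=5):
    """Co-train two classifiers; X1_*/X2_* are the two feature views."""
    X1_lab, X2_lab = np.asarray(X1_lab, float), np.asarray(X2_lab, float)
    y_lab = np.asarray(y_lab)
    X1_u, X2_u = np.asarray(X1_unlab, float), np.asarray(X2_unlab, float)

    h1 = GaussianNB().fit(X1_lab, y_lab)
    h2 = GaussianNB().fit(X2_lab, y_lab)

    for _ in range(rounds):
        if len(X1_u) == 0:
            break

        # Each classifier nominates the unlabeled examples it is most
        # confident about, together with its predicted labels.
        chosen = {}
        for h, X_view in ((h1, X1_u), (h2, X2_u)):
            probs = h.predict_proba(X_view)
            top = np.argsort(-probs.max(axis=1))[:grow_per_round]
            for i, label in zip(top, h.predict(X_view[top])):
                chosen.setdefault(int(i), label)

        idx = np.array(sorted(chosen))
        new_y = np.array([chosen[i] for i in idx])

        # Move the nominated examples from the unlabeled pool to the
        # labeled pool (in both views) and retrain both classifiers.
        X1_lab = np.vstack([X1_lab, X1_u[idx]])
        X2_lab = np.vstack([X2_lab, X2_u[idx]])
        y_lab = np.concatenate([y_lab, new_y])
        keep = np.setdiff1d(np.arange(len(X1_u)), idx)
        X1_u, X2_u = X1_u[keep], X2_u[keep]

        h1.fit(X1_lab, y_lab)
        h2.fit(X2_lab, y_lab)

    return h1, h2

In Blum and Mitchell's own experiments the two views were, roughly, the words on a web page and the words in hyperlinks pointing to it; their conditional independence assumption is that, given the true label, the two views are independent of each other.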
