Scientific report: "Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling"

Dmitriy Dligach, Department of Computer Science, University of Colorado at Boulder
Martha Palmer, Department of Linguistics, University of Colorado at Boulder

Abstract

Active Learning (AL) is typically initialized with a small seed of examples selected randomly. However, when the distribution of classes in the data is skewed, some classes may be missed, resulting in a slow learning progress. Our contribution is twofold: (1) we show that an unsupervised language modeling based technique is effective in selecting rare class examples, and (2) we use this technique for seeding AL and demonstrate that it leads to a higher learning rate. The evaluation is conducted in the context of word sense disambiguation.

1 Introduction

Active learning (AL) (Settles, 2009) has become a popular research field due to its potential benefits: it can lead to drastic reductions in the amount of annotation that is necessary for training a highly accurate statistical classifier. Unlike in a random sampling approach, where unlabeled data is selected for annotation randomly, AL delegates the selection of unlabeled data to the classifier. In a typical AL setup, a classifier is trained on a small sample of the data (usually selected randomly), known as the seed examples. The classifier is subsequently applied to a pool of unlabeled data with the purpose of selecting additional examples that the classifier views as informative. The selected data is annotated and the cycle is repeated, allowing the learner to quickly refine the decision boundary between the classes.

Unfortunately, AL is susceptible to a shortcoming known as the missed cluster effect (Schütze et al., 2006) and its special case called the missed class effect (Tomanek et al., 2009). The missed cluster effect is a consequence of the fact that seed examples influence the direction the learner takes in its exploration of the ...
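The pool-based cycle described above (random seed, train, select the examples the classifier finds informative, annotate, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-dimensional features, the nearest-centroid classifier, the margin-based notion of "informative", and all function names (`train_centroids`, `margin`, `active_learn`) are hypothetical choices made only to keep the example self-contained.

```python
import random

def train_centroids(labeled):
    """Toy classifier (an assumption): one mean feature value per class."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def margin(centroids, x):
    """Distance gap between the two nearest centroids.

    A small gap means the classifier is uncertain about x. If only one
    class has been seen so far, every example gets an infinite margin,
    so the learner has no signal pointing toward the unseen class.
    """
    d = sorted(abs(x - c) for c in centroids.values())
    return d[1] - d[0] if len(d) > 1 else float("inf")

def active_learn(pool, oracle, seed_size, rounds, rng):
    """Pool-based AL with a randomly selected seed.

    `oracle` stands in for the human annotator and returns the true label.
    """
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    seed, unlabeled = unlabeled[:seed_size], unlabeled[seed_size:]
    labeled = [(x, oracle(x)) for x in seed]          # annotate the seed
    for _ in range(rounds):
        if not unlabeled:
            break
        model = train_centroids(labeled)              # retrain on all labels
        query = min(unlabeled, key=lambda x: margin(model, x))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))        # annotate the query
    return train_centroids(labeled), labeled

if __name__ == "__main__":
    # Skewed toy data: only 2 of 20 examples belong to the rare class.
    pool = [float(i) for i in range(20)]
    oracle = lambda x: "rare" if x >= 18 else "common"
    model, labeled = active_learn(pool, oracle, 3, 5, random.Random(0))
    print("classes seen:", sorted(model))
    print("labeled examples:", labeled)
```

With a skewed pool such as the one in the demo, a small random seed often contains only the common class. In that case the model has a single centroid, every margin is infinite, and the queries are effectively arbitrary, which is a simple way to see how a missed class can slow learning.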
