Scientific report: "An Empirical Study of Active Learning with Support Vector Machines for Japanese Word Segmentation"

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 505-512.

Manabu Sassano
Fujitsu Laboratories Ltd., 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki 211-8588, Japan
sassano@

Abstract

We explore how well active learning with Support Vector Machines works for a non-trivial task in natural language processing. We use Japanese word segmentation as a test case. In particular, we discuss how the size of a pool affects the learning curve. We find that, in the early stage of training, a larger pool requires more labeled examples to reach a given level of accuracy than a smaller pool does. In addition, we propose a novel technique that uses a large number of unlabeled examples effectively by adding them gradually to a pool. The experimental results show that our technique requires fewer labeled examples than the technique in previous research. To achieve [...] accuracy, the proposed technique needs [...] of the labeled examples that are required when using the previous technique and only [...] of the labeled examples required with random sampling.

1 Introduction

Corpus-based supervised learning is now a standard approach to achieving high performance in natural language processing.
However, the weakness of the supervised learning approach is that it needs a reasonably large annotated corpus. Even with a good supervised learning method, we cannot achieve high performance without an annotated corpus. The problem is that corpus annotation is labour-intensive and very expensive. To overcome this, some unsupervised and minimally supervised learning methods (Yarowsky, 1995; Yarowsky and Wicentowski, 2000) have been proposed. However, such methods usually depend on tasks or domains, and their performance often does not match that of a supervised learning method. Another promising ...
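
The abstract describes pool-based active learning with an SVM in which unlabeled examples are added to the pool gradually, but this excerpt gives neither the selection rule nor the schedule for enlarging the pool. The sketch below is therefore only an illustration under stated assumptions, not the paper's procedure. A small Pegasos-style linear SVM stands in for whatever SVM implementation the paper used. Examples closest to the separating hyperplane are chosen for labeling, which is a common heuristic assumed here. The pool grows by a fixed number of examples per round, which is an assumed schedule. All names (`train_linear_svm`, `decision`, `active_learn`, `oracle`, `pool_growth`) are hypothetical.

```python
import random

def train_linear_svm(X, y, epochs=200, lam=0.01, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM (stand-in, assumed).
    X: list of feature lists; y: labels in {-1, +1}. The bias is the last weight."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                 # decaying step size
            xi = list(X[i]) + [1.0]               # append constant bias feature
            margin = y[i] * sum(a * b for a, b in zip(w, xi))
            w = [(1.0 - eta * lam) * a for a in w]  # shrink (regularization)
            if margin < 1.0:                      # hinge loss is active
                w = [a + eta * y[i] * b for a, b in zip(w, xi)]
    return w

def decision(w, x):
    """Signed score of x; its absolute value is used as distance to the hyperplane."""
    return sum(a * b for a, b in zip(w, list(x) + [1.0]))

def active_learn(unlabeled, oracle, initial_pool=50, pool_growth=50,
                 batch=5, rounds=20, seed=0):
    """Pool-based active learning with a gradually growing pool (assumed schedule).
    oracle(x) returns the true label in {-1, +1}, playing the human annotator."""
    rng = random.Random(seed)
    reserve = list(unlabeled)
    rng.shuffle(reserve)
    pool = [reserve.pop() for _ in range(min(initial_pool, len(reserve)))]
    # Seed the labeled set with a few randomly chosen pool examples.
    seed_examples, pool = pool[:batch], pool[batch:]
    labeled_X = list(seed_examples)
    labeled_y = [oracle(x) for x in seed_examples]
    for _ in range(rounds):
        if not pool:
            break
        w = train_linear_svm(labeled_X, labeled_y)
        # Query the pool examples nearest the current hyperplane (assumed rule).
        pool.sort(key=lambda x: abs(decision(w, x)))
        picked, pool = pool[:batch], pool[batch:]
        for x in picked:
            labeled_X.append(x)
            labeled_y.append(oracle(x))
        # Move a fixed number of fresh unlabeled examples into the pool.
        for _ in range(min(pool_growth, len(reserve))):
            pool.append(reserve.pop())
    return train_linear_svm(labeled_X, labeled_y), len(labeled_y)
```

Calling `active_learn(points, oracle)` returns the final weight vector and the number of labels requested. Setting `pool_growth=0` and `initial_pool=len(points)` gives the fixed-pool variant, so the two settings can be compared on the same data.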
