Scientific report: "An Empirical Study of Active Learning with Support Vector Machines for Japanese Word Segmentation"

Manabu Sassano
Fujitsu Laboratories Ltd.
4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki 211-8588, Japan
sassano@

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 505-512.

Abstract

We explore how well active learning with Support Vector Machines works for a non-trivial task in natural language processing, using Japanese word segmentation as a test case. In particular, we discuss how the size of the pool affects the learning curve. We find that in the early stage of training, a larger pool requires more labeled examples to reach a given level of accuracy than a smaller pool does. In addition, we propose a novel technique that uses a large number of unlabeled examples effectively by adding them gradually to the pool. The experimental results show that our technique requires fewer labeled examples than the technique used in previous research. To achieve a given accuracy, the proposed technique needs only a fraction of the labeled examples required by the previous technique, and only a fraction of those required with random sampling.

1 Introduction

Corpus-based supervised learning is now a standard approach to achieving high performance in natural language processing. However, the weakness of the supervised learning approach is that it needs a reasonably large annotated corpus. Even with a good supervised learning method, we cannot get high performance without an annotated corpus. The problem is that corpus annotation is labour intensive and very expensive. To overcome this, some unsupervised and minimally supervised learning methods (Yarowsky, 1995; Yarowsky and Wicentowski, 2000) have been proposed. However, such methods usually depend on the task or domain, and their performance often does not match that of a supervised learning method. Another promising .
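The pool-based setting described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a linear SVM trained with a simple Pegasos-style stochastic gradient routine, a selection rule that queries the pool examples closest to the separating hyperplane, and a pool that grows by a fixed number of unlabeled examples each round. The function names (`train_linear_svm`, `active_learn`), the parameters (`batch_size`, `pool_step`), and the toy two-dimensional data are all hypothetical choices made for illustration; the excerpt does not specify the features, kernel, or pool schedule actually used.

```python
import random


def dot(w, x):
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style SGD on the hinge loss (assumed stand-in for an SVM
    package); labels must be +1 or -1."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * (dot(w, X[i]) + b) < 1
            w = [(1 - eta * lam) * wj for wj in w]
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def active_learn(unlabeled, oracle, rounds, batch_size=2, pool_step=10, seed=0):
    """Pool-based active learning with a pool that grows every round.

    The fixed growth schedule (pool_step per round) is an assumption of
    this sketch, not a detail taken from the paper.
    """
    reserve = list(unlabeled)
    random.Random(seed).shuffle(reserve)
    # Start from a few randomly chosen labeled examples.
    labeled = [reserve.pop() for _ in range(batch_size)]
    labels = [oracle(x) for x in labeled]
    pool = []
    for _ in range(rounds):
        # Enlarge the pool gradually instead of exposing all data at once.
        pool.extend(reserve[:pool_step])
        del reserve[:pool_step]
        w, b = train_linear_svm(labeled, labels)
        # Query the examples nearest the hyperplane (smallest |f(x)|).
        ranked = sorted(range(len(pool)), key=lambda i: abs(dot(w, pool[i]) + b))
        chosen = set(ranked[:batch_size])
        for i in sorted(chosen):
            labeled.append(pool[i])
            labels.append(oracle(pool[i]))
        pool = [x for i, x in enumerate(pool) if i not in chosen]
    return train_linear_svm(labeled, labels), labeled, labels, pool


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
    oracle = lambda x: 1 if x[0] + x[1] > 0 else -1  # toy labeling function
    model, labeled, labels, pool = active_learn(data, oracle, rounds=10)
    print(len(labeled), "labeled examples;", len(pool), "left in pool")
```

In this sketch the annotation cost is the number of calls to `oracle`, so comparing learning curves for different values of `pool_step` is one way to examine the effect of pool size that the abstract discusses.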
