Scientific report: "Learning with Unlabeled Data for Text Categorization Using Bootstrapping and Feature Projection Techniques"

Youngjoong Ko
Dept. of Computer Science, Sogang Univ.
Sinsu-dong 1, Mapo-gu, Seoul, 121-742, Korea
kyj@nlpzodiac.sogang.ac.kr

Abstract

A wide range of supervised learning algorithms has been applied to text categorization. However, supervised learning approaches have some problems. One of them is that they require a large, often prohibitive, number of labeled training documents for accurate learning. Generally, acquiring class labels for training data is costly, while gathering a large quantity of unlabeled data is cheap. We propose a new automatic text categorization method that learns from only unlabeled data using a bootstrapping framework and a feature projection technique. In our experiments, our method performed reasonably comparably to a supervised method. If our method is used in a text categorization task, building text categorization systems will become significantly faster and less expensive.

1 Introduction

Text categorization is the task of classifying documents into a certain number of pre-defined categories. Many supervised learning algorithms have been applied to this area. These algorithms are reasonably successful today when provided with enough labeled or annotated training examples. Examples include Naive Bayes (McCallum and Nigam, 1998), Rocchio (Lewis et al., 1996), Nearest Neighbor (kNN) (Yang et al., 2002), TCFP (Ko and Seo, 2002), and Support Vector Machine (SVM) (Joachims, 1998).

However, the supervised learning approach has some difficulties. One key difficulty is that it requires a large, often prohibitive, amount of labeled training data for accurate learning. Since labeling must be done manually, it is a painfully time-consuming process. Furthermore, since the application area of text categorization has diversified from newswire articles and web pages to e-mails and newsgroup postings, it is also a difficult task to
