Scientific report: "Simple semi-supervised training of part-of-speech taggers"

Anders Søgaard
Center for Language Technology
University of Copenhagen
soegaard@

Abstract

Most attempts to train part-of-speech taggers on a mixture of labeled and unlabeled data have failed. In this work, stacked learning is used to reduce tagging to a classification task. This simplifies semi-supervised training considerably. Our preferred semi-supervised method combines tri-training (Li and Zhou, 2005) and disagreement-based co-training. On the Wall Street Journal, we obtain an error reduction of with SVMTool (Gimenez and Marquez, 2004).

1 Introduction

Semi-supervised part-of-speech (POS) tagging is relatively rare, and the main reason seems to be that results have mostly been negative. Merialdo (1994), in a now famous negative result, attempted to improve HMM POS tagging by expectation maximization with unlabeled data. Clark et al. (2003) reported positive results with little labeled training data but negative results when the amount of labeled training data increased; the same seems to be the case in Wang et al. (2007), who use co-training of two diverse POS taggers. Huang et al. (2009) present positive results for self-training a simple bigram POS tagger, but results are considerably below state-of-the-art. Recently researchers have explored alternative methods.
Suzuki and Isozaki (2008) introduce a semi-supervised extension of conditional random fields that combines supervised and unsupervised probability models by so-called MDF parameter estimation, which reduces error on Wall Street Journal (WSJ) standard splits by about 7% relative to their supervised baseline. Spoustova et al. (2009) use a new pool of unlabeled data tagged by an ensemble of state-of-the-art taggers in every training step of an averaged perceptron POS tagger, with 4-5% error reduction. Finally, Søgaard (2009) stacks a POS tagger on an unsupervised clustering algorithm trained on large amounts of unlabeled data, with mixed results. This work combines a new
