
Contrastive Estimation: Training Log-Linear Models on Unlabeled Data*

Noah A. Smith and Jason Eisner
Department of Computer Science / Center for Language and Speech Processing
Johns Hopkins University, Baltimore, MD 21218 USA
nasmith jason @

Abstract

Conditional random fields (Lafferty et al., 2001) are quite effective at sequence labeling tasks like shallow parsing (Sha and Pereira, 2003) and named-entity extraction (McCallum and Li, 2003). CRFs are log-linear, allowing the incorporation of arbitrary features into the model. To train on unlabeled data, we require unsupervised estimation methods for log-linear models; few exist. We describe a novel approach, contrastive estimation. We show that the new technique can be intuitively understood as exploiting implicit negative evidence and is computationally efficient. Applied to a sequence labeling problem (POS tagging given a tagging dictionary and unlabeled text), contrastive estimation outperforms EM with the same feature set, is more robust to degradations of the dictionary, and can largely recover by modeling additional features.

1 Introduction

Finding linguistic structure in raw text is not easy. The classical forward-backward and inside-outside algorithms try to guide probabilistic models to discover structure in text, but they tend to get stuck in local maxima (Charniak, 1993). Even when they avoid local maxima, e.g., through clever initialization, they typically deviate from human ideas of what the right structure is (Merialdo, 1994).

One strategy is to incorporate domain knowledge into the model's structure. Instead of blind HMMs or PCFGs, one could use models whose features

* This work was supported by a Fannie and John Hertz Foundation fellowship to the first author and NSF ITR grant IIS-0313193 to the second author. The views expressed are not necessarily endorsed by the sponsors. The authors also thank three anonymous ACL reviewers for helpful comments, colleagues at JHU CLSP (especially David Smith and Roy Tromble), and Miles .
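To make the "implicit negative evidence" intuition concrete, the following is a minimal, brute-force sketch of a contrastive-estimation-style objective for a toy log-linear tagging model. It assumes a neighborhood of adjacent-word transpositions (in the spirit of the paper's transposition neighborhoods); the feature templates, data, and exhaustive enumeration over taggings are illustrative inventions, not the paper's efficient lattice-based implementation.

```python
import itertools
import math

def features(x, y):
    """Emission and transition indicator features for a tagged sentence."""
    feats = {}
    for word, tag in zip(x, y):
        feats[("emit", tag, word)] = feats.get(("emit", tag, word), 0) + 1
    for t1, t2 in zip(y, y[1:]):
        feats[("trans", t1, t2)] = feats.get(("trans", t1, t2), 0) + 1
    return feats

def score(theta, x, y):
    """Linear score theta . f(x, y) of a sentence/tagging pair."""
    return sum(theta.get(f, 0.0) * v for f, v in features(x, y).items())

def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def neighborhood(x):
    """x itself plus every adjacent-word transposition of x."""
    hood = [tuple(x)]
    for i in range(len(x) - 1):
        xp = list(x)
        xp[i], xp[i + 1] = xp[i + 1], xp[i]
        hood.append(tuple(xp))
    return hood

def ce_objective(theta, x, tags):
    """log [ sum_y exp s(x,y)  /  sum_{x' in N(x)} sum_y exp s(x',y) ].

    The observed sentence competes against its perturbed neighbors,
    which act as implicit negative evidence.
    """
    all_y = list(itertools.product(tags, repeat=len(x)))
    num = logsumexp([score(theta, x, y) for y in all_y])
    den = logsumexp([score(theta, xp, y)
                     for xp in neighborhood(x) for y in all_y])
    return num - den
```

Because the denominator sums over the numerator's terms plus the neighbors', the objective is always at most zero; training would push probability mass from the corrupted neighbors onto the observed word order. With all-zero weights, every neighbor scores equally, so the objective is exactly -log |N(x)|.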
