Scientific report: "An All-Subtrees Approach to Unsupervised Parsing"

Rens Bod
School of Computer Science, University of St Andrews, North Haugh, St Andrews KY16 9SX, Scotland, UK
rb@

Abstract

We investigate generalizations of the all-subtrees "DOP" approach to unsupervised parsing. Unsupervised DOP models assign all possible binary trees to a set of sentences and then use (a large random subset of) all subtrees from these binary trees to compute the most probable parse trees. We test both a relative frequency estimator for unsupervised DOP and a maximum likelihood estimator that is known to be statistically consistent. We report state-of-the-art results on English (WSJ), German (NEGRA) and Chinese (CTB) data. To the best of our knowledge, this is the first paper to test a maximum likelihood estimator for DOP on the Wall Street Journal, leading to the surprising result that an unsupervised parsing model beats a widely used supervised model (a treebank PCFG).

1 Introduction

The problem of bootstrapping syntactic structure from unlabeled data has regained considerable interest. While supervised parsers suffer from a shortage of hand-annotated data, unsupervised parsers operate on unlabeled raw data, of which unlimited quantities are available. During the last few years there has been steady progress in the field.
Where van Zaanen (2000) reported an unlabeled f-score on ATIS word strings, Clark (2001) reports on the same data, and Klein and Manning (2002) obtain an f-score on ATIS part-of-speech strings using a constituent-context model called CCM. On Penn Wall Street Journal p-o-s strings of length ≤ 10 (WSJ10), Klein and Manning (2002) report an unlabeled f-score with CCM, and the hybrid approach of Klein and Manning (2004), which combines constituency and dependency models, also yields an f-score on this data. Bod (2006) shows that a further improvement on the WSJ10 can be achieved by an unsupervised generalization of the all-subtrees approach known as Data-Oriented Parsing (DOP). This unsupervised DOP model, coined ...
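The two steps named in the abstract, assigning all possible binary trees to a sentence and estimating subtree probabilities by relative frequency, can be illustrated with a small sketch. This is not the paper's implementation: it enumerates everything exhaustively, which is only feasible for very short sentences, whereas the paper works with a large random subset of subtrees. The single node label `X`, the tuple encoding of trees, and all function names are assumptions made for illustration. The estimator shown is the usual relative frequency definition for DOP (a subtree's count divided by the total count of subtrees with the same root label), assumed here rather than taken from the text above.

```python
from collections import Counter
from itertools import product

LABEL = "X"          # assumed: one nonterminal label for every internal node
FRONTIER = (LABEL,)  # assumed encoding of an unexpanded (cut-off) node


def binary_trees(words):
    """Yield every binary tree over `words` as nested (LABEL, left, right) tuples."""
    if len(words) == 1:
        yield words[0]  # a single token is a leaf
        return
    for split in range(1, len(words)):
        for left in binary_trees(words[:split]):
            for right in binary_trees(words[split:]):
                yield (LABEL, left, right)


def fragments(node):
    """Return all subtrees rooted at internal `node`: each internal child is
    either cut off (left as FRONTIER) or replaced by one of its own fragments."""
    _, left, right = node
    options = []
    for child in (left, right):
        if isinstance(child, str):
            options.append([child])  # tokens are always kept
        else:
            options.append([FRONTIER] + fragments(child))
    return [(LABEL, l, r) for l, r in product(*options)]


def all_subtrees(tree):
    """Collect the fragments rooted at every internal node of `tree`."""
    if isinstance(tree, str):
        return []
    _, left, right = tree
    return fragments(tree) + all_subtrees(left) + all_subtrees(right)


def relative_frequencies(sentences):
    """Relative frequency of each subtree over all binary trees of all sentences."""
    counts = Counter()
    for words in sentences:
        for tree in binary_trees(words):
            counts.update(all_subtrees(tree))
    total = sum(counts.values())  # every root is LABEL, so one shared denominator
    return {subtree: n / total for subtree, n in counts.items()}


if __name__ == "__main__":
    print(len(list(binary_trees("the dog barks loudly".split()))))
    for subtree, p in relative_frequencies([["a", "b", "c"]]).items():
        print(subtree, round(p, 3))
```

The number of binary trees over n tokens is the Catalan number C(n-1), so a four-token sentence already has 5 trees and the count grows exponentially with length. That growth is the practical reason for sampling subtrees rather than enumerating them all, as this sketch does.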
