Scientific report: "Learning Efficient Parsing"

Gertjan van Noord
University of Groningen

Abstract

A corpus-based technique is described to improve the efficiency of wide-coverage, high-accuracy parsers. By keeping track of the derivation steps which lead to the best parse for a very large collection of sentences, the parser learns which parse steps can be filtered without significant loss in parsing accuracy, but with an important increase in parsing efficiency. An interesting characteristic of our approach is that it is self-learning, in the sense that it uses unannotated corpora.

1 Introduction

We consider wide-coverage, high-accuracy parsing systems such as Alpino, a parser for Dutch which contains a grammar based on HPSG and a maximum entropy disambiguation component trained on a treebank. Even if such parsing systems now obtain satisfactory accuracy for a variety of text types, a drawback concerns the computational properties of such parsers: they typically require lots of memory and are often very slow for longer and very ambiguous sentences. We present a very simple, fairly general, corpus-based method to improve upon the practical efficiency of such parsers. We use the accurate, slow parser to parse many unannotated input sentences. For each sentence, we keep track of sequences of derivation steps that were required to find the best parse of that sentence (i.e., the parse that obtained the best score, highest probability, according to the parser itself).
Given a large set of successful derivation step sequences, we experimented with a variety of simple heuristics to filter unpromising derivation steps. A heuristic that works remarkably well simply states that, for a new input sentence, the parser can only consider derivation step sequences in which any sub-sequence of length N has been observed at least once in the training data. Experimental results are provided for various heuristics and amounts of training data. It is hard to compare fast accurate parsers with slow ...
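
The length-N heuristic described above can be illustrated with a minimal sketch. It assumes that a derivation step sequence can be represented as a list of rule-name strings; the rule names, the function names `collect_ngrams` and `is_allowed`, and the toy training data are all hypothetical and are not taken from Alpino or from the paper.

```python
from typing import Iterable, Sequence, Set, Tuple

# Assumption: a derivation step is identified by a rule name (a string).
Step = str
NGram = Tuple[Step, ...]


def collect_ngrams(sequences: Iterable[Sequence[Step]], n: int) -> Set[NGram]:
    """Collect every contiguous sub-sequence of length n seen in training."""
    seen: Set[NGram] = set()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            seen.add(tuple(seq[i:i + n]))
    return seen


def is_allowed(sequence: Sequence[Step], seen: Set[NGram], n: int) -> bool:
    """Return True if every length-n sub-sequence was observed in training.

    Sequences shorter than n contain no length-n sub-sequence, so this
    sketch lets them pass; a real filter might treat them differently.
    """
    return all(
        tuple(sequence[i:i + n]) in seen
        for i in range(len(sequence) - n + 1)
    )


# Hypothetical step sequences recorded from best parses of training sentences.
training = [
    ["lex_det", "lex_noun", "np_det_n", "s_np_vp"],
    ["lex_det", "lex_adj", "lex_noun", "np_det_n"],
]
seen_bigrams = collect_ngrams(training, n=2)

# Every adjacent pair below occurs somewhere in the training sequences.
candidate_ok = ["lex_det", "lex_adj", "lex_noun", "np_det_n", "s_np_vp"]
# The pair ("lex_noun", "lex_det") never occurs in the training sequences.
candidate_bad = ["lex_noun", "lex_det"]

result_ok = is_allowed(candidate_ok, seen_bigrams, n=2)
result_bad = is_allowed(candidate_bad, seen_bigrams, n=2)
```

In this sketch the check runs on a complete sequence. A parser would more plausibly apply it incrementally, rejecting a derivation step as soon as it would complete an unseen sub-sequence, which is where the efficiency gain would come from.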

