Scientific report: "Error Mining for Wide-Coverage Grammar Engineering"

Gertjan van Noord
Alfa-informatica, University of Groningen
PO Box 716, 9700 AS Groningen, The Netherlands
vannoord@

Abstract

Parsing systems which rely on hand-coded linguistic descriptions can only perform adequately in as far as these descriptions are correct and complete. The paper describes an error mining technique to discover problems in hand-coded linguistic descriptions for parsing, such as grammars and lexicons. By analysing parse results for very large unannotated corpora, the technique discovers missing, incorrect or incomplete linguistic descriptions. The technique uses the frequency of n-grams of words for arbitrary values of n. It is shown how a new combination of suffix arrays and perfect hash finite automata allows an efficient implementation.

1 Introduction

As we all know, hand-crafted linguistic descriptions such as wide-coverage grammars and large-scale dictionaries contain mistakes and are incomplete. In the context of parsing, people often construct sets of example sentences that the system should be able to parse correctly. If a sentence cannot be parsed, it is a clear sign that something is wrong. This technique only works in as far as the problems that might occur have been anticipated.

More recently, treebanks have become available, and we can apply the parser to the sentences of the treebank and compare the resulting parse trees with the gold standard. Such techniques are limited, however, because treebanks are relatively small. This is a serious problem because the distribution of words is Zipfian (there are very many words that occur very infrequently), and the same appears to hold for syntactic constructions. In this paper an error mining technique is described which is very effective at automatically discovering systematic mistakes in a parser by using very large but unannotated corpora. The idea is very simple. We run the parser on a large set of sentences and then analyze those sentences
