Scientific report: "Error Mining for Wide-Coverage Grammar Engineering"


Gertjan van Noord
Alfa-informatica, University of Groningen
PO Box 716, 9700 AS Groningen, The Netherlands
vannoord@let.rug.nl

Abstract

Parsing systems which rely on hand-coded linguistic descriptions can only perform adequately in as far as these descriptions are correct and complete. The paper describes an error mining technique to discover problems in hand-coded linguistic descriptions for parsing such as grammars and lexicons. By analysing parse results for very large unannotated corpora, the technique discovers missing, incorrect or incomplete linguistic descriptions. The technique uses the frequency of n-grams of words for arbitrary values of n. It is shown how a new combination of suffix arrays and perfect hash finite automata allows an efficient implementation.

1 Introduction

As we all know, hand-crafted linguistic descriptions such as wide-coverage grammars and large scale dictionaries contain mistakes and are incomplete. In the context of parsing, people often construct sets of example sentences that the system should be able to parse correctly. If a sentence cannot be parsed, it is a clear sign that something is wrong. This technique only works in as far as the problems that might occur have been anticipated.
More recently, treebanks have become available, and we can apply the parser to the sentences of the treebank and compare the resulting parse trees with the gold standard. Such techniques are limited, however, because treebanks are relatively small. This is a serious problem, because the distribution of words is Zipfian (there are very many words that occur very infrequently), and the same appears to hold for syntactic constructions.

In this paper, an error mining technique is described which is very effective at automatically discovering systematic mistakes in a parser by using very large but unannotated corpora. The idea is very simple. We run the parser on a large set of sentences, and then analyze those sentences
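
The abstract describes the technique only at a high level: parse a large unannotated corpus, then use word n-gram frequencies to locate problems. As a rough illustration of that idea, the following Python sketch ranks n-grams by how often the sentences containing them were parsed successfully. It is a sketch under stated assumptions, not the paper's implementation: the function names `ngrams` and `mine_errors`, the success ratio used as the score, the `min_count` threshold, and the choice to count each n-gram once per sentence are all hypothetical, and the suffix arrays and perfect hash finite automata mentioned in the abstract are replaced here by a plain in-memory `Counter`.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

NGram = Tuple[str, ...]


def ngrams(words: List[str], max_n: int) -> Iterator[NGram]:
    """Yield every word n-gram of length 1..max_n in the sentence."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])


def mine_errors(
    results: Iterable[Tuple[str, bool]],
    max_n: int = 2,
    min_count: int = 2,
) -> List[Tuple[float, int, NGram]]:
    """Rank n-grams by the share of their sentences that parsed.

    `results` pairs each sentence with a flag saying whether the parser
    succeeded (hypothetical input format). A low ratio suggests that the
    n-gram is associated with parse failures.
    """
    total: Counter = Counter()   # sentences containing the n-gram
    parsed_ok: Counter = Counter()  # ... of which were parsed successfully
    for sentence, ok in results:
        # Assumption: count each n-gram at most once per sentence.
        seen = set(ngrams(sentence.split(), max_n))
        total.update(seen)
        if ok:
            parsed_ok.update(seen)
    scored = [
        (parsed_ok[g] / total[g], total[g], g)
        for g in total
        if total[g] >= min_count  # skip n-grams too rare to judge
    ]
    # Lowest success ratio first; ties broken by higher frequency, then n-gram.
    scored.sort(key=lambda t: (t[0], -t[1], t[2]))
    return scored
```

Calling `mine_errors` on a list of `(sentence, parsed)` pairs returns `(ratio, count, ngram)` triples with the most suspicious n-grams first. The dictionary-based counting only scales to small values of `max_n`; handling arbitrary n efficiently is what the abstract attributes to suffix arrays and perfect hash finite automata.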

