Detecting Errors in Part-of-Speech Annotation

Markus Dickinson
Department of Linguistics
The Ohio State University
dickinso@ling.osu.edu

W. Detmar Meurers
Department of Linguistics
The Ohio State University
dm@ling.osu.edu

Abstract

We propose a new method for detecting errors in gold-standard part-of-speech annotation. The approach locates errors with high precision based on n-grams occurring in the corpus with multiple taggings. Two further techniques, closed-class analysis and finite-state tagging guide patterns, are discussed. The success of the three approaches is illustrated for the Wall Street Journal corpus as part of the Penn Treebank.

1 Introduction

Part-of-speech (POS) annotated reference corpora such as the British National Corpus (Leech et al., 1994), the Penn Treebank (Marcus et al., 1993), or the German Negra Treebank (Skut et al., 1997) play an important role in current work in computational linguistics. They provide training material for research on tagging algorithms, and they serve as a gold standard for evaluating the performance of such tools. High-quality POS-annotated text is also relevant as input for syntactic processing, for practical applications such as information extraction, and for linguistic research making use of POS-based corpus queries.

The gold-standard POS annotation for such large reference corpora is generally obtained by using an automatic tagger to produce a first annotation, followed by human post-editing. While Sinclair (1992) provides some arguments for prioritizing a fully automated analysis, human post-editing has been shown to significantly reduce the number of POS annotation errors. Brants (2000) shows that a single human post-editor reduces the 3.3% error rate in the STTS annotation of the German Negra corpus produced by the TnT tagger to 1.2%. Baker (1997) also reports an improvement of around 2% for a similar experiment carried out for an English sample originally tagged with 96.95% accuracy by the CLAWS tagger. And Leech (1997) reports that ...
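To make the core idea of the n-gram approach concrete, the following Python sketch collects every word n-gram in a POS-tagged corpus and reports those that occur with more than one tagging, which are the candidate error sites. This is a minimal illustration, not the authors' implementation: the function name, data structures, toy corpus, and tag choices are assumptions made for the example.

    from collections import defaultdict

    def variation_ngrams(tagged_corpus, n):
        """Find word n-grams occurring with more than one tagging.

        tagged_corpus: a list of (word, tag) pairs.
        Returns a dict mapping each varying word n-gram to the
        set of tag sequences it was annotated with.
        """
        taggings = defaultdict(set)
        # Slide a window of length n over the corpus, recording
        # the tag sequence observed for each word sequence.
        for i in range(len(tagged_corpus) - n + 1):
            window = tagged_corpus[i:i + n]
            words = tuple(w for w, _ in window)
            tags = tuple(t for _, t in window)
            taggings[words].add(tags)
        # Keep only n-grams whose taggings disagree somewhere.
        return {w: t for w, t in taggings.items() if len(t) > 1}

    # Hypothetical toy corpus: "off" is tagged RP in one
    # occurrence and IN in the other, inside an otherwise
    # identical word context.
    corpus = [("to", "TO"), ("ward", "VB"), ("off", "RP"),
              ("the", "DT"), ("threat", "NN"),
              ("to", "TO"), ("ward", "VB"), ("off", "IN"),
              ("the", "DT"), ("danger", "NN")]
    print(variation_ngrams(corpus, 3))

Run on the toy corpus, the sketch flags the trigrams around "off", which received two different tags in recurring identical word contexts. The intuition behind the method's high precision is that the longer the shared word context, the less likely the tagging difference reflects genuine ambiguity rather than an annotation error.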