Scientific report: "Detecting Errors in Part-of-Speech Annotation"

Markus Dickinson, Department of Linguistics, The Ohio State University, dickinso@
W. Detmar Meurers, Department of Linguistics, The Ohio State University, dm@

Abstract

We propose a new method for detecting errors in "gold-standard" part-of-speech annotation. The approach locates errors with high precision based on n-grams occurring in the corpus with multiple taggings. Two further techniques, closed-class analysis and finite-state tagging guide patterns, are discussed. The success of the three approaches is illustrated for the Wall Street Journal corpus as part of the Penn Treebank.

1 Introduction

Part-of-speech (pos) annotated reference corpora, such as the British National Corpus (Leech et al., 1994), the Penn Treebank (Marcus et al., 1993), or the German Negra Treebank (Skut et al., 1997), play an important role for current work in computational linguistics. They provide training material for research on tagging algorithms, and they serve as a gold standard for evaluating the performance of such tools. High-quality pos-annotated text is also relevant as input for syntactic processing, for practical applications such as information extraction, and for linguistic research making use of pos-based corpus queries.

The gold-standard pos-annotation for such large reference corpora is generally obtained using an automatic tagger to produce a first annotation, followed by human post-editing.
While Sinclair (1992) provides some arguments for prioritizing a fully automated analysis, human post-editing has been shown to significantly reduce the number of pos-annotation errors. Brants (2000) reports that a single human post-editor reduces the error rate in the STTS annotation of the German Negra corpus produced by the TnT tagger to . Baker (1997) also reports an improvement of around 2 for a similar experiment carried out for an English sample originally tagged with accuracy by the CLAWS tagger. And Leech (1997) reports that .
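
The abstract's central idea, locating n-grams that occur in the corpus with multiple taggings, can be illustrated with a minimal Python sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the function name `variation_ngrams`, the corpus representation (a list of sentences, each a list of word/tag pairs), and the toy data are all hypothetical, and the sketch only collects candidates without deciding which tagging is the erroneous one.

```python
from collections import defaultdict


def variation_ngrams(tagged_sentences, n):
    """Return word n-grams that occur with more than one tag sequence.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs
    (an assumed representation, chosen for illustration only).
    """
    taggings = defaultdict(set)
    for sentence in tagged_sentences:
        words = tuple(word for word, _ in sentence)
        tags = tuple(tag for _, tag in sentence)
        # Slide a window of length n over the sentence.
        for i in range(len(sentence) - n + 1):
            taggings[words[i:i + n]].add(tags[i:i + n])
    # Keep only n-grams whose occurrences disagree on the tagging.
    return {ngram: seqs for ngram, seqs in taggings.items() if len(seqs) > 1}


# Invented toy corpus: "off" is tagged RP in one sentence and IN in another.
corpus = [
    [("to", "TO"), ("ward", "VB"), ("off", "RP"), ("a", "DT"), ("takeover", "NN")],
    [("to", "TO"), ("ward", "VB"), ("off", "IN"), ("a", "DT"), ("takeover", "NN")],
    [("a", "DT"), ("takeover", "NN"), ("bid", "NN")],
]

unigram_candidates = variation_ngrams(corpus, 1)
trigram_candidates = variation_ngrams(corpus, 3)
```

In this toy corpus, the only unigram with multiple taggings is "off", and every trigram containing it is flagged as well. Some such variation reflects genuine ambiguity rather than error, so longer shared contexts are the more promising candidates for inspection.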
