Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar

Yoshihide Kato¹ and Shigeki Matsubara²
¹Information Technology Center, Nagoya University
²Graduate School of Information Science, Nagoya University
Furo-cho, Chikusa-ku, Nagoya, 464-8601 Japan
yosihide@

Abstract
This paper proposes a method of correcting annotation errors in a treebank. By using a synchronous grammar, the method transforms parse trees containing annotation errors into trees in which those errors are corrected. The synchronous grammar is automatically induced from the treebank. We report an experimental result of applying our method to the Penn Treebank. The result demonstrates that our method corrects syntactic annotation errors with high precision.

1 Introduction
Annotated corpora play an important role in fields such as theoretical linguistics research and the development of NLP systems. However, they often contain annotation errors introduced by a manual or semi-manual mark-up process. These errors are problematic for corpus-based research. To address this problem, several error detection and correction methods have been proposed (Eskin, 2000; Nakagawa and Matsumoto, 2002; Dickinson and Meurers, 2003a; Dickinson and Meurers, 2003b; Ule and Simov, 2004; Murata et al., 2005; Dickinson and Meurers, 2005; Boyd et al., 2008). These methods detect corpus positions that are marked up incorrectly and find the correct labels (e.g., POS tags) for those positions. However, they cannot correct errors in structural annotation, which means they are insufficient for correcting annotation errors in a treebank.
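To make the contrast with structural correction concrete, here is a minimal sketch of position-level label correction, loosely in the spirit of variation-based detection (Dickinson and Meurers, 2003a) but not the exact algorithm of any cited work. It assumes a POS-tagged corpus given as lists of (word, tag) pairs and uses one word of left and right context as the comparison window; the function name `flag_inconsistent_tags` is hypothetical. A token whose tag disagrees with the majority tag among identical contexts is flagged, and the majority tag is proposed as its correction.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sentence = List[Tuple[str, str]]  # a sentence as (word, tag) pairs
Suggestion = Tuple[int, int, str, str]  # (sentence index, token index, old tag, proposed tag)


def flag_inconsistent_tags(corpus: List[Sentence]) -> List[Suggestion]:
    """Hypothetical helper: flag tags that disagree with the majority in identical contexts."""
    # Group every token occurrence by its (left word, word, right word) context.
    contexts: Dict[Tuple[str, str, str], List[Tuple[int, int, str]]] = defaultdict(list)
    for s_idx, sent in enumerate(corpus):
        words = ["<s>"] + [w for w, _ in sent] + ["</s>"]  # pad sentence boundaries
        for i, (word, tag) in enumerate(sent):
            contexts[(words[i], word, words[i + 2])].append((s_idx, i, tag))

    suggestions: List[Suggestion] = []
    for occurrences in contexts.values():
        counts = Counter(tag for _, _, tag in occurrences)
        if len(counts) < 2:
            continue  # identical contexts with consistent tags: nothing to flag
        majority, _ = counts.most_common(1)[0]
        for s_idx, i, tag in occurrences:
            if tag != majority:
                suggestions.append((s_idx, i, tag, majority))
    return suggestions
```

Every suggestion is a (sentence, position, old tag, new tag) tuple: this kind of method can relabel a token, but it has no way to add, remove, or move a bracket, which is exactly the limitation described above.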
This paper proposes a method of correcting errors in structural annotation. Our method is based on a synchronous grammar formalism called synchronous tree substitution grammar (STSG) (Eisner, 2003), which defines a tree-to-tree transformation. Using an STSG, our method transforms parse trees containing errors into trees in which those errors are corrected. The grammar is …
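The idea of an STSG-based tree-to-tree transformation can be illustrated with a minimal sketch. It assumes trees are nested Python tuples `(label, child, ...)` with words as plain strings, and that each rule pairs a source elementary tree with a target elementary tree whose linked substitution sites are shared variables such as `?a`. Two simplifications are assumptions of this sketch, not the paper's method: nodes that no rule matches are copied unchanged (standing in for identity rules), and the first matching rule is applied without any scoring. The single rule below, which flattens a wrongly nested base NP, is hypothetical and written by hand; in the paper the grammar is induced automatically from the treebank, which this sketch does not attempt.

```python
from typing import Callable, Dict, List, Tuple, Union

# A tree is a word (str) or a tuple (label, child_1, ..., child_n).
Tree = Union[str, tuple]
Rule = Tuple[Tree, Tree]  # (source elementary tree, target elementary tree)


def is_var(x) -> bool:
    # Variables such as "?a" mark linked substitution sites; a bare "?" token is a word.
    return isinstance(x, str) and len(x) > 1 and x.startswith("?")


def match(pattern: Tree, tree: Tree, bindings: Dict[str, Tree]) -> bool:
    """Match a source elementary tree against a subtree, binding variables."""
    if is_var(pattern):
        bindings[pattern] = tree
        return True
    if isinstance(pattern, str) or isinstance(tree, str):
        return pattern == tree
    if len(pattern) != len(tree) or pattern[0] != tree[0]:
        return False
    return all(match(p, t, bindings) for p, t in zip(pattern[1:], tree[1:]))


def substitute(template: Tree, bindings: Dict[str, Tree],
               recurse: Callable[[Tree], Tree]) -> Tree:
    """Build the target elementary tree, continuing the derivation at linked sites."""
    if is_var(template):
        return recurse(bindings[template])
    if isinstance(template, str):
        return template
    return (template[0],) + tuple(substitute(c, bindings, recurse) for c in template[1:])


def transform(tree: Tree, rules: List[Rule]) -> Tree:
    """Rewrite a tree top-down; the first matching rule wins (a simplification)."""
    if isinstance(tree, str):
        return tree
    for source, target in rules:
        bindings: Dict[str, Tree] = {}
        if match(source, tree, bindings):
            return substitute(target, bindings, lambda t: transform(t, rules))
    # No rule matched: copy this node (an identity rule) and recurse into its children.
    return (tree[0],) + tuple(transform(c, rules) for c in tree[1:])


# Hypothetical correction rule: flatten a wrongly nested base NP.
RULES: List[Rule] = [
    (("NP", ("NP", ("DT", "?a"), ("NN", "?b")), ("NN", "?c")),
     ("NP", ("DT", "?a"), ("NN", "?b"), ("NN", "?c"))),
]

if __name__ == "__main__":
    erroneous = ("S", ("NP", ("NP", ("DT", "the"), ("NN", "stock")), ("NN", "market")),
                 ("VP", ("VBD", "fell")))
    print(transform(erroneous, RULES))
```

Applied to `(S (NP (NP (DT the) (NN stock)) (NN market)) (VP (VBD fell)))`, the rule yields `(S (NP (DT the) (NN stock) (NN market)) (VP (VBD fell)))`: a change in bracketing rather than in any single label, while trees the rule does not match pass through unchanged.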
