Using Derivation Trees for Treebank Error Detection

Seth Kulick and Ann Bies and Justin Mott
Linguistic Data Consortium, University of Pennsylvania
3600 Market Street, Suite 810, Philadelphia, PA 19104
skulick bies jmott @

Abstract

This work introduces a new approach to checking treebank consistency. Derivation trees based on a variant of Tree Adjoining Grammar are used to compare the annotation of word sequences based on their structural similarity. This overcomes the problems of earlier approaches based on using strings of words rather than tree structure to identify the appropriate contexts for comparison. We report on the result of applying this approach to the Penn Arabic Treebank and how this approach leads to high precision of error detection.

1 Introduction

The internal consistency of the annotation in a treebank is crucial in order to provide reliable training and testing data for parsers and linguistic research. Treebank annotation, consisting of syntactic structure with words as the terminals, is by its nature more complex and thus more prone to error than other annotation tasks, such as part-of-speech tagging. Recent work has therefore focused on the importance of detecting errors in the treebank (Green and Manning, 2010) and methods for finding such errors automatically (Dickinson and Meurers, 2003b; Boyd et al., 2007; Kato and Matsubara, 2010).
We present here a new approach to this problem that builds upon Dickinson and Meurers (2003b) by integrating the perspective on treebank consistency checking and search in Kulick and Bies (2010). The approach in Dickinson and Meurers (2003b) has certain limitations and complications that are inherent in examining only strings of words. To overcome these problems, we recast the search as one of searching for inconsistently-used elementary trees in a Tree Adjoining Grammar-based form of the treebank. This allows consistency checking to be based on structural locality instead of n-grams, resulting in improved .
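
The general idea of a consistency check, in which the same word sequence should receive the same annotation wherever it occurs, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `find_inconsistent`, the bracketed strings standing in for tree fragments, and the toy English data are all assumptions made for illustration. A real system along the lines described here would compare elementary trees extracted from a Tree Adjoining Grammar-style derivation, not flat strings.

```python
from collections import defaultdict


def find_inconsistent(instances):
    """Return word sequences that were annotated with more than one structure.

    `instances` is an iterable of (words, structure) pairs. `words` is a
    sequence of tokens; `structure` is any hashable stand-in for the
    annotation (hypothetical: a bracketed string is used here in place of a
    real derivation-tree fragment).
    """
    by_words = defaultdict(set)
    for words, structure in instances:
        by_words[tuple(words)].add(structure)
    # Keep only the sequences whose occurrences disagree on structure.
    return {w: sorted(s) for w, s in by_words.items() if len(s) > 1}


# Invented toy data, not taken from any treebank.
sample = [
    (("the", "old", "man"), "(NP the (ADJP old) man)"),
    (("the", "old", "man"), "(NP the old man)"),
    (("a", "book"), "(NP a book)"),
    (("a", "book"), "(NP a book)"),
]

flagged = find_inconsistent(sample)
for words, structures in flagged.items():
    print(" ".join(words), "->", structures)
```

Each flagged sequence is only a candidate for manual review, since two occurrences of the same words can legitimately differ in structure. How the occurrences are grouped, by surrounding n-gram context or by structural locality, is the design choice the introduction discusses.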
