Detecting Errors in Automatically-Parsed Dependency Relations

Markus Dickinson, Indiana University (md7@)

Abstract

We outline different methods to detect errors in automatically-parsed dependency corpora by comparing so-called dependency rules to their representation in the training data and flagging anomalous ones. By comparing each new rule to every relevant rule from training, we can identify parts of parse trees which are likely erroneous. Even the relatively simple methods of comparison we propose show promise for speeding up the annotation process.

1 Introduction and Motivation

Given the need for high-quality dependency parses in applications such as statistical machine translation (Xu et al., 2009), natural language generation (Wan et al., 2009), and text summarization evaluation (Owczarzak, 2009), there is a corresponding need for high-quality dependency annotation for the training and evaluation of dependency parsers (Buchholz and Marsi, 2006). Furthermore, parsing accuracy degrades unless sufficient amounts of labeled training data from the same domain are available (Gildea, 2001; Sekine, 1997), and thus we need larger and more varied annotated treebanks, covering a wide range of domains. However, there is a bottleneck in obtaining annotation, due to the need for manual intervention in annotating a treebank. One approach is to develop automatically-parsed corpora (van Noord and Bouma, 2009), but a natural disadvantage with such data is that it contains parsing errors.
Identifying the most problematic parses for human post-processing could combine the benefits of automatic and manual annotation, by allowing a human annotator to efficiently correct automatic errors. We thus set out in this paper to detect errors in automatically-parsed data. If annotated corpora are to grow in scale and retain high quality, annotation errors which arise from automatic processing must be minimized, as errors have a negative impact on the training and evaluation of NLP technology (see discussion below).
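The core idea of comparing each new dependency rule against its representation in the training data can be sketched as follows. This is a minimal illustration, not the paper's actual method: here a "dependency rule" is simplified to a (head POS, dependent POS, label) triple, and anomaly detection is reduced to a raw frequency threshold; the function names and the `threshold` parameter are illustrative assumptions.

```python
from collections import Counter

def extract_rules(tree):
    # Simplified assumption: a tree is a list of (head_pos, dep_pos, label)
    # triples, each of which we treat as one dependency "rule".
    return list(tree)

def flag_anomalous(training_trees, new_tree, threshold=1):
    # Count how often each rule occurs across the training treebank.
    counts = Counter(rule
                     for tree in training_trees
                     for rule in extract_rules(tree))
    # Flag rules in the new parse that are rare or unseen in training,
    # i.e. the parts of the tree most likely to be erroneous.
    return [rule for rule in extract_rules(new_tree)
            if counts[rule] <= threshold]

training = [[("VB", "NN", "obj"), ("VB", "RB", "adv")],
            [("VB", "NN", "obj")]]
new_tree = [("VB", "NN", "obj"), ("NN", "DT", "det")]
print(flag_anomalous(training, new_tree))
```

A rule well-attested in training, such as `("VB", "NN", "obj")` above, passes silently, while the unseen `("NN", "DT", "det")` is flagged for annotator review. The paper's actual comparison methods are more nuanced than a frequency cutoff, but the flag-for-review workflow is the same.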
