Scientific report: "Error Mining on Dependency Trees"

Claire Gardent, CNRS, LORIA, UMR 7503, Vandoeuvre-les-Nancy, F-54500, France
Shashi Narayan, Universite de Lorraine, LORIA, UMR 7503, Villers-les-Nancy, F-54600, France

Abstract

In recent years, error mining approaches were developed to help identify the most likely sources of parsing failures in parsing systems using handcrafted grammars and lexicons. However, the techniques they use to enumerate and count n-grams build on the sequential nature of a text corpus and do not easily extend to structured data. In this paper, we propose an algorithm for mining trees and apply it to detect the most likely sources of generation failure. We show that this tree mining algorithm permits identifying not only errors in the generation system (grammar, lexicon) but also mismatches between the structures contained in the input and the input structures expected by our generator, as well as a few idiosyncrasies and errors in the input data.

1 Introduction

In recent years, error mining techniques have been developed to help identify the most likely sources of parsing failure (van Noord, 2004; Sagot and de la Clergerie, 2006; de Kok et al., 2009). First, the input data (text) is separated into two subcorpora: a corpus of sentences that could be parsed (PASS) and a corpus of sentences that failed to be parsed (FAIL). For each n-gram of words (and/or part-of-speech tags) occurring in the corpus to be parsed, a suspicion rate is then computed which, in essence, captures the likelihood that this n-gram causes parsing to fail.
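The PASS/FAIL split and the suspicion rate can be illustrated with a minimal sketch. It assumes the simplest possible definition, namely the fraction of sentences containing an n-gram that belong to FAIL; the cited approaches may use more refined estimates. The function names `ngrams` and `suspicion_rates` and the toy corpora are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Iterator, List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], max_n: int) -> Iterator[NGram]:
    """Yield every contiguous n-gram of length 1..max_n in a sentence."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def suspicion_rates(
    pass_corpus: List[Sequence[str]],
    fail_corpus: List[Sequence[str]],
    max_n: int = 2,
) -> Dict[NGram, float]:
    """Assumed definition: share of the sentences containing an n-gram
    that are in FAIL. Each n-gram is counted once per sentence."""
    total: Counter = Counter()
    failed: Counter = Counter()
    for sentence in pass_corpus:
        total.update(set(ngrams(sentence, max_n)))
    for sentence in fail_corpus:
        grams = set(ngrams(sentence, max_n))
        total.update(grams)
        failed.update(grams)
    return {gram: failed[gram] / total[gram] for gram in total}


# Toy data (invented for this example): one parsed sentence, two failures.
pass_corpus = [["the", "cat", "sleeps"]]
fail_corpus = [["the", "dog", "barks"], ["a", "dog", "sleeps"]]

rates = suspicion_rates(pass_corpus, fail_corpus, max_n=2)

# List the most suspicious n-grams first.
for gram, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(" ".join(gram), round(rate, 2))
```

In this toy data, "dog" occurs only in failed sentences and so receives the highest possible rate under the assumed definition, while "the" occurs in one parsed and one failed sentence and is therefore less suspicious.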
These error mining techniques have been applied with good results on parsing output and shown to help improve the large-scale symbolic grammars and lexicons used by the parser. However, the techniques they use (e.g., suffix arrays) to enumerate and count n-grams build on the sequential nature of a text corpus and cannot easily extend to structured data. There are some NLP applications, though, where the processed data is ...
