Scientific report: "Learning Efficient Parsing"

Gertjan van Noord
University of Groningen

Abstract

A corpus-based technique is described to improve the efficiency of wide-coverage high-accuracy parsers. By keeping track of the derivation steps which lead to the best parse for a very large collection of sentences, the parser learns which parse steps can be filtered without significant loss in parsing accuracy, but with an important increase in parsing efficiency. An interesting characteristic of our approach is that it is self-learning, in the sense that it uses unannotated corpora.

1 Introduction

We consider wide-coverage high-accuracy parsing systems such as Alpino, a parser for Dutch which contains a grammar based on HPSG and a maximum entropy disambiguation component trained on a treebank. Even if such parsing systems now obtain satisfactory accuracy for a variety of text types, a drawback concerns the computational properties of such parsers: they typically require lots of memory and are often very slow for longer and very ambiguous sentences.

We present a very simple, fairly general, corpus-based method to improve upon the practical efficiency of such parsers. We use the accurate, slow parser to parse many unannotated input sentences. For each sentence, we keep track of the sequences of derivation steps that were required to find the best parse of that sentence (i.e., the parse that obtained the best score, the highest probability, according to the parser itself).
Given a large set of successful derivation step sequences, we experimented with a variety of simple heuristics to filter unpromising derivation steps. A heuristic that works remarkably well simply states that, for a new input sentence, the parser can only consider derivation step sequences in which any sub-sequence of length N has been observed at least once in the training data. Experimental results are provided for various heuristics and amounts of training data. It is hard to compare fast accurate parsers with slow .
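
The sub-sequence heuristic described above can be illustrated with a minimal Python sketch. It is not the Alpino implementation: it assumes that a derivation step can be represented as a plain string, and the rule names (`np_det_n`, `vp_v_np`, and so on), the function names `collect_ngrams` and `is_allowed`, and the toy training data are all hypothetical. The sketch collects every sub-sequence of length N seen in the training derivations and then accepts a new sequence only if all of its length-N sub-sequences were observed.

```python
from typing import Iterable, Sequence, Set, Tuple

Step = str  # assumption: a derivation step is identified by a rule name


def collect_ngrams(sequences: Iterable[Sequence[Step]], n: int) -> Set[Tuple[Step, ...]]:
    """Collect every contiguous sub-sequence of length n from the training sequences."""
    seen: Set[Tuple[Step, ...]] = set()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            seen.add(tuple(seq[i:i + n]))
    return seen


def is_allowed(seq: Sequence[Step], seen: Set[Tuple[Step, ...]], n: int) -> bool:
    """Return True if every length-n sub-sequence of seq occurs in seen.

    Sequences shorter than n contain no length-n sub-sequence and are
    therefore accepted; this is a choice made for this sketch only.
    """
    return all(tuple(seq[i:i + n]) in seen for i in range(len(seq) - n + 1))


# Hypothetical derivation step sequences taken from best parses.
training = [
    ["np_det_n", "vp_v_np", "s_np_vp"],
    ["np_n", "vp_v_np", "s_np_vp"],
]
seen = collect_ngrams(training, n=2)

# A sequence made only of observed bigrams passes the filter.
candidate_ok = is_allowed(["np_n", "vp_v_np", "s_np_vp"], seen, n=2)

# The bigram ("np_n", "s_np_vp") was never observed, so this one is filtered.
candidate_bad = is_allowed(["np_n", "s_np_vp"], seen, n=2)
```

In a real parser the check would presumably be applied incrementally while derivations are being built, so that unpromising steps are pruned early rather than after a full sequence exists; the sketch only shows the membership test itself.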
