Scientific report: "Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II Treebank"

Ruth O'Donovan, Michael Burke, Aoife Cahill, Josef van Genabith, Andy Way
National Centre for Language Technology and School of Computing, Dublin City University, Glasnevin, Dublin 9, Ireland
{rodonovan, mburke, acahill, josef, away}@

Abstract

In this paper we present a methodology for extracting subcategorisation frames based on an automatic LFG f-structure annotation algorithm for the Penn-II Treebank. We extract abstract syntactic function-based subcategorisation frames (LFG semantic forms), traditional CFG category-based subcategorisation frames, as well as mixed function/category-based frames, with or without preposition information for obliques and particle information for particle verbs. Our approach does not predefine frames, associates probabilities with frames conditional on the lemma, distinguishes between active and passive frames, and fully reflects the effects of long-distance dependencies in the source data structures. We extract 3586 verb lemmas and 14348 semantic form types (an average of 4 per lemma) with 577 frame types. We present a large-scale evaluation of the complete set of forms extracted against the full COMLEX resource.

1 Introduction

Lexical resources are crucial in the construction of wide-coverage computational systems based on modern syntactic theories (e.g. LFG, HPSG, CCG, LTAG, etc.). However, as manual construction of such lexical resources is time-consuming, error-prone, expensive and rarely ever complete, it is often the case that limitations of NLP systems based on lexicalised approaches are due to bottlenecks in the lexicon component. Given this, research on automating lexical acquisition for lexically-based NLP systems is a particularly important issue. In this paper we present an approach to automating subcategorisation frame acquisition for LFG (Kaplan and Bresnan, 1982), i.e. grammatical function-based systems. LFG has two levels of structural representation c .
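
The abstract says that the approach associates probabilities with frames conditional on the lemma. As a rough illustration only, the sketch below estimates P(frame | lemma) by relative frequency over observed (lemma, frame) pairs. The estimation method, the function name `frame_probabilities`, the tuple-based frame notation and the toy observations are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter, defaultdict


def frame_probabilities(observations):
    """Estimate P(frame | lemma) by relative frequency.

    `observations` is an iterable of (lemma, frame) pairs, one pair per
    verb occurrence. This is a hypothetical helper, not the authors' code.
    """
    # Count how often each frame occurs with each lemma.
    counts = defaultdict(Counter)
    for lemma, frame in observations:
        counts[lemma][frame] += 1

    # Normalise the counts per lemma so they sum to 1.
    probs = {}
    for lemma, frame_counts in counts.items():
        total = sum(frame_counts.values())
        probs[lemma] = {frame: n / total for frame, n in frame_counts.items()}
    return probs


# Invented toy data: frames are written as tuples of grammatical functions,
# with "obl:to" standing for an oblique that carries preposition information.
observations = [
    ("give", ("subj", "obj", "obj2")),
    ("give", ("subj", "obj", "obl:to")),
    ("give", ("subj", "obj", "obl:to")),
    ("give", ("subj", "obj")),
    ("sleep", ("subj",)),
]

probs = frame_probabilities(observations)
for lemma, dist in probs.items():
    for frame, p in dist.items():
        print(lemma, frame, p)
```

Because no frame inventory is fixed in advance, the set of frames per lemma in this sketch is simply whatever appears in the observations, which mirrors the abstract's point that frames are not predefined.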
