TAILIEUCHUNG - Scientific report: "Creating a CCGbank and a wide-coverage CCG lexicon for German"

Julia Hockenmaier
Institute for Research in Cognitive Science, University of Pennsylvania, Philadelphia, PA 19104, USA
juliahr@

Abstract

We present an algorithm which creates a German CCGbank by translating the syntax graphs in the German Tiger corpus into CCG derivation trees. The resulting corpus contains 46,628 derivations, covering 95% of all complete sentences in Tiger. Lexicons extracted from this corpus contain correct lexical entries for 94% of all known tokens in unseen text.

1 Introduction

A number of wide-coverage TAG, CCG, LFG and HPSG grammars (Xia, 1999; Chen et al., 2005; Hockenmaier and Steedman, 2002a; O'Donovan et al., 2005; Miyao et al., 2004) have been extracted from the Penn Treebank (Marcus et al., 1993), and have enabled the creation of wide-coverage parsers for English which recover local and non-local dependencies that approximate the underlying predicate-argument structure (Hockenmaier and Steedman, 2002b; Clark and Curran, 2004; Miyao and Tsujii, 2005; Shen and Joshi, 2005). However, many corpora (Böhmová et al., 2003; Skut et al., 1997; Brants et al., 2002) use dependency graphs or other representations, and the extraction algorithms that have been developed for Penn Treebank style corpora may not be immediately applicable to this representation. As a consequence, research on statistical parsing with deep grammars has largely been confined to English.
Free-word order languages typically pose greater challenges for syntactic theories (Rambow, 1994), and the richer inflectional morphology of these languages creates additional problems both for the coverage of lexicalized formalisms such as CCG or TAG, and for the usefulness of dependency counts extracted from the training data. On the other hand, formalisms such as CCG and TAG are particularly suited to capture the crossing dependencies that arise in languages such as Dutch or German, and by choosing an appropriate linguistic representation, some of these .
