Scientific report: "Parsing Noun Phrase Structure with CCG"

David Vadas and James R. Curran
School of Information Technologies
University of Sydney
NSW 2006, Australia
dvadasl james @

Abstract

Statistical parsing of noun phrase (NP) structure has been hampered by a lack of gold-standard data. This is a significant problem for CCGbank, where binary branching NP derivations are often incorrect, a result of the automatic conversion from the Penn Treebank. We correct these errors in CCGbank using a gold-standard corpus of NP structure, resulting in a much more accurate corpus. We also implement novel NER features that generalise the lexical information needed to parse NPs and provide important semantic information. Finally, evaluating against DepBank demonstrates the effectiveness of our modified corpus and novel features, with an increase in parser performance of .

1 Introduction

Internal noun phrase (NP) structure is not recovered by a number of widely-used parsers, e.g. Collins (2003). This is because their training data, the Penn Treebank (Marcus et al., 1993), does not fully annotate NP structure. The flat structure described by the Penn Treebank can be seen in this example:

(NP (NN lung) (NN cancer) (NNS deaths))

CCGbank (Hockenmaier and Steedman, 2007) is the primary English corpus for Combinatory Categorial Grammar (CCG) (Steedman, 2000) and was created by a semi-automatic conversion from the Penn Treebank.
However, CCG is a binary branching grammar, and as such cannot leave NP structure underspecified. Instead, all NPs were made right-branching, as shown in this example:

(N (N/N lung) (N (N/N cancer) (N deaths)))

This structure is correct for most English NPs and is the best solution that doesn't require manual reannotation. However, the resulting derivations often contain errors. This can be seen in the previous example, where lung cancer should form a constituent, but does not.

The first contribution of this paper is to correct these CCGbank errors. We apply an automatic conversion process using the .
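The right-branching default described above can be illustrated with a small sketch. This is not the CCGbank conversion code. The helper names `right_branch`, `to_string` and `constituents` are hypothetical, and the sketch assumes that every word before the head is simply given the category N/N. It rebuilds the bracketing for the flat NP and then checks which word sequences end up as constituents.

```python
# Illustrative sketch only: hypothetical helpers, not the CCGbank conversion.

def right_branch(words):
    """Build a right-branching bracketing for a flat list of NP words.

    Assumption: each non-final word is an N/N modifier of everything
    to its right, and the final word is the head N.
    """
    if not words:
        raise ValueError("an NP needs at least one word")
    if len(words) == 1:
        return ("N", words[0])
    return ("N", ("N/N", words[0]), right_branch(words[1:]))


def is_leaf(node):
    """A leaf is a (category, word) pair."""
    return len(node) == 2 and isinstance(node[1], str)


def to_string(node):
    """Render a tree in the bracketed notation used in the text."""
    if is_leaf(node):
        return f"({node[0]} {node[1]})"
    label, *children = node
    return f"({label} " + " ".join(to_string(c) for c in children) + ")"


def constituents(tree):
    """Collect the word sequence spanned by every node in the tree."""
    spans = set()

    def walk(node):
        if is_leaf(node):
            words = (node[1],)
        else:
            words = tuple(w for child in node[1:] for w in walk(child))
        spans.add(words)
        return words

    walk(tree)
    return spans


tree = right_branch(["lung", "cancer", "deaths"])
print(to_string(tree))
# "cancer deaths" is a constituent, but "lung cancer" is not.
print(("cancer", "deaths") in constituents(tree))
print(("lung", "cancer") in constituents(tree))
```

Under these assumptions the sketch groups cancer with deaths and never groups lung with cancer, which is the kind of error the right-branching default introduces for this NP.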
