Scientific report: "Adding Noun Phrase Structure to the Penn Treebank"

David Vadas and James R. Curran
School of Information Technologies
University of Sydney
NSW 2006, Australia
dvadasl james @

Abstract

The Penn Treebank does not annotate within base noun phrases (NPs), committing only to flat structures that ignore the complexity of English NPs. This means that tools trained on Treebank data cannot learn the correct internal structure of NPs. This paper details the process of adding gold-standard bracketing within each noun phrase in the Penn Treebank. We then examine the consistency and reliability of our annotations. Finally, we use this resource to determine NP structure using several statistical approaches, thus demonstrating the utility of the corpus. This adds detail to the Penn Treebank that is necessary for many NLP applications.

1 Introduction

The Penn Treebank (Marcus et al., 1993) is perhaps the most influential resource in Natural Language Processing (NLP). It is used as a standard training and evaluation corpus in many syntactic analysis tasks, ranging from part-of-speech (POS) tagging and chunking to full parsing.

Unfortunately, the Penn Treebank does not annotate the internal structure of base noun phrases, instead leaving them flat. This significantly simplified and sped up the manual annotation process. Therefore, any system trained on Penn Treebank data will be unable to model the syntactic and semantic structure inside base-NPs.

The following NP is an example of the flat structure of base-NPs within the Penn Treebank:

(NP (NNP Air) (NNP Force) (NN contract))

Air Force is a specific entity and should form a separate constituent underneath the NP, as in our new annotation scheme:

(NP (NML (NNP Air) (NNP Force)) (NN contract))

We use NML to specify that Air Force together is a nominal modifier of contract. Adding this annotation better represents the true syntactic and semantic structure, which will improve the performance of downstream NLP systems.

Our main contribution is a .
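
The difference between the flat analysis and the NML analysis of the Air Force contract example can be made concrete with a small sketch. The Python code below is illustrative only and is not the authors' tooling: the helper names parse_tree, leaves, and multiword_constituents are hypothetical, and the two input strings write the example in standard Penn Treebank parenthesized notation. It lists every node that spans more than one word, showing that Air Force becomes a recoverable constituent only once the NML bracket is present.

```python
def parse_tree(text):
    """Parse a Penn Treebank style bracketed string into (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def leaves(tree):
    """Return the words dominated by a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [word for child in children for word in leaves(child)]


def multiword_constituents(tree):
    """List (label, text) for every node that spans more than one word."""
    if isinstance(tree, str):
        return []
    label, children = tree
    found = []
    words = leaves(tree)
    if len(words) > 1:
        found.append((label, " ".join(words)))
    for child in children:
        found.extend(multiword_constituents(child))
    return found


# Flat Penn Treebank analysis versus the analysis with an NML node.
flat = parse_tree("(NP (NNP Air) (NNP Force) (NN contract))")
bracketed = parse_tree("(NP (NML (NNP Air) (NNP Force)) (NN contract))")

print(multiword_constituents(flat))
# [('NP', 'Air Force contract')]
print(multiword_constituents(bracketed))
# [('NP', 'Air Force contract'), ('NML', 'Air Force')]
```

In the flat tree the only multi-word unit is the whole NP, so a system trained on it has no evidence that Air Force groups together before modifying contract. With the NML node, that grouping is explicit in the data.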
