TAILIEUCHUNG - Scientific report: "Creating a Corpus of Parse-Annotated Questions"

QuestionBank: Creating a Corpus of Parse-Annotated Questions

John Judge1, Aoife Cahill1 and Josef van Genabith1,2
1 National Centre for Language Technology and School of Computing, Dublin City University, Dublin, Ireland
2 IBM Dublin Center for Advanced Studies, IBM Dublin, Ireland
jjudge, acahill, josef @

Abstract

This paper describes the development of QuestionBank, a corpus of 4000 parse-annotated questions for (i) use in training parsers employed in QA, and (ii) evaluation of question parsing. We present a series of experiments to investigate the effectiveness of QuestionBank as both an exclusive and supplementary training resource for a state-of-the-art parser in parsing both question and non-question test sets. We introduce a new method for recovering empty nodes and their antecedents (capturing long distance dependencies) from parser output in CFG trees using LFG f-structure reentrancies.

Our main findings are: (i) using QuestionBank training data improves parser performance in labelled bracketing f-score, an increase of almost 11 over the baseline; (ii) back-testing experiments on non-question data (Penn-II WSJ Section 23) show that the retrained parser does not suffer a performance drop on non-question material; (iii) ablation experiments show that the size of training material provided by QuestionBank is sufficient to achieve optimal results; (iv) our method for recovering empty nodes captures long distance dependencies in questions from the ATIS corpus with high precision and low recall. In summary, QuestionBank provides a useful new resource in parser-based QA research.

1 Introduction

Parse-annotated corpora (treebanks) are crucial for developing machine learning and statistics-based parsing resources for a given language or task. Large treebanks are available for major languages; however, these are often based on a specific text type or genre, e.g. financial newspaper text (the Penn-II Treebank (Marcus et al., 1993)). This can limit the applicability of grammatical resources induced from treebanks, in that such resources underperform when used on a different type of text or for a specific task. In this paper we present work on creating QuestionBank, a treebank of parse-annotated questions, which can be used as a supplementary training resource to allow parsers to accurately parse questions (as well as other text).
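For readers unfamiliar with the labelled bracketing f-score mentioned in the abstract, the following is a minimal sketch of how such a score is commonly computed. It is not the authors' evaluation code. The function name labelled_f_score and the representation of constituents as (label, start, end) tuples are assumptions made for illustration, and standard evaluation tools apply further conventions (for example, around punctuation) that are not modelled here.

```python
from collections import Counter


def labelled_f_score(gold, predicted):
    """Return (precision, recall, f_score) over labelled constituent spans.

    gold, predicted: iterables of (label, start, end) tuples, one per
    constituent. This span representation is an illustrative assumption.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a bracket matches only if label and span agree.
    matched = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example (hypothetical spans for a four-token question).
gold = [("SBARQ", 0, 4), ("WHNP", 0, 1), ("SQ", 1, 3), ("NP", 2, 3)]
predicted = [("SBARQ", 0, 4), ("WHNP", 0, 1), ("SQ", 1, 4), ("NP", 2, 3)]
precision, recall, f_score = labelled_f_score(gold, predicted)
```

In the toy example, three of the four predicted brackets match the gold brackets exactly; the SQ bracket has the right label but the wrong span, so it counts against both precision and recall.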
