Scientific report: "XML-Based Data Preparation for Robust Deep Parsing"

Claire Grover and Alex Lascarides
Division of Informatics, The University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, UK

Abstract

We describe the use of XML tokenisation, tagging and mark-up tools to prepare a corpus for parsing. Our techniques are generally applicable but here we focus on parsing Medline abstracts with the ANLT wide-coverage grammar. Hand-crafted grammars inevitably lack coverage but many coverage failures are due to inadequacies of their lexicons. We describe a method of gaining a degree of robustness by interfacing POS tag information with the existing lexicon. We also show that XML tools provide a sophisticated approach to pre-processing, helping to ameliorate the 'messiness' in real language data and improve parse performance.

1 Introduction

The field of parsing technology currently has two distinct strands of research with few points of contact between them. On the one hand, there is thriving research on shallow parsing, chunking and induction of statistical syntactic analysers from treebanks; on the other hand, there are systems which use hand-crafted grammars which provide both syntactic and semantic coverage. Shallow approaches have good coverage on corpus data but extensions to semantic analysis are still in their relative infancy.
The deep strand of research has two main problems: inadequate coverage and a lack of reliable techniques to select the correct parse. In this paper we describe ongoing research which uses hybrid technologies to address the problem of inadequate coverage of a deep parsing system. In Section 2 we describe how we have modified an existing hand-crafted grammar's look-up procedure to utilise part-of-speech (POS) tag information, thereby ameliorating the lexical information shortfall. In Section 3 we describe how we combine a variety of existing NLP tools to pre-process real data up to the point where a hand-crafted grammar can start to be ...
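The general idea of backing off from the lexicon to POS tag information can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the ANLT lexicon or their XML tools: the toy lexicon, the tag-to-category mapping, the `<s>`/`<w>` element names and the `p` attribute are all hypothetical, chosen only to show the shape of the technique.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy lexicon: lower-cased word form -> grammar categories.
LEXICON = {
    "the": ["Det"],
    "binds": ["V"],
    "protein": ["N"],
}

# Hypothetical mapping from Penn-Treebank-style POS tags to those categories.
TAG_TO_CATEGORY = {"NN": "N", "NNS": "N", "VBZ": "V", "DT": "Det", "JJ": "Adj"}


def lookup(word, pos_tag):
    """Return the lexicon's categories, else fall back to the POS tag."""
    entries = LEXICON.get(word.lower())
    if entries:
        return entries
    # Word is missing from the lexicon: derive a category from its tag.
    category = TAG_TO_CATEGORY.get(pos_tag)
    return [category] if category else []


def categories_for_sentence(xml_sentence):
    """Read <w p="TAG">word</w> tokens and pair each word with its categories."""
    sentence = ET.fromstring(xml_sentence)
    return [(w.text, lookup(w.text, w.get("p"))) for w in sentence.iter("w")]


# Assumed mark-up: one <s> element holding POS-tagged <w> tokens.
SAMPLE = (
    '<s><w p="DT">The</w> <w p="NN">protein</w> '
    '<w p="VBZ">binds</w> <w p="NN">calmodulin</w></s>'
)

if __name__ == "__main__":
    for word, cats in categories_for_sentence(SAMPLE):
        print(word, cats)
```

In this sketch "calmodulin" is absent from the toy lexicon, so its category comes from the `NN` tag rather than from a lexical entry. That is the kind of robustness gain the paragraph above describes, under the stated assumptions.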
