Scientific report: "XML-Based Data Preparation for Robust Deep Parsing"

Claire Grover and Alex Lascarides
Division of Informatics, The University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, UK

Abstract

We describe the use of XML tokenisation, tagging and mark-up tools to prepare a corpus for parsing. Our techniques are generally applicable, but here we focus on parsing Medline abstracts with the ANLT wide-coverage grammar. Hand-crafted grammars inevitably lack coverage, but many coverage failures are due to inadequacies of their lexicons. We describe a method of gaining a degree of robustness by interfacing POS tag information with the existing lexicon. We also show that XML tools provide a sophisticated approach to pre-processing, helping to ameliorate the ‘messiness’ in real language data and improve parse performance.

1 Introduction

The field of parsing technology currently has two distinct strands of research with few points of contact between them. On the one hand, there is thriving research on shallow parsing, chunking and induction of statistical syntactic analysers from treebanks; on the other hand, there are systems which use hand-crafted grammars which provide both syntactic and semantic coverage. Shallow approaches have good coverage on corpus data, but extensions to semantic analysis are still in their relative infancy.
The deep strand of research has two main problems: inadequate coverage and a lack of reliable techniques to select the correct parse. In this paper we describe ongoing research which uses hybrid technologies to address the problem of inadequate coverage of a deep parsing system. In Section 2 we describe how we have modified an existing hand-crafted grammar's look-up procedure to utilise part-of-speech (POS) tag information, thereby ameliorating the lexical information shortfall. In Section 3 we describe how we combine a variety of existing NLP tools to pre-process real data up to the point where a hand-crafted grammar can start to be [...]
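The idea of letting POS tag information back up an incomplete lexicon can be illustrated with a minimal sketch. It is not the ANLT look-up procedure or the authors' tool chain: the element name `w`, the attribute `p`, the toy lexicon, and the tag-to-category mapping are all assumptions made up for this example. It only shows the general pattern of consulting the hand-built lexicon first and falling back on the POS tag carried in the XML mark-up when a word is missing.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-built lexicon: word form -> lexical categories.
LEXICON = {
    "the": ["DET"],
    "binds": ["V"],
    "protein": ["N"],
}

# Hypothetical mapping from POS tags (Penn-style tags assumed) to the
# coarse categories used by the toy lexicon above.
POS_TO_CATEGORY = {"DT": "DET", "NN": "N", "NNS": "N", "VBZ": "V", "JJ": "ADJ"}


def lookup(word, pos):
    """Return (categories, source) for one token.

    The lexicon is consulted first; the POS tag is used only when the
    word has no lexical entry.
    """
    entries = LEXICON.get(word.lower())
    if entries:
        return entries, "lexicon"
    category = POS_TO_CATEGORY.get(pos)
    if category:
        return [category], "pos-tag"
    return [], "unknown"


def analyse(sentence_xml):
    """Look up every <w> token of an XML-tokenised, POS-tagged sentence."""
    root = ET.fromstring(sentence_xml)
    return [(w.text, *lookup(w.text, w.get("p"))) for w in root.iter("w")]


# Assumed mark-up: one <w> element per token, POS tag in attribute "p".
SENTENCE = (
    '<s><w p="DT">The</w><w p="NN">kinase</w>'
    '<w p="VBZ">binds</w><w p="NN">protein</w></s>'
)

if __name__ == "__main__":
    for token, categories, source in analyse(SENTENCE):
        print(token, categories, source)
```

In this sketch "kinase" is absent from the toy lexicon, so its category comes from the `NN` tag, while the other three words are resolved from the lexicon directly.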
