TAILIEUCHUNG - Scientific report: "Creating a CCGbank and a wide-coverage CCG lexicon for German"

We present an algorithm which creates a German CCGbank by translating the syntax graphs in the German Tiger corpus into CCG derivation trees. The resulting corpus contains 46,628 derivations, covering 95% of all complete sentences in Tiger. Lexicons extracted from this corpus contain correct lexical entries for 94% of all known tokens in unseen text.

Julia Hockenmaier
Institute for Research in Cognitive Science, University of Pennsylvania, Philadelphia, PA 19104, USA
juliahr@

1 Introduction

A number of wide-coverage TAG, CCG, LFG and HPSG grammars (Xia, 1999; Chen et al., 2005; Hockenmaier and Steedman, 2002a; O'Donovan et al., 2005; Miyao et al., 2004) have been extracted from the Penn Treebank (Marcus et al., 1993), and have enabled the creation of wide-coverage parsers for English which recover local and non-local dependencies that approximate the underlying predicate-argument structure (Hockenmaier and Steedman, 2002b; Clark and Curran, 2004; Miyao and Tsujii, 2005; Shen and Joshi, 2005). However, many corpora (Böhmová et al., 2003; Skut et al., 1997; Brants et al., 2002) use dependency graphs or other representations, and the extraction algorithms that have been developed for Penn Treebank style corpora may not be immediately applicable to this representation. As a consequence, research on statistical parsing with deep grammars has largely been confined to English.
Free-word order languages typically pose greater challenges for syntactic theories (Rambow, 1994), and the richer inflectional morphology of these languages creates additional problems both for the coverage of lexicalized formalisms such as CCG or TAG, and for the usefulness of dependency counts extracted from the training data. On the other hand, formalisms such as CCG and TAG are particularly suited to capture the crossing dependencies that arise in languages such as Dutch or German, and by choosing an appropriate linguistic representation, some of these ...


