TAILIEUCHUNG - Scientific report: "The Columbia Arabic Treebank"

CATiB: The Columbia Arabic Treebank
Nizar Habash and Ryan M. Roth
Center for Computational Learning Systems, Columbia University, New York, USA
habash, ryanr @

Abstract

The Columbia Arabic Treebank (CATiB) is a database of syntactic analyses of Arabic sentences. CATiB contrasts with previous approaches to Arabic treebanking in its emphasis on speed with some constraints on linguistic richness. Two basic ideas inspire the CATiB approach: no annotation of redundant information, and the use of representations and terminology inspired by traditional Arabic syntax. We describe CATiB's representation and annotation procedure, and report on inter-annotator agreement and speed.

1 Introduction and Motivation

Treebanks are collections of manually annotated syntactic analyses of sentences. They are primarily intended for building models for statistical parsing; however, they are often enriched for general natural language processing purposes. For Arabic, two important treebanking efforts exist: the Penn Arabic Treebank (PATB) (Maamouri et al., 2004) and the Prague Arabic Dependency Treebank (PADT) (Smrž and Hajič, 2006). In addition to syntactic annotations, both resources are annotated with rich morphological and semantic information such as full part-of-speech (POS) tags, lemmas, semantic roles, and diacritizations.
This allows these treebanks to be used for training a variety of applications other than parsing, such as tokenization, diacritization, POS tagging, morphological disambiguation, base phrase chunking, and semantic role labeling.

In this paper, we describe a new Arabic treebanking effort, the Columbia Arabic Treebank (CATiB).¹ CATiB is motivated by the following three observations. First, as far as Arabic parsing research is concerned, much of the rich non-syntactic annotation goes unused. For example, PATB has over 400 tags, but they are typically reduced to around 36 tags when training and testing parsers (Kulick et ...

¹ This work was supported by Defense Advanced Research Projects Agency Contract No.
