TAILIEUCHUNG - Scientific report: "System for Querying Syntactically Annotated Corpora"

Petr Pajas, Charles Univ. in Prague, MFF UFAL, Malostranske nám. 25, 118 00 Prague 1, Czech Rep., pajas@
Jan Stepanek, Charles Univ. in Prague, MFF UFAL, Malostranske nám. 25, 118 00 Prague 1, Czech Rep., stepanek@

Abstract

This paper presents a system for querying treebanks. The system consists of a powerful query language with natural support for cross-layer queries, a client interface with a graphical query builder and visualizer of the results, a command-line client interface, and two substitutable query engines: a very efficient engine using a relational database (suitable for large static data), and a slower, but parallel-computing enabled, engine operating on treebank files (suitable for "live" data).

1 Introduction

Syntactically annotated treebanks are a rich source of linguistic information that is hardly available, or not available at all, in flat text corpora. Retrieving this information requires specialized tools. Some of the best-known tools for querying treebanks include TigerSEARCH (Lezius, 2002), TGrep2 (Rohde, 2001), MonaSearch (Maryns and Kepser, 2009), and NetGraph (Mirovsky, 2006). All these tools are very powerful when querying a single annotation layer with nodes labeled by flat feature records.
However, most of the existing systems are poorly equipped for structurally complex treebanks involving, for example, multiple interconnected annotation layers, multi-lingual parallel annotations with node-to-node alignments, or annotations where nodes are labeled by attributes with complex values such as lists or nested attribute-value structures. The Prague Dependency Treebank (Hajic and others, 2006), PDT for short, is a good example of a treebank with multiple annotation layers and richly structured attribute values. NetGraph has traditionally been used for querying PDT, but it still does not directly support cross-layer queries unless the layers are merged together at the cost of ...
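
To make the notion of a cross-layer query over nested attribute values more concrete, here is a minimal Python sketch. It is an illustration only: the `Node` class, the `a_refs` attribute, the sample sentence, and the `cross_layer_query` helper are all hypothetical and do not reproduce the PDT schema or the query language of the system described in this paper.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class Node:
    """A treebank node labeled by a (possibly nested) attribute-value structure."""
    node_id: str
    attrs: dict[str, Any]
    children: list["Node"] = field(default_factory=list)

    def descendants(self) -> Iterator["Node"]:
        """Yield this node and every node below it, depth-first."""
        yield self
        for child in self.children:
            yield from child.descendants()


# Hypothetical surface layer: "tag" is a nested attribute-value structure.
a_root = Node("a1", {"form": "sleeps", "tag": {"pos": "verb", "tense": "pres"}}, [
    Node("a2", {"form": "dog", "tag": {"pos": "noun", "number": "sg"}}, [
        Node("a3", {"form": "The", "tag": {"pos": "det"}}),
    ]),
])

# Hypothetical deep layer: "a_refs" is a list-valued attribute that links
# each deep node to nodes of the surface layer.
t_root = Node("t1", {"functor": "PRED", "a_refs": ["a1"]}, [
    Node("t2", {"functor": "ACT", "a_refs": ["a2", "a3"]}),
])


def cross_layer_query(deep_root: Node, surface_root: Node,
                      functor: str, pos: str) -> list[tuple[str, str]]:
    """Find (deep, surface) node pairs where the deep node has the given
    functor and a linked surface node has the given part of speech."""
    surface_index = {n.node_id: n for n in surface_root.descendants()}
    matches = []
    for deep in deep_root.descendants():
        if deep.attrs.get("functor") != functor:
            continue
        for ref in deep.attrs.get("a_refs", []):
            surface = surface_index.get(ref)
            if surface is not None and surface.attrs["tag"].get("pos") == pos:
                matches.append((deep.node_id, surface.node_id))
    return matches


print(cross_layer_query(t_root, a_root, "ACT", "noun"))  # [('t2', 'a2')]
```

The sketch keeps the two layers as separate trees and follows explicit links between them. Merging the layers into one tree, by contrast, would require flattening such links into the labels of a single node.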
