Đang chuẩn bị liên kết để tải về tài liệu:
Báo cáo khoa học: "The Treegram Index An Efficient Technique for Retrieval in Linguistic Treebanks"

Minh Thảo 83 2 pdf

Đang chuẩn bị nút TẢI XUỐNG, xin hãy chờ Tải xuống

With the availability of large treebanks, retrieval techniques for highly structured data now become essential. In this contribution, we investigate the efficient retrieval of MT structures at the cost of a complex index--the Treegram Index. We illustrate our approach with the VENONA retrieval system, which handles the BH t (Biblia Hebraica transeripta) treebank comprising 508,650 phrase structure trees with maximum degree eight and maximum height 17, containing altogether 3.3 million Old-Hebrew words. 1 Multiway-tree retrieval based on treegrams To cope with this tree-retrieval problem, we generalize the well-known n-gram indexing technique for text databases: In place of substrings. | Proceedings of EACL 99 The Treegram Index An Efficient Technique for Retrieval in Linguistic Treebanks Hans Argenton and Anke Feldhaus Infineon Technologies DAT CIF Postbox 801709 D-81617 Miinchen hans. ar genton@infineon. com University of Tubingen SfS Kleine Wilhelmstr.113 D-72074 Tubingen feldhaus@sfs.nphil.uni-tuebingen.de Multiway trees MT henceforth are a common and well-understood data structure for describing hierarchical linguistic information. With the availability of large treebanks retrieval techniques for highly structured data now become essential. In this contribution we investigate the efficient retrieval of MT structures at the cost of a complex index the Treegram Index. We illustrate our approach with the Venona retrieval system which handles the BH Biblia Hebraica transcripta treebank comprising 508 650 phrase structure trees with maximum degree eight and maximum height 17 containing altogether 3.3 million Old-Hebrew words. 1 Multiway-tree retrieval based on treegrams The base entities of the tree-retrieval problem for positional MTs are labeled rooted MTs where children are distinguished by their position. Let s and t be two MTs t contains s written as 5 i if there exists an injective embedding such that 1 nodes are mapped to nodes with identical labels and 2 a root of a child with position i is mapped to a root of a child with the same position. Retrieval problem Let DB be a set of labeled positional MTs and let q be a query tree having the same label alphabet. The problem is to find efficiently all trees t G DB that contain q. To cope with this tree-retrieval problem we generalize the well-known n-gram indexing technique for text databases In place of substrings with fixed length we use subtrees with fixed maximal height treegrams. Let TG t h denote the set of all treegrams of height h contained in the MT t and let T DB denote the set of all database trees that contain the treegram g. Assume that g has the height h and that T DB can be .

TÀI LIỆU LIÊN QUAN

Kỷ yếu tóm tắt báo cáo khoa học: Hội nghị khoa học tim mạch toàn quốc lần thứ XI - Hội tim mạch Quốc gia Việt Nam

Báo cáo nghiên cứu khoa học: "Danh lục các loài thú ở khu bảo tồn thiên nhiên Pù Huống tỉnh Nghệ An và ý nghĩa bảo tồn nguồn gen quí hiếm của chúng"

Báo cáo khoa học: Hỗ trợ nâng cao năng lực quản lý chất thải sinh hoạt tại thành phố Hội An

Báo cáo nghiên cứu khoa học: "Tính năng động nghệ thuật của văn học hiện đại Việt Nam và một cách nhìn hành trình thể loại"

Báo cáo nghiên cứu khoa học: " DỊCH CHUYỂN TRUY VẤN OQL VÀO CÁC PHÉP TÍNH BAO HÀM"

Báo cáo khoa học: " Áp dụng thủ tục phân tích trong kiểm toán báo cáo tài chính"

Báo cáo nghiên cứu khoa học: "Người lính trở về sau chiến tranh với mặc cảm “ăn mày dĩ vãng’ trong tiểu thuyết Chu Lai"

Báo cáo nghiên cứu khoa học: "Khảo sát hiện tượng chuyển đổi chức năng - nghĩa của động từ tiếng Việt"

Báo cáo nghiên cứu khoa học: " BẢN CHẤT KHOA HỌC VÀ CÁCH MẠNG LÀ CỘI NGUỒN SỨC SỐNG CỦA CHỦ NGHĨA MÁC - LÊNIN"

Báo cáo khoa học: " CẢI TIẾN CÁC THUẬT TOÁN MƯỢN VÀ KHOÁ KÊNH TẦN SỐ MẠNG DI ĐỘNG TẾ BÀO"

Đã phát hiện trình chặn quảng cáo AdBlock

Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.