Scientific report: "Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars"

Zhenghua Li, Ting Liu, Wanxiang Che
Research Center for Social Computing and Information Retrieval
School of Computer Science and Technology
Harbin Institute of Technology, China
{lzh, tliu, car}@[...]

Abstract

We present a simple and effective framework for exploiting multiple monolingual treebanks with different annotation guidelines for parsing. Several types of transformation patterns (TP) are designed to capture the systematic annotation inconsistencies among different treebanks. Based on such TPs, we design quasi-synchronous grammar features to augment the baseline parsing models. Our approach can significantly advance the state-of-the-art parsing accuracy on two widely used target treebanks (Penn Chinese Treebank [...] and [...]), using the Chinese Dependency Treebank as the source treebank. The improvements are respectively [...] and [...] with automatic part-of-speech tags. Moreover, an indirect comparison indicates that our approach also outperforms previous work based on treebank conversion.

1 Introduction

The scale of available labeled data significantly affects the performance of statistical data-driven models. As a structural classification problem that is more challenging than binary classification and sequence labeling problems, syntactic parsing is more prone to suffer from the data sparseness problem. However, the heavy cost of treebanking typically limits one single treebank in both scale and genre.
At present, learning from one single treebank seems inadequate for further boosting parsing accuracy. Incorporating an increased number of global features, such as third-order features in graph-based parsers, slightly affects parsing accuracy (Koo and Collins, 2010; Li et al., 2011).

Correspondence author: tliu@[...]

Treebank   Words           Grammar
CTB5       [...] million   Phrase structure
CTB6       [...] million   Phrase structure
CDT        [...] million   Dependency structure
Sinica     [...] million   Phrase structure
TCT
