Scientific report: "Semi-supervised Dependency Parsing using Lexical Affinities"

Seyed Abolghasem Mirroshandel, Alexis Nasr, Joseph Le Roux

Laboratoire d'Informatique Fondamentale de Marseille, CNRS UMR 7279, Université Aix-Marseille, Marseille, France
LIPN, Université Paris Nord, CNRS, Villetaneuse, France
Computer Engineering Department, Sharif University of Technology, Tehran, Iran
leroux@

Abstract

Treebanks are not large enough to reliably model precise lexical phenomena. This deficiency causes attachment errors in parsers trained on such data. In this paper, we propose to compute lexical affinities on large corpora for specific lexico-syntactic configurations that are hard to disambiguate, and to introduce this new information into a parser. Experiments on the French Treebank showed a relative decrease in the error rate of the Labeled Accuracy Score, yielding the best parsing results on this treebank.

1 Introduction

Probabilistic parsers are usually trained on treebanks composed of a few thousand sentences. While this amount of data seems reasonable for learning syntactic phenomena and, to some extent, very frequent lexical phenomena involving closed parts of speech (POS), it proves inadequate for modeling lexical dependencies between open POS such as nouns, verbs and adjectives. This fact was first recognized by Bikel (2004), who showed that bilexical dependencies were barely used in Michael Collins' parser.
The work reported in this paper aims at a better modeling of such phenomena by using a raw corpus that is several orders of magnitude larger than the treebank used for training the parser. The raw corpus is first parsed, and the computed lexical affinities between lemmas in specific lexico-syntactic configurations are then injected back into the parser. Two outcomes are expected from this procedure: the first is, as mentioned above, a better modeling of bilexical dependencies, and the second is a method to adapt a .
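The counting step described above can be illustrated with a minimal sketch. It assumes the automatically parsed corpus has already been reduced to (governor lemma, dependency label, dependent lemma) triples, and it scores each pair with pointwise mutual information. Both the triple format and the choice of PMI are assumptions made for illustration; this excerpt does not give the authors' actual affinity formula. The function name `lexical_affinities` and the label `OBJ` are hypothetical.

```python
import math
from collections import Counter


def lexical_affinities(triples, configuration):
    """Score governor/dependent lemma pairs within one configuration.

    `triples` is an iterable of (governor_lemma, label, dependent_lemma)
    taken from an automatically parsed corpus (assumed input format).
    The score is pointwise mutual information, used here as a stand-in
    for the affinity measure, which this excerpt does not specify.
    """
    pair_counts = Counter()
    governor_counts = Counter()
    dependent_counts = Counter()

    for governor, label, dependent in triples:
        if label != configuration:
            continue  # keep only the configuration of interest
        pair_counts[(governor, dependent)] += 1
        governor_counts[governor] += 1
        dependent_counts[dependent] += 1

    total = sum(pair_counts.values())
    scores = {}
    for (governor, dependent), count in pair_counts.items():
        expected = governor_counts[governor] * dependent_counts[dependent]
        scores[(governor, dependent)] = math.log(count * total / expected)
    return scores
```

In a setup like the one the paper outlines, scores of this kind would be computed once on the large parsed corpus and then made available to the parser when it chooses between competing attachments. How the scores are injected is not covered by this sketch.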
