Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models

Elias Ponvert, Jason Baldridge and Katrin Erk
Department of Linguistics
The University of Texas at Austin
Austin, TX 78712
ponvert, jbaldrid @

Abstract
We consider a new subproblem of unsupervised parsing from raw text: unsupervised partial parsing, which is the unsupervised version of text chunking. We show that addressing this task directly, using probabilistic finite-state methods, produces better results than relying on the local predictions of a current state-of-the-art unsupervised parser, Seginer's (2007) CCL. These finite-state models are combined in a cascade to produce more general (full-sentence) constituent structures. Doing so outperforms CCL by a wide margin in unlabeled PARSEVAL scores for English, German and Chinese. Finally, we address the use of phrasal punctuation as a heuristic indicator of phrasal boundaries, both in our system and in CCL.

1 Introduction
Unsupervised grammar induction has been an active area of research in computational linguistics for over twenty years (Lari and Young, 1990; Pereira and Schabes, 1992; Charniak, 1993). Recent work (Headden III et al., 2009; Cohen and Smith, 2009; Hänig, 2010; Spitkovsky et al., 2010) has largely built on the dependency model with valence of Klein and Manning (2004). This work is characterized by its reliance on gold-standard part-of-speech (POS) annotations: the models are trained on and evaluated using sequences of POS tags rather than raw tokens. This is also true for models that are not successors of Klein and Manning (Bod, 2006; Hänig, 2010). An exception, which learns from raw text and makes no use of POS tags, is the common cover links parser, CCL (Seginer, 2007). CCL established state-of-the-art results for unsupervised constituency parsing from raw text, and it is also incremental and extremely fast for both learning and parsing. Unfortunately, CCL is a non-probabilistic algorithm based on a complex set of inter-relating heuristics and a …
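
The abstract reports results as unlabeled PARSEVAL scores. For illustration only, the following Python sketch shows how unlabeled bracket precision, recall and F1 are commonly computed. Each tree is reduced to its set of constituent spans, labels are ignored, and matches are counted over the whole corpus. Several choices here are assumptions rather than details from the paper: the nested-list tree encoding, the hypothetical helper names `spans` and `unlabeled_parseval`, and the treatment of trivial spans. Evaluation setups differ on that last point. This sketch drops single-token spans and keeps the whole-sentence span. It is not the evaluation code used by the authors.

```python
from typing import Iterable, List, Set, Tuple, Union

# Hypothetical encoding (an assumption): a tree is a token (str) or a list of subtrees.
Tree = Union[str, List["Tree"]]
Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def spans(tree: Tree) -> Set[Span]:
    """Collect the unlabeled spans of all constituents covering 2+ tokens.

    Assumed convention: single-token spans are skipped, the root span is kept.
    """
    result: Set[Span] = set()

    def walk(node: Tree, start: int) -> int:
        # Returns the offset just past the last token covered by `node`.
        if isinstance(node, str):
            return start + 1
        end = start
        for child in node:
            end = walk(child, end)
        if end - start > 1:
            result.add((start, end))
        return end

    walk(tree, 0)
    return result


def unlabeled_parseval(pairs: Iterable[Tuple[Tree, Tree]]) -> Tuple[float, float, float]:
    """Corpus-level unlabeled bracket precision, recall and F1.

    `pairs` yields (predicted_tree, gold_tree) for each sentence.
    """
    matched = n_pred = n_gold = 0
    for predicted, gold in pairs:
        pred, ref = spans(predicted), spans(gold)
        matched += len(pred & ref)  # spans present in both trees
        n_pred += len(pred)
        n_gold += len(ref)
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

As an example, take the gold tree [[the cat] [sat [on [the mat]]]] and the flatter prediction [[the cat] sat [on the mat]]. All three predicted spans also occur among the gold tree's five spans. This gives precision 1.0, recall 0.6 and F1 0.75. Summing counts across sentences before dividing, rather than averaging per-sentence scores, follows the usual micro-averaged PARSEVAL practice.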
