TAILIEUCHUNG - Scientific report: "Is the End of Supervised Parsing in Sight?"

Rens Bod
School of Computer Science, University of St Andrews
ILLC, University of Amsterdam
rb@

Abstract

How far can we get with unsupervised parsing if we make our training corpus several orders of magnitude larger than has hitherto been attempted? We present a new algorithm for unsupervised parsing using an all-subtrees model, termed U-DOP*, which parses directly with packed forests of all binary trees. We train both on Penn's WSJ data and on the (much larger) NANC corpus, showing that U-DOP* outperforms a treebank-PCFG on the standard WSJ test set. While U-DOP* performs worse than state-of-the-art supervised parsers on hand-annotated sentences, we show that the model outperforms supervised parsers when evaluated as a language model in syntax-based machine translation on Europarl. We argue that supervised parsers miss the fluidity between constituents and non-constituents and that, in the field of syntax-based language modeling, the end of supervised parsing has come in sight.

1 Introduction

A major challenge in natural language parsing is the unsupervised induction of syntactic structure. While most parsing methods are currently supervised or semi-supervised (McClosky et al. 2006; Henderson 2004; Steedman et al. 2003), they depend on hand-annotated data which are difficult to come by and which exist only for a few languages. Unsupervised parsing methods are becoming increasingly important since they operate with raw unlabeled data of which unlimited quantities are available.

There has been a resurgence of interest in unsupervised parsing during the last few years. Where van Zaanen (2000) and Clark (2001) induced unlabeled phrase structure for small domains like the ATIS, obtaining around 40% unlabeled f-score, Klein and Manning (2002) report f-score on Penn WSJ part-of-speech strings of up to 10 words (WSJ10) using a constituent-context model called CCM. Klein and Manning (2004) further show that a hybrid approach which combines constituency .
