Scientific report: "An All-Subtrees Approach to Unsupervised Parsing"

Rens Bod
School of Computer Science, University of St Andrews
North Haugh, St Andrews KY16 9SX, Scotland, UK
rb@

Abstract

We investigate generalizations of the all-subtrees "DOP" approach to unsupervised parsing. Unsupervised DOP models assign all possible binary trees to a set of sentences and next use (a large random subset of) all subtrees from these binary trees to compute the most probable parse trees. We will test both a relative frequency estimator for unsupervised DOP and a maximum likelihood estimator which is known to be statistically consistent. We report state-of-the-art results on English (WSJ), German (NEGRA) and Chinese (CTB) data. To the best of our knowledge this is the first paper which tests a maximum likelihood estimator for DOP on the Wall Street Journal, leading to the surprising result that an unsupervised parsing model beats a widely used supervised model (a treebank PCFG).

1 Introduction

The problem of bootstrapping syntactic structure from unlabeled data has regained considerable interest. While supervised parsers suffer from shortage of hand-annotated data, unsupervised parsers operate with unlabeled raw data, of which unlimited quantities are available. During the last few years there has been steady progress in the field.
Where van Zaanen (2000) achieved unlabeled f-score on ATIS word strings, Clark (2001) reports on the same data, and Klein and Manning (2002) obtain f-score on ATIS part-of-speech strings using a constituent-context model called CCM. On Penn Wall Street Journal p-o-s-strings ≤ 10 (WSJ10), Klein and Manning (2002) report unlabeled f-score with CCM. And the hybrid approach of Klein and Manning (2004), which combines constituency and dependency models, yields f-score. Bod (2006) shows that a further improvement on the WSJ10 can be achieved by an unsupervised generalization of the all-subtrees approach known as Data-Oriented Parsing (DOP). This unsupervised DOP model coined .
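
The all-subtrees idea summarized in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it enumerates every binary tree for each (very short) sentence, collects all subtrees of those trees, and scores them by relative frequency. The function names, the nested-tuple tree encoding and the single nonterminal label "X" are assumptions made for this illustration; the random sampling of subtrees, the maximum likelihood estimator and the search for the most probable parse are left out.

```python
from collections import Counter
from itertools import product


def binary_trees(words):
    """Yield every binary bracketing of `words` as nested tuples ("X", left, right)."""
    if len(words) == 1:
        yield words[0]
        return
    for split in range(1, len(words)):
        for left in binary_trees(words[:split]):
            for right in binary_trees(words[split:]):
                yield ("X", left, right)


def internal_nodes(tree):
    """Yield every non-leaf node of a tree (leaves are plain strings)."""
    if isinstance(tree, str):
        return
    yield tree
    yield from internal_nodes(tree[1])
    yield from internal_nodes(tree[2])


def fragments(node):
    """Yield all subtrees rooted at `node`.

    Each internal child is either cut off (kept as a bare "X" frontier node)
    or replaced by one of its own rooted subtrees; leaf children are kept.
    """
    options = []
    for child in node[1:]:
        if isinstance(child, str):
            options.append([child])
        else:
            options.append(["X"] + list(fragments(child)))
    for left, right in product(*options):
        yield ("X", left, right)


def subtree_counts(sentences):
    """Count all subtrees over all binary trees of all sentences."""
    counts = Counter()
    for words in sentences:
        for tree in binary_trees(words):
            for node in internal_nodes(tree):
                counts.update(fragments(node))
    return counts


def relative_frequencies(counts):
    """Relative frequency of each subtree; every root carries the same label "X",
    so the normalizer is simply the total subtree count."""
    total = sum(counts.values())
    return {frag: c / total for frag, c in counts.items()}


# Toy part-of-speech strings, invented for the example.
toy_sentences = [["DT", "NN", "VBZ"], ["NN", "VBZ"]]
toy_probs = relative_frequencies(subtree_counts(toy_sentences))
for frag, p in sorted(toy_probs.items(), key=lambda kv: -kv[1]):
    print(round(p, 3), frag)
```

The number of binary trees grows with the Catalan numbers, so exhaustive enumeration like this is only feasible for toy inputs; this is consistent with the abstract's mention of working from a large random subset of subtrees.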
