Scientific report: "What is the Minimal Set of Fragments that Achieves Maximal Parse Accuracy?"

Rens Bod
School of Computing, University of Leeds, Leeds LS2 9JT
Institute for Logic, Language and Computation, University of Amsterdam, Spuistraat 134, 1012 VB Amsterdam
rens@

Abstract

We aim at finding the minimal set of fragments which achieves maximal parse accuracy in Data Oriented Parsing. Experiments with the Penn Wall Street Journal treebank show that counts of almost arbitrary fragments within parse trees are important, leading to improved parse accuracy over previous models tested on this treebank (a precision of [...] and a recall of [...]). We isolate some dependency relations which previous models neglect but which contribute to higher parse accuracy.

1 Introduction

One of the goals in statistical natural language parsing is to find the minimal set of statistical dependencies (between words and syntactic structures) that achieves maximal parse accuracy. Many stochastic parsing models use linguistic intuitions to find this minimal set, for example by restricting the statistical dependencies to the locality of headwords of constituents (Collins 1997, 1999; Eisner 1997), leaving it as an open question whether there exist important statistical dependencies that go beyond linguistically motivated dependencies.

The Data Oriented Parsing (DOP) model, on the other hand, takes a rather extreme view on this issue: given an annotated corpus, all fragments (i.e. subtrees) seen in that corpus, regardless of size and lexicalization, are in principle taken to form a grammar (see Bod 1993, 1998; Goodman 1998; Sima'an 1999). The set of subtrees that is used is thus very large and extremely redundant. Both from a theoretical and from a computational perspective, we may wonder whether it is possible to impose constraints on the subtrees that are used, in such a way that the accuracy of the model does not deteriorate or perhaps even improves. That is the main question addressed in this paper. We report on .
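To give a feel for how quickly the fragment set grows, the following is a minimal Python sketch, not taken from the paper, that enumerates every fragment of one toy parse tree. The tuple encoding of trees, the function names `fragments_rooted_at` and `all_fragments`, and the example sentence are all assumptions made for illustration. A fragment is taken here to be a connected piece of the tree in which each nonterminal child is either cut off (left as a bare frontier label) or expanded with all of its own children.

```python
from itertools import product

# Assumed encoding: a tree is (label, child, ...); a child is either a
# word (str) or another tree. A 1-tuple such as ("NP",) is a frontier
# nonterminal, i.e. a node whose children have been cut off.
TREE = ("S", ("NP", "John"), ("VP", ("V", "likes"), ("NP", "Mary")))

def fragments_rooted_at(node):
    """Return all fragments whose root is `node`."""
    label, *children = node
    options = []
    for child in children:
        if isinstance(child, str):
            # Words are always kept as they are.
            options.append([child])
        else:
            # Either cut the child (bare label) or expand it with
            # any one of the fragments rooted at that child.
            options.append([(child[0],)] + fragments_rooted_at(child))
    # Combine one choice per child in every possible way.
    return [(label, *combo) for combo in product(*options)]

def all_fragments(tree):
    """Collect the fragments rooted at every nonterminal node of `tree`."""
    _, *children = tree
    result = fragments_rooted_at(tree)
    for child in children:
        if not isinstance(child, str):
            result.extend(all_fragments(child))
    return result

frags = all_fragments(TREE)
print(len(frags))  # 17 fragments for this three-word tree
```

Even this three-word tree yields 17 fragments, because the count at each node is the product, over its children, of one plus the child's own count. That multiplicative growth is why the full fragment set of a treebank is so large and redundant, and why restricting it is worth investigating.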
