Bayesian Learning of a Tree Substitution Grammar

Matt Post and Daniel Gildea
Department of Computer Science, University of Rochester, Rochester, NY 14627

Abstract

Tree substitution grammars (TSGs) offer many advantages over context-free grammars (CFGs), but are hard to learn. Past approaches have resorted to heuristics. In this paper, we learn a TSG using Gibbs sampling with a nonparametric prior to control subtree size. The learned grammars perform significantly better than heuristically extracted ones on parsing accuracy.

1 Introduction

Tree substitution grammars (TSGs) have potential advantages over regular context-free grammars (CFGs), but there is no obvious way to learn these grammars. In particular, learning procedures are not able to take direct advantage of manually annotated corpora like the Penn Treebank, which are not marked for derivations and thus assume a standard CFG. Since different TSG derivations can produce the same parse tree, learning procedures must guess the derivations, the number of which is exponential in the tree size. This compels heuristic methods of subtree extraction, or maximum likelihood estimators, which tend to extract large subtrees that overfit the training data. These problems are common in natural language processing tasks that search for a hidden segmentation. Recently, many groups have had success using Gibbs sampling to address the complexity issue and nonparametric priors to address the overfitting problem (DeNero et al., 2008; Goldwater et al., 2009).

In this paper, we apply these techniques to learn a tree substitution grammar, evaluate it on the Wall Street Journal parsing task, and compare it to previous work.

2 Model

Tree substitution grammars (TSGs) extend CFGs (and their probabilistic counterparts, which concern us here) by allowing nonterminals to be rewritten as subtrees of arbitrary size. Although nonterminal rewrites are still context-free, in practice TSGs can loosen the independence assumptions of CFGs because larger rules capture more .
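The substitution operation described above can be sketched in a few lines of code. The toy grammar, rule names, and tuple encoding below are illustrative assumptions, not taken from the paper; the sketch only shows how a multi-level elementary tree (here, an entire S-rooted subtree containing a lexicalized VP) is substituted at frontier nonterminals, in contrast to the one-level rewrites of a CFG.

```python
# A minimal sketch of TSG derivation by subtree substitution.
# The grammar below is a toy example (an assumption for illustration);
# elementary trees are nested tuples: (label, child, child, ...).
# A bare string leaf that appears as a key in GRAMMAR is a frontier
# nonterminal awaiting substitution; other strings are terminals.

GRAMMAR = {
    # Nonterminal -> list of elementary trees rooted at it.
    # Note the S rule is a multi-level subtree, not a one-level CFG rule.
    "S":  [("S", "NP", ("VP", ("V", "saw"), "NP"))],
    "NP": [("NP", ("D", "the"), ("N", "dog")),
           ("NP", ("D", "the"), ("N", "cat"))],
}

def derive(sym, choose=lambda opts: opts[0]):
    """Substitute an elementary tree at the frontier nonterminal `sym`,
    then recursively expand any frontier nonterminals it introduces."""
    label, *kids = choose(GRAMMAR[sym])

    def expand(node):
        if isinstance(node, str):
            # A leaf naming a grammar symbol is a substitution site.
            return derive(node, choose) if node in GRAMMAR else node
        lbl, *children = node
        return (lbl, *[expand(c) for c in children])

    return (label, *[expand(k) for k in kids])

print(derive("S"))
```

With the deterministic `choose` above, the derivation substitutes the first NP tree at both frontier sites, yielding one complete parse tree; in the probabilistic setting, `choose` would instead sample an elementary tree according to the grammar's rule probabilities.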
