INSIDE-OUTSIDE REESTIMATION FROM PARTIALLY BRACKETED CORPORA

Fernando Pereira
2D-447, AT&T Bell Laboratories
PO Box 636, 600 Mountain Ave
Murray Hill, NJ 07974-0636

Yves Schabes
Dept. of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104-6389

ABSTRACT

The inside-outside algorithm for inferring the parameters of a stochastic context-free grammar is extended to take advantage of constituent information (constituent bracketing) in a partially parsed corpus. Experiments on formal and natural language parsed corpora show that the new algorithm can achieve faster convergence and better modeling of hierarchical structure than the original one. In particular, over 90% test set bracketing accuracy was achieved for grammars inferred by our algorithm from a training set of hand-parsed part-of-speech strings for sentences in the Air Travel Information System spoken language corpus. Finally, the new algorithm has better time complexity than the original one when sufficient bracketing is provided.

1. MOTIVATION

The most successful stochastic language models have been based on finite-state descriptions such as n-grams or hidden Markov models (HMMs) (Jelinek et al., 1992).
However, finite-state models cannot represent the hierarchical structure of natural language and are thus ill-suited to tasks in which that structure is essential, such as language understanding or translation. It is then natural to consider stochastic versions of more powerful grammar formalisms and their grammatical inference problems. For instance, Baker (1979) generalized the parameter estimation methods for HMMs to stochastic context-free grammars (SCFGs) (Booth, 1969) as the inside-outside algorithm. Unfortunately, the application of SCFGs and the original inside-outside algorithm to natural-language modeling has so far been inconclusive (Lari and Young, 1990; Jelinek et al., 1990; Lari and Young, 1991). Several reasons can be adduced for the ...
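The idea in the abstract, letting constituent bracketing from a partially parsed corpus constrain inside-outside reestimation, can be illustrated with a minimal Python sketch of the inside pass alone. It assumes a grammar in Chomsky normal form and assumes that a span is admissible only if it crosses (partially overlaps without nesting in) none of the sentence's brackets; with no brackets it reduces to the ordinary inside computation. The names `inside`, `crosses`, `compatible` and `bracketing_accuracy`, the toy grammar, and the crossing-based accuracy score are hypothetical illustrations, not the authors' code or necessarily their exact definitions; the outside pass and the reestimation step are omitted.

```python
from typing import Dict, Optional, Sequence, Tuple

Span = Tuple[int, int]  # half-open span [i, j) over word positions
Chart = Dict[Span, Dict[str, float]]


def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap but neither contains the other."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def compatible(span: Span, brackets: Sequence[Span]) -> bool:
    """Assumption: a span is admissible if it crosses none of the brackets."""
    return not any(crosses(span, b) for b in brackets)


def inside(words: Sequence[str],
           binary: Dict[Tuple[str, str, str], float],
           lexical: Dict[Tuple[str, str], float],
           brackets: Optional[Sequence[Span]] = None) -> Chart:
    """Inside probabilities for a CNF SCFG, optionally bracket-constrained.

    binary[(A, B, C)] = P(A -> B C); lexical[(A, w)] = P(A -> w).
    chart[(i, j)][A] = probability that A derives words[i:j] using only
    constituents that cross no bracket. brackets=None gives the plain pass.
    """
    n = len(words)
    brackets = list(brackets or [])
    chart: Chart = {}
    # Single-word spans can never cross a bracket, so fill them unconditionally.
    for i, w in enumerate(words):
        cell: Dict[str, float] = {}
        for (a, word), p in lexical.items():
            if word == w:
                cell[a] = cell.get(a, 0.0) + p
        chart[(i, i + 1)] = cell
    # Longer spans, shortest first, so every sub-span is already in the chart.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = {}
            if compatible((i, j), brackets):  # crossing spans keep probability 0
                for k in range(i + 1, j):
                    left, right = chart[(i, k)], chart[(k, j)]
                    for (a, b, c), p in binary.items():
                        if b in left and c in right:
                            cell[a] = cell.get(a, 0.0) + p * left[b] * right[c]
            chart[(i, j)] = cell
    return chart


def bracketing_accuracy(predicted: Sequence[Span], gold: Sequence[Span]) -> float:
    """Hypothetical score: fraction of predicted constituents crossing no gold bracket."""
    if not predicted:
        return 1.0
    return sum(compatible(s, gold) for s in predicted) / len(predicted)


if __name__ == "__main__":
    # Toy grammar, invented for illustration only.
    binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
    lexical = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
    words = ["she", "eats", "fish"]
    print(inside(words, binary, lexical)[(0, 3)].get("S", 0.0))          # 0.25
    print(inside(words, binary, lexical, [(0, 2)])[(0, 3)].get("S", 0.0))  # 0.0
    print(bracketing_accuracy([(0, 3), (1, 3)], [(0, 2)]))               # 0.5
```

In the usage example, the bracket (0, 2) crosses the verb phrase span (1, 3), so the only derivation is ruled out and the sentence probability drops to zero. Because crossing spans are skipped before the split-point loop, a heavily bracketed sentence leaves fewer chart cells to fill, which is one way to see why sufficient bracketing could reduce the cost of each pass.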
