Large-Scale Syntactic Language Modeling with Treelets

Adam Pauls and Dan Klein
Computer Science Division, University of California, Berkeley
Berkeley, CA 94720, USA

Abstract

We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours. We evaluate on perplexity and a range of grammaticality tasks, and find that we perform as well as or better than n-gram models and other generative baselines. Our model even competes with state-of-the-art discriminative models hand-designed for the grammaticality tasks, despite training on positive data alone. We also show fluency improvements in a preliminary machine translation experiment.

1 Introduction

N-gram language models are a central component of all speech recognition and machine translation systems, and a great deal of research centers around refining models (Chen and Goodman, 1998), efficient storage (Pauls and Klein, 2011; Heafield, 2011), and integration into decoders (Koehn, 2004; Chiang, 2005). At the same time, because n-gram language models condition only on a local window of linear, word-level context, they are poor models of long-range syntactic dependencies. Although several lines of work have proposed generative syntactic language models that improve on n-gram models for moderate amounts of data (Chelba, 1997; Xu et al., 2002; Charniak, 2001; Hall, 2004; Roark, 2004), these models have only recently been scaled to the impressive amounts of data routinely used by n-gram language models (Tan et al., 2011). In this paper, we describe a generative syntactic language model that conditions on local context treelets in …
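The estimation recipe in the abstract — collect counts of tree events from automatically parsed text, then estimate conditional probabilities the way an n-gram model would — can be sketched concretely. The following minimal Python sketch is an illustration under stated assumptions, not the authors' implementation: the Node class, the ancestor-rule window size `order`, and the interpolation weight `alpha` are all hypothetical choices, and simple interpolated relative frequencies stand in for the standard n-gram smoothing techniques the paper reuses.

```python
from collections import defaultdict

class Node:
    """A parse-tree node: a label plus children (no children = a word)."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def rule(node):
    """The context-free expansion at a node, e.g. ('NP', ('DT', 'NN'))."""
    return (node.label, tuple(c.label for c in node.children))

def events(node, context=(), order=2):
    """Top-down traversal yielding (context, rule) pairs, where the context
    is a window of ancestor rules -- the treelet above this node -- playing
    the role of the (n-1)-word history in an n-gram model.  The window size
    `order` is an illustrative choice, not the paper's."""
    if not node.children:
        return
    yield (context, rule(node))
    child_context = (context + (rule(node),))[-order:]
    for child in node.children:
        yield from events(child, child_context)

class TreeletLM:
    """Relative-frequency estimates interpolated with the empty context --
    a crude stand-in for standard n-gram smoothing."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def train(self, trees):
        for tree in trees:
            for context, r in events(tree):
                self.counts[context][r] += 1
                self.totals[context] += 1
                if context:  # also accumulate back-off counts at ()
                    self.counts[()][r] += 1
                    self.totals[()] += 1

    def prob(self, context, r, alpha=0.8):
        """P(rule | treelet context), interpolated with the empty context."""
        def rel_freq(ctx):
            total = self.totals[ctx]
            return self.counts[ctx][r] / total if total else 0.0
        return alpha * rel_freq(context) + (1 - alpha) * rel_freq(())
```

A toy usage, on a single hand-built parse of "the dog barks":

```python
tree = Node('S', [Node('NP', [Node('DT', [Node('the')]),
                              Node('NN', [Node('dog')])]),
                  Node('VP', [Node('VBZ', [Node('barks')])])])
lm = TreeletLM()
lm.train([tree])
ctx = (('S', ('NP', 'VP')),)
# probability of expanding NP as DT NN, given the treelet above it
print(lm.prob(ctx, ('NP', ('DT', 'NN'))))
```

Because the counts are just (context, event) pairs, they can be collected and smoothed with the same machinery used for large-scale n-gram models, which is what makes training on a billion tokens of parsed text feasible on one machine.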
