Scientific report: "Using Rejuvenation to Improve Particle Filtering for Bayesian Word Segmentation"

Benjamin Börschinger, Mark Johnson
Department of Computing, Macquarie University, Sydney, Australia
Department of Computational Linguistics, Heidelberg University, Heidelberg, Germany

Abstract

We present a novel extension to a recently proposed incremental learning algorithm for the word segmentation problem originally introduced in Goldwater (2006). By adding rejuvenation to a particle filter, we are able to considerably improve its performance, both in terms of finding higher-probability and higher-accuracy solutions.

1 Introduction

The goal of word segmentation is to segment a stream of segments, e.g. characters or phonemes, into words. For example, given the sequence "youwanttoseethebook", the goal is to recover the segmented string "you want to see the book". The models introduced in Goldwater (2006) solve this problem in a fully unsupervised way by defining a generative process for word sequences, making use of the Dirichlet Process (DP) prior.

Until recently, the only inference algorithms applied to these models were batch Markov Chain Monte Carlo (MCMC) sampling algorithms. Börschinger and Johnson (2011) proposed a strictly incremental particle filter algorithm that, however, performed considerably worse than the standard batch algorithms, in particular for the Bigram model. We extend that algorithm by adding rejuvenation steps and show that this leads to considerable improvements, thus strengthening the case for particle filters as another tool for Bayesian inference in computational linguistics.

The rest of the paper is structured as follows. Sections 2 and 3 provide the relevant background about word segmentation and previous work. Section 4 describes our algorithm. Section 5 reports on an experimental evaluation of our algorithm, and Section 6 concludes and suggests possible directions for future research.

2 Model description

The Unigram model assumes
