TAILIEUCHUNG - Báo cáo khoa học: "A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction"

In this work we address the problem of unsupervised part-of-speech induction by bringing together several strands of research into a single model. We develop a novel hidden Markov model incorporating sophisticated smoothing using a hierarchical Pitman-Yor processes prior, providing an elegant and principled means of incorporating lexical characteristics. | A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction Phil Blunsom Department of Computer Science University of Oxford Trevor Cohn Department of Computer Science University of Sheffield Abstract In this work we address the problem of unsupervised part-of-speech induction by bringing together several strands of research into a single model. We develop a novel hidden Markov model incorporating sophisticated smoothing using a hierarchical Pitman-Yor processes prior providing an elegant and principled means of incorporating lexical characteristics. Central to our approach is a new type-based sampling algorithm for hierarchical Pitman-Yor models in which we track fractional table counts. In an empirical evaluation we show that our model consistently out-performs the current state-of-the-art across 10 languages. 1 Introduction Unsupervised part-of-speech PoS induction has long been a central challenge in computational linguistics with applications in human language learning and for developing portable language processing systems. Despite considerable research effort progress in fully unsupervised PoS induction has been slow and modern systems barely improve over the early Brown et al. 1992 approach Christodoulopoulos et al. 2010 . One popular means of improving tagging performance is to include supervision in the form of a tag dictionary or similar however this limits portability and also comprimises any cognitive conclusions. In this paper we present a novel approach to fully unsupervised PoS induction which uniformly outperforms the existing state-of-the-art across all our corpora in 10 different languages. Moreover the performance of our unsupervised model approaches 865 that of many existing semi-supervised systems despite our method not receiving any human input. In this paper we present a Bayesian hidden Markov model HMM which uses a non-parametric prior to infer a latent tagging for a .

TỪ KHÓA LIÊN QUAN
TAILIEUCHUNG - Chia sẻ tài liệu không giới hạn
Địa chỉ : 444 Hoang Hoa Tham, Hanoi, Viet Nam
Website : tailieuchung.com
Email : tailieuchung20@gmail.com
Tailieuchung.com là thư viện tài liệu trực tuyến, nơi chia sẽ trao đổi hàng triệu tài liệu như luận văn đồ án, sách, giáo trình, đề thi.
Chúng tôi không chịu trách nhiệm liên quan đến các vấn đề bản quyền nội dung tài liệu được thành viên tự nguyện đăng tải lên, nếu phát hiện thấy tài liệu xấu hoặc tài liệu có bản quyền xin hãy email cho chúng tôi.
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.