TAILIEUCHUNG - Báo cáo khoa học: "Efficient Inference of CRFs for Large-Scale Natural Language Data"

This paper presents an efficient inference algorithm of conditional random fields (CRFs) for large-scale data. Our key idea is to decompose the output label state into an active set and an inactive set in which most unsupported transitions become a constant. Our method unifies two previous methods for efficient inference of CRFs, and also derives a simple but robust special case that performs faster than exact inference when the active sets are sufficiently small. We demonstrate that our method achieves dramatic speedup on six standard natural language processing problems. . | Efficient Inference of CRFs for Large-Scale Natural Language Data Minwoo Jeong Chin-Yew Lin- Gary Geunbae Lee tPohang University of Science Technology Pohang Korea Microsoft Research Asia Beijing China t stardust gblee @ lcyl@ Abstract This paper presents an efficient inference algorithm of conditional random fields CRFs for large-scale data. Our key idea is to decompose the output label state into an active set and an inactive set in which most unsupported transitions become a constant. Our method unifies two previous methods for efficient inference of CRFs and also derives a simple but robust special case that performs faster than exact inference when the active sets are sufficiently small. We demonstrate that our method achieves dramatic speedup on six standard natural language processing problems. 1 Introduction Conditional random fields CRFs are widely used in natural language processing but extending them to large-scale problems remains a significant challenge. For simple graphical structures . linear-chain an exact inference can be obtained efficiently if the number of output labels is not large. However for large number of output labels the inference is often prohibitively expensive. To alleviate this problem researchers have begun to study the methods of increasing inference speeds of CRFs. Pal et al. 2006 proposed a Sparse ForwardBackward SFB algorithm in which marginal distribution is compressed by approximating the true marginals using Kullback-Leibler KL divergence. Cohn 2006 proposed a Tied Potential TP algorithm which constrains the labeling considered in each feature function such that the functions can detect only a relatively small set of labels. Both of these techniques efficiently compute the marginals with a significantly reduced runtime resulting in faster training and decoding of CRFs. This paper presents an efficient inference algorithm of CRFs which unifies the SFB and TP approaches. We first decompose output .

TAILIEUCHUNG - Chia sẻ tài liệu không giới hạn
Địa chỉ : 444 Hoang Hoa Tham, Hanoi, Viet Nam
Website : tailieuchung.com
Email : tailieuchung20@gmail.com
Tailieuchung.com là thư viện tài liệu trực tuyến, nơi chia sẽ trao đổi hàng triệu tài liệu như luận văn đồ án, sách, giáo trình, đề thi.
Chúng tôi không chịu trách nhiệm liên quan đến các vấn đề bản quyền nội dung tài liệu được thành viên tự nguyện đăng tải lên, nếu phát hiện thấy tài liệu xấu hoặc tài liệu có bản quyền xin hãy email cho chúng tôi.
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.