Scientific report: "Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty"

Yoshimasa Tsuruoka, Jun'ichi Tsujii, Sophia Ananiadou
School of Computer Science, University of Manchester, UK
National Centre for Text Mining (NaCTeM), UK
Department of Computer Science, University of Tokyo, Japan

Abstract

Stochastic gradient descent (SGD) uses approximate gradients estimated from subsets of the training data and updates the parameters in an online fashion. This learning framework is attractive because it often requires much less training time in practice than batch training algorithms. However, L1-regularization, which is becoming popular in natural language processing because of its ability to produce compact models, cannot be efficiently applied in SGD training, due to the large dimensions of feature vectors and the fluctuations of approximate gradients. We present a simple method to solve these problems by penalizing the weights according to cumulative values for L1 penalty. We evaluate the effectiveness of our method in three applications: text chunking, named entity recognition, and part-of-speech tagging. Experimental results demonstrate that our method can produce compact and accurate models much more quickly than a state-of-the-art quasi-Newton method for L1-regularized log-linear models.

1 Introduction

Log-linear models (maximum entropy models) are one of the most widely-used probabilistic models in the field of natural language processing (NLP). The applications range from simple classification tasks such as text classification and history-based tagging (Ratnaparkhi, 1996) to more complex structured prediction tasks such as part-of-speech (POS) tagging (Lafferty et al., 2001), syntactic parsing (Clark and Curran, 2004), and semantic role labeling (Toutanova et al., 2005). Log-linear models have a major advantage over other discriminative machine learning models such as support vector machines: their ...
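The abstract describes the proposed method only at a high level: during SGD, each weight is penalized not by the instantaneous L1 term but by the cumulative L1 penalty it could have received so far, clipped so that the weight never crosses zero. Below is a minimal sketch of that idea for a binary logistic model; the function name, the decaying learning-rate schedule, and the logistic loss are illustrative assumptions, not the exact setup evaluated in the paper.

```python
import numpy as np

def sgd_l1_cumulative(X, y, C=1.0, eta0=0.1, epochs=5, seed=0):
    """Sketch of SGD with a cumulative L1 penalty (hypothetical helper).

    X: (N, n_features) feature matrix, y: labels in {0, 1}.
    u tracks the total L1 penalty every weight could have absorbed so far;
    q[j] tracks the penalty weight j has actually absorbed.
    """
    rng = np.random.default_rng(seed)
    N, n_features = X.shape
    w = np.zeros(n_features)
    q = np.zeros(n_features)   # penalty actually applied to each weight
    u = 0.0                    # cumulative penalty each weight could receive
    t = 0
    for _ in range(epochs):
        for idx in rng.permutation(N):
            eta = eta0 / (1.0 + t / N)     # simple decaying learning rate (assumption)
            u += eta * C / N
            x_i = X[idx]
            p = 1.0 / (1.0 + np.exp(-w @ x_i))
            g = (p - y[idx]) * x_i         # gradient of the logistic log loss
            active = np.nonzero(x_i)[0]    # only features present in this example
            w[active] -= eta * g[active]
            for j in active:               # apply the clipped cumulative penalty
                z = w[j]
                if z > 0:
                    w[j] = max(0.0, z - (u + q[j]))
                elif z < 0:
                    w[j] = min(0.0, z + (u - q[j]))
                q[j] += w[j] - z
            t += 1
    return w
```

The bookkeeping pair u and q[j] is what makes lazy, per-feature updates consistent with the full L1 penalty: each active weight is pulled toward zero by exactly the penalty it has not yet absorbed, and the clipping keeps it from oscillating across zero, which is what yields compact (sparse) models.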
