Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty

Yoshimasa Tsuruoka, Jun'ichi Tsujii, Sophia Ananiadou
School of Computer Science, University of Manchester, UK
National Centre for Text Mining (NaCTeM), UK
Department of Computer Science, University of Tokyo, Japan
{yoshimasa.tsuruoka, j.tsujii, sophia.ananiadou}@manchester.ac.uk

Abstract

Stochastic gradient descent (SGD) uses approximate gradients estimated from subsets of the training data and updates the parameters in an online fashion. This learning framework is attractive because it often requires much less training time in practice than batch training algorithms. However, L1-regularization, which is becoming popular in natural language processing because of its ability to produce compact models, cannot be efficiently applied in SGD training, due to the large dimensions of feature vectors and the fluctuations of approximate gradients. We present a simple method to solve these problems by penalizing the weights according to cumulative values for L1 penalty. We evaluate the effectiveness of our method in three applications: text chunking, named entity recognition, and part-of-speech tagging. Experimental results demonstrate that our method can produce compact and accurate models much more quickly than a state-of-the-art quasi-Newton method for L1-regularized log-linear models.

1 Introduction

Log-linear models (a.k.a. maximum entropy models) are one of the most widely-used probabilistic models in the field of natural language processing (NLP). The applications range from simple classification tasks such as text classification and history-based tagging (Ratnaparkhi, 1996) to more complex structured prediction tasks such as part-of-speech (POS) tagging (Lafferty et al., 2001), syntactic parsing (Clark and Curran, 2004), and semantic role labeling (Toutanova et al., 2005). Log-linear models have a major advantage over other discriminative machine learning models such as support vector machines: their ...
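The cumulative-penalty idea described in the abstract can be illustrated with a short SGD routine for a binary logistic model: after each gradient step, every updated weight is clipped toward zero by at most the difference between the total L1 penalty it could have received so far and the penalty it has actually received. The following Python sketch is only an illustration under those assumptions; the variable names, the logistic objective, and the learning-rate schedule are ours and not the paper's exact formulation.

    import numpy as np

    def sgd_l1_cumulative(xs, ys, dim, C=1.0, eta0=0.1, epochs=5):
        """Sketch of SGD with a cumulative L1 penalty for a binary
        logistic model.  Illustrative only: the objective, names, and
        learning-rate schedule are assumptions, not the paper's exact setup."""
        w = np.zeros(dim)   # model weights
        q = np.zeros(dim)   # L1 penalty actually applied to each weight so far
        u = 0.0             # L1 penalty each weight could have received so far
        n = len(xs)
        k = 0
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                eta = eta0 / (1.0 + k / float(n))    # decaying learning rate
                u += eta * C / n                     # accumulate the maximum penalty
                p = 1.0 / (1.0 + np.exp(-w.dot(x)))  # model probability of class 1
                w += eta * (y - p) * x               # gradient step on the log-likelihood
                # clip each updated weight toward zero by at most the
                # cumulative penalty it has not yet received
                for i in np.nonzero(x)[0]:
                    z = w[i]
                    if z > 0:
                        w[i] = max(0.0, z - (u + q[i]))
                    elif z < 0:
                        w[i] = min(0.0, z + (u - q[i]))
                    q[i] += w[i] - z
                k += 1
        return w

Because the penalty is tracked cumulatively and the clipping never pushes a weight past zero, a weight whose noisy per-sample gradients fluctuate around zero still receives, in total, roughly the penalty it would have received under the true gradient, which is what lets the method drive many weights exactly to zero and produce compact models.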