Scientific report: "Iterative Scaling and Coordinate Descent Methods for Maximum Entropy"

Fang-Lan Huang, Cho-Jui Hsieh, Kai-Wei Chang, and Chih-Jen Lin
Department of Computer Science, National Taiwan University, Taipei 106, Taiwan
{d93011, b92085, b92084, cjlin}@

Abstract

Maximum entropy (Maxent) is useful in many areas. Iterative scaling (IS) methods are among the most popular approaches for solving Maxent. With many variants of IS methods, it is difficult to understand them and see their differences. In this paper, we create a general and unified framework for IS methods. This framework also connects IS and coordinate descent (CD) methods. Moreover, we develop a CD method for Maxent. Results show that it is faster than existing iterative scaling methods.¹

1 Introduction

Maximum entropy (Maxent) is widely used in many areas, such as natural language processing (NLP) and document classification. Maxent models the conditional probability as

$$P_w(y|x) \equiv \frac{S_w(x,y)}{T_w(x)}, \quad \text{where } S_w(x,y) \equiv e^{\sum_t w_t f_t(x,y)}, \quad T_w(x) \equiv \sum_y S_w(x,y), \qquad (1)$$

$x$ indicates a context, $y$ is the label of the context, and $w \in R^n$ is the weight vector. A function $f_t(x,y)$ denotes the $t$-th feature extracted from the context $x$ and the label $y$. Given an empirical probability distribution $\tilde{P}(x,y)$ obtained from training samples, Maxent minimizes the following negative log-likelihood:

$$\min_w \; -\sum_{x,y} \tilde{P}(x,y) \log P_w(y|x) = \sum_x \tilde{P}(x) \log T_w(x) - \sum_t w_t \tilde{P}(f_t), \qquad (2)$$

where $\tilde{P}(x) = \sum_y \tilde{P}(x,y)$ is the marginal probability of $x$, and $\tilde{P}(f_t) = \sum_{x,y} \tilde{P}(x,y) f_t(x,y)$ is the expected value of $f_t(x,y)$.
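The model (1) and objective (2) can be sketched numerically. Below is a minimal NumPy illustration; the array layout (`F[x, y, t]` holding $f_t(x,y)$), the names `maxent_nll`, `F`, `P_emp`, and the toy data are assumptions made for the example, not part of the paper.

```python
import numpy as np

def maxent_nll(w, F, P_emp):
    """Negative log-likelihood (2) for a toy Maxent model.

    F[x, y, t] holds the feature values f_t(x, y); P_emp[x, y] is the
    empirical distribution obtained from training samples.
    """
    scores = F @ w                               # log S_w(x,y) = sum_t w_t f_t(x,y)
    log_T = np.log(np.exp(scores).sum(axis=1))   # log T_w(x) per context
    log_P = scores - log_T[:, None]              # log P_w(y | x)
    return -(P_emp * log_P).sum()

# Tiny example: 2 contexts, 2 labels, 3 features (values are illustrative).
rng = np.random.default_rng(0)
F = rng.random((2, 2, 3))
P_emp = np.array([[0.3, 0.2], [0.1, 0.4]])
w = np.zeros(3)
# With w = 0, P_w(y|x) is uniform over the 2 labels, so the negative
# log-likelihood equals log(2), since P_emp sums to 1.
print(maxent_nll(w, F, P_emp))
```

For real data one would compute `log_T` with a log-sum-exp trick to avoid overflow; the plain `np.exp` here keeps the sketch short.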
To avoid overfitting the training samples, some add a regularization term to (2) and solve

$$\min_w \; L(w) \equiv \sum_x \tilde{P}(x) \log T_w(x) - \sum_t w_t \tilde{P}(f_t) + \sum_t \frac{w_t^2}{2\sigma^2}, \qquad (3)$$

where $\sigma$ is a regularization parameter. We focus on (3) instead of (2) because (3) is strictly convex. Iterative scaling (IS) methods are popular in training Maxent models. They all share the same property of solving a one-variable sub-problem at a time. Existing IS methods include generalized iterative scaling (GIS) by Darroch and Ratcliff.

¹A complete version of this work is at http cjlin papers.
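The one-variable-at-a-time property shared by IS and CD methods can be illustrated with a simple coordinate-wise Newton sweep on objective (3). This is only a sketch under assumed names and data layout (`cd_pass`, `F[x, y, t]`, `P_emp`); the plain Newton step stands in for the paper's actual CD method, which solves each one-variable sub-problem more carefully.

```python
import numpy as np

def cd_pass(w, F, P_emp, sigma=1.0):
    """One sweep of coordinate-wise Newton updates on objective (3).

    For each coordinate t, take a Newton step using the partial
    derivatives of L(w) with respect to w_t, holding the rest fixed.
    """
    P_x = P_emp.sum(axis=1)                      # marginal P~(x)
    P_f = np.einsum('xy,xyt->t', P_emp, F)       # expected features P~(f_t)
    for t in range(len(w)):
        scores = F @ w
        P_model = np.exp(scores - scores.max(axis=1, keepdims=True))
        P_model /= P_model.sum(axis=1, keepdims=True)     # P_w(y|x)
        Ef = (P_model * F[:, :, t]).sum(axis=1)           # E_w[f_t | x]
        Ef2 = (P_model * F[:, :, t] ** 2).sum(axis=1)     # E_w[f_t^2 | x]
        g = P_x @ Ef - P_f[t] + w[t] / sigma ** 2         # dL/dw_t
        h = P_x @ (Ef2 - Ef ** 2) + 1 / sigma ** 2        # d2L/dw_t^2 > 0
        w[t] -= g / h                                     # Newton step
    return w

# Toy data: 3 contexts, 2 labels, 4 features (illustrative values).
rng = np.random.default_rng(0)
F = rng.random((3, 2, 4))
# With a uniform empirical distribution, w = 0 already makes the model
# expectation match P~(f_t), so g = 0 and a sweep leaves w unchanged.
print(cd_pass(np.zeros(4), F, np.full((3, 2), 1.0 / 6)))
```

Because (3) is strictly convex, the per-coordinate second derivative `h` is bounded below by $1/\sigma^2$, so each sub-problem is well conditioned; this is one reason the paper prefers (3) over (2).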
