# Lecture “optimization theory and applications for machine learning” started

The graduate-school lecture “Optimization theory and applications for machine learning” has started.

• Instructor : H. Kasai
• Abstract : Many problems in machine learning (ML) and statistical learning can be formulated as the optimization of some cost function, possibly under constraints. In the era of big data, the design of efficient and scalable algorithms for large-scale optimization problems has attracted a surge of research interest, from the viewpoint of theory as well as practice. This includes inherently high-dimensional problems involving truly massive data sets. The main goal of this lecture is to expose students to a wide variety of modern algorithmic developments in optimization, especially in convex optimization, and to bring them near the frontier of research in large-scale optimization. The focus is on non-linear programming. Note that some important topics, including combinatorial optimization, are not explored.
• Language : English (and Japanese in Q&A)
• Style : Real-time online lecture (recorded videos will be provided after each class)
• Other :
• All materials are provided via WASEDA Moodle
• Contact instructor via email (hiroyuki dot kasai at waseda dot jp)
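
As a small taste of what the abstract describes — formulating an ML problem as the minimization of a cost function and solving it either in closed form or iteratively — here is a minimal sketch for linear least squares. This is not part of the course materials; the data and function names are illustrative.

```python
import numpy as np

# Synthetic regression data (illustrative): y = X @ w_true + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=100)

def least_squares_gd(X, y, iters=500):
    """Minimize f(w) = (1/2) * ||X w - y||^2 by plain gradient descent."""
    # Step size 1/L, where L = lambda_max(X^T X) is the smoothness constant of f
    step = 1.0 / np.linalg.eigvalsh(X.T @ X).max()
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y)  # gradient of the least-squares cost
        w = w - step * grad
    return w

w_iter = least_squares_gd(X, y)                     # iterative approach
w_closed = np.linalg.lstsq(X, y, rcond=None)[0]     # closed-form approach
print(np.linalg.norm(w_iter - w_closed))
```

The printed gap between the two solutions is tiny, which is the point of the comparison: both approaches minimize the same cost, and the iterative one scales to problems where the closed form is too expensive.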

#### Contents (tentative)

1. Introduction
1. Notations
2. Machine learning and optimization
3. First two tours of ML and optimization problems
1. Regression problem: Linear least squares
2. Classification problem: Logistic regression
4. Empirical risk minimization
1. Definition and interest
2. Examples
5. How to solve them?
1. Closed-form approach
2. Iterative approach
6. What do we learn in this lecture?
2. Convexity
1. Convex sets
1. Definitions
2. Examples of convex sets
3. Combinations and hulls
4. Preservation of convexity of sets
5. Practical approach to establish convexity of a set
6. Geometry of convex sets
2. Convex functions
1. Preliminaries
2. Definitions
3. Examples
4. Function characterization
5. Operations preserving convexity of functions
6. Practical approach to verify convexity of a function
3. Smoothness and strong convexity
1. Smoothness
2. Strong convexity
3. Smoothness and strong convexity
4. Subdifferential
5. Optimization preliminaries
1. Optimization problem
1. Definition and terminology
2. Problem types
2. Which optimization method is efficient?
1. Black-box oriented algorithms
2. What does “efficiency” mean?
6. Unconstrained optimization I (gradient methods)
1. Optimality conditions
2. Optimality condition for convex function
3. Iterative descent method
1. Fundamentals
2. Stepsize selection
1. Fundamentals
2. Sufficient decrease lemma
3. Convergence rate analysis for smooth case
4. Convergence rate analysis for smooth and convex case
5. Convergence rate analysis for smooth and strongly-convex case
4. Newton’s method
1. Fundamentals
2. Convergence analysis
7. Unconstrained optimization II (optimal methods)
1. Momentum (The heavy ball method)
1. Fundamentals
2. Convergence analysis
2. Nesterov’s accelerated gradient descent (AGD) method
1. Fundamentals
2. Convergence analysis
8. Constrained convex optimization I (without equality and inequality)
1. Optimality conditions
2. Conditional Gradient Descent (CGD) method (a.k.a. Frank–Wolfe method)
1. Fundamentals
2. Convergence analysis
3. Projected Gradient Descent (PGD) method
1. Orthogonal projection operator
2. Fundamentals
3. Convergence analysis
4. (Block) Coordinate descent (CD) methods
1. Questions for this approach
2. Coordinate descent for unconstrained problem
3. Block coordinate descent (BCD) for constrained problem
4. Variants of coordinate descent methods
5. Example
6. Convergence analysis
9. Constrained convex optimization II (with equality and inequality constraints)
1. Karush–Kuhn–Tucker (KKT) conditions
1. Lagrangian function
2. KKT conditions
3. Examples
2. Augmented Lagrangian methods
1. Equality constraint problem
2. General form
10. Constrained convex optimization III (proximal methods)
1. Proximal operator (mapping)
2. Moreau decomposition and Moreau envelope
3. Proximal algorithm (Proximal minimization algorithm, Proximal point algorithm)
4. Proximal gradient method (PG, ISTA)
1. Fundamentals
2. Convergence analysis
5. Fast proximal gradient method (APG, FISTA)
1. Fundamentals
2. Convergence analysis
11. Duality
1. First look at duality
2. Lagrange duality in LP
3. Lagrange duality
1. Definitions and properties (weak duality)
2. Strong duality
3. Connection to KKT condition in strong duality
4. Examples
4. Conjugate function
5. Fenchel duality
1. Fundamentals
2. Fenchel duality examples
6. Why dual?
12. Constrained convex optimization IV (dual methods)
3. Dual decomposition
4. Dual proximal algorithm and augmented Lagrangian method
2. Connection to the augmented Lagrangian method (method of multipliers)
13. Constrained convex optimization V (advanced methods)
1. Alternating direction method of multipliers (ADMM)
2. Bregman proximal gradient (BPG)
3. Accelerated Bregman proximal gradient (ABPG)
4. Primal-dual hybrid proximal method
14. Non-smooth optimization I (subgradient method)
1. Optimality conditions
2. Subgradient method
1. Fundamentals
2. Convergence analysis
3. Projected subgradient method
1. Fundamentals
2. Convergence analysis
4. Mirror descent method
1. Bregman divergence
2. Mirror descent method
15. Non-smooth optimization II (smoothing methods)
1. Smoothable function
2. Moreau-Yosida regularization
1. Definition
2. Properties
3. Nesterov’s smoothing
1. Fundamentals
2. Definition
3. Example: f(x)=|x|
4. Connection to the Moreau-Yosida envelope
5. Algorithm
6. Convergence analysis
16. Stochastic optimization
1. Fundamentals
2. Online stochastic optimization
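
To give one concrete instance of the proximal methods listed in the outline, the following sketch implements the proximal gradient method (ISTA) for l1-regularized least squares, where the proximal operator of λ‖·‖₁ is the soft-thresholding operator. This example is not taken from the lecture materials; the data and parameter choices are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """prox of t * ||.||_1: shrink each entry of v toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, iters=1000):
    """Proximal gradient (ISTA) for min_w (1/2)||X w - y||^2 + lam * ||w||_1."""
    step = 1.0 / np.linalg.eigvalsh(X.T @ X).max()  # step size 1/L
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y)                          # gradient of smooth part
        w = soft_threshold(w - step * grad, step * lam)   # prox of non-smooth part
    return w

# Sparse ground truth: only coordinates 2 and 7 are active
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[[2, 7]] = [1.5, -2.0]
y = X @ w_true + 0.01 * rng.normal(size=200)

w_hat = ista(X, y, lam=1.0)
print("recovered support:", np.flatnonzero(np.abs(w_hat) > 1e-3))
```

The gradient step handles the smooth least-squares term and the prox step handles the non-smooth l1 term, which is exactly the splitting that the proximal gradient lectures formalize; the printed support should recover the truly active coordinates.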