Lecture “optimization theory and applications for machine learning” started

The lecture “Optimization theory and applications for machine learning” for the graduate school has started. 

  • Instructor : H. Kasai
  • Abstract : Many problems in machine learning (ML) and statistical learning can be formulated as the optimization of some cost function, possibly under constraints. In the era of big data, the re-design of efficient and scalable algorithms for large-scale optimization problems has attracted a surge of research interest from the viewpoint of theory as well as practice. This includes inherently high-dimensional problems involving truly massive data sets. The main goal of this lecture is to expose students to a wide variety of modern algorithmic developments in optimization, especially convex optimization, and to bring them near the frontier of research in large-scale optimization. The focus is in particular on non-linear programming. Note that some important topics, including combinatorial optimization, are not explored. (A small illustrative example of such an optimization problem is given after this list.)
  • Language : English (and Japanese in Q&A)
  • Style : Real-time online lecture (recorded videos will be provided after the class).
  • Else :
    • All materials are provided via WASEDA Moodle
    • Contact instructor via email (hiroyuki dot kasai at waseda dot jp)
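
For a concrete flavour of the kind of problem treated in the lecture, below is a minimal, illustrative Python sketch (not course material): a linear least-squares regression problem solved both in closed form and by gradient descent, mirroring the “closed-form vs. iterative” discussion in the Introduction. The synthetic data, stepsize, and iteration count are assumptions chosen purely for illustration.

    # Illustrative sketch only: least squares  min_w (1/(2n)) * ||X w - y||^2
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100, 5
    X = rng.standard_normal((n, d))          # synthetic design matrix
    y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

    # (a) Closed-form approach: solve the normal equations X^T X w = X^T y.
    w_closed = np.linalg.solve(X.T @ X, X.T @ y)

    # (b) Iterative approach: gradient descent with constant stepsize 1/L,
    #     where L = lambda_max(X^T X / n) is the smoothness constant.
    L = np.linalg.eigvalsh(X.T @ X / n).max()
    w = np.zeros(d)
    for _ in range(500):
        grad = X.T @ (X @ w - y) / n         # gradient of the cost at w
        w -= (1.0 / L) * grad

    print(np.linalg.norm(w - w_closed))      # the two solutions nearly coincide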

Contents (tentative)

  1. Introduction 
    1. Notations 
    2. Machine learning and optimization 
    3. First two tours of ML and optimization problems 
      1. Regression problem: Linear least squares 
      2. Classification problem: Logistic regression 
    4. Empirical risk minimization 
      1. Definition and interest 
      2. Examples 
    5. How to solve them? 
      1. Closed-form approach 
      2. Iterative approach 
    6. What do we learn in this lecture? 
  2. Convexity 
    1. Convex sets 
      1. Definitions 
      2. Examples of convex sets 
      3. Combinations and hulls 
      4. Preservation of convexity of sets 
      5. Practical approach to establish convexity of a set 
      6. Geometry of convex sets 
    2. Convex functions 
      1. Preliminaries 
      2. Definitions 
      3. Examples 
      4. Function characterization 
      5. Operations preserving convexity of functions 
      6. Practical approach to verify convexity of a function 
  3. Smoothness and strong convexity 
    1. Smoothness 
    2. Strong convexity 
    3. Smoothness and strong convexity 
  4. Subgradient and subdifferential 
    1. Subgradient 
    2. Subdifferential 
  5. Optimization preliminaries 
    1. Optimization problem 
      1. Definition and terminology 
      2. Problem types 
    2. Which optimization method is efficient? 
      1. Black-box oriented algorithms 
      2. What does “efficiency” mean? 
  6. Unconstrained optimization I (gradient methods) 
    1. Optimality conditions 
    2. Optimality condition for convex function 
    3. Iterative descent method 
      1. Fundamentals 
      2. Stepsize selection 
    4. Gradient Descent (GD) method 
      1. Fundamentals 
      2. Sufficient decrease lemma 
      3. Convergence rate analysis for smooth case 
      4. Convergence rate analysis for smooth and convex case 
      5. Convergence rate analysis for smooth and strongly-convex case 
    5. Newton’s method 
      1. Fundamentals 
      2. Convergence analysis 
  7. Unconstrained optimization II (optimal methods) 
    1. Momentum (The heavy ball method) 
      1. Fundamentals 
      2. Convergence analysis 
    2. Nesterov’s accelerated gradient descent (AGD) method 
      1. Fundamentals 
      2. Convergence analysis 
  8. Constrained convex optimization I (without equality and inequality constraints) 
    1. Optimality conditions 
    2. Conditional Gradient Descent (CGD) method (a.k.a. Frank–Wolfe method) 
      1. Fundamentals 
      2. Convergence analysis 
    3. Projected Gradient Descent (PGD) method 
      1. Orthogonal projection operator 
      2. Fundamentals 
      3. Convergence analysis 
    4. (Block) Coordinate descent (CD) methods 
      1. Questions for this approach 
      2. Coordinate descent for unconstrained problem 
      3. Block coordinate descent (BCD) for constrained problem 
      4. Variants of coordinate descent methods 
      5. Example 
      6. Convergence analysis 
  9. Constrained convex optimization II (with equality and inequality constraints) 
    1. Karush–Kuhn–Tucker (KKT) conditions 
      1. Lagrangian function 
      2. KKT conditions 
      3. Examples 
    2. Augmented Lagrangian methods 
      1. Equality constraint problem 
      2. General form 
  10. Constrained convex optimization III (proximal methods) 
    1. Proximal operator (mapping) 
    2. Moreau decomposition, Moreau envelope 
    3. Proximal algorithm (Proximal minimization algorithm, Proximal point algorithm) 
    4. Proximal gradient method (PG, ISTA)
      1. Fundamentals 
      2. Convergence analysis 
    5. Fast proximal gradient method (APG, FISTA) 
      1. Fundamentals 
      2. Convergence analysis 
  11. Duality 
    1. First look at duality 
    2. Lagrange duality in LP 
    3. Lagrange duality 
      1. Definitions and properties (weak duality) 
      2. Strong duality 
      3. Connection to KKT condition in strong duality 
      4. Examples 
    4. Conjugate function 
    5. Fenchel duality 
      1. Fundamentals 
      2. Fenchel duality examples 
    6. Why dual? 
  12. Constrained convex optimization IV (dual methods) 
    1. Dual projected subgradient method 
    2. Dual subgradient method and dual gradient ascent with affine constraint 
    3. Dual decomposition 
    4. Dual proximal algorithm and augmented Lagrangian method 
      1. Dual proximal gradient 
      2. Connection to augmented Lagrangian method (method of multiplier) 
  13. Constrained convex optimization V (advanced methods) 
    1. Alternating direction method of multipliers (ADMM) 
    2. Generalized proximal gradient method 
      1. Bregman proximal gradient (BPG) 
      2. Accelerated Bregman proximal gradient (ABPG) 
    3. Primal-dual hybrid proximal method 
  14. Non-smooth optimization I (subgradient method) 
    1. Optimality conditions 
    2. Subgradient method 
      1. Fundamentals 
      2. Convergence analysis 
    3. Projected subgradient method 
      1. Fundamentals 
      2. Convergence analysis 
    4. Mirror descent method 
      1. Bregman divergence 
      2. Mirror descent method 
  15. Non-smooth optimization II (smoothing methods) 
    1. Smoothable function 
    2. Moreau–Yosida regularization 
      1. Definition 
      2. Properties 
    3. Nesterov’s smoothing 
      1. Fundamentals 
      2. Definition 
      3. Example: f(x)=|x|
      4. Connection to the Moreau–Yosida envelope 
      5. Algorithm 
      6. Convergence analysis
  16. Stochastic gradient
    1. Fundamentals
    2. Online stochastic optimization
    3. Stochastic gradient descent (SGD)   
    4. Accelerated variants of SGD   
      1. SAG (Stochastic average gradient), SAGA   
      2. SVRG (Stochastic variance reduced gradient)   
    5. Adaptive stochastic gradient (or adaptive learning rate)   
  17. Appendix : Mathematical preliminaries 
    1. Norms 
    2. Basic topology 
    3. Classification of matrices 
    4. Eigenvalues and eigenvectors 
    5. Singular value decomposition (SVD) 
    6. Dual space and dual norm