Fields Academy Shared Graduate Course: Mathematical Introduction to Machine Learning
Description
Instructor: Professor Maia Fraser, University of Ottawa
Course Description
This course will introduce mathematics graduate students to Machine Learning, giving an overview of methods and their properties from a mathematical point of view. The course assignments and project will involve adjustable amounts of theoretical and practical work. No previous experience in machine learning or coding is required, just graduate-level mathematical maturity, curiosity about machine learning, and the willingness to acquire some coding experience as we go, though the amount of this can be adjusted.
We will start with a brief look at the formalization of learning. We will then see this formulation in action through a variety of linear techniques: first, we revisit the familiar Ordinary Least Squares algorithm and some of its variants from this perspective, then Discriminant Analysis; finally, we cover Support Vector Machines (SVMs) and their underlying mathematics. We then look at reproducing kernel Hilbert spaces (RKHS) and the so-called kernel trick which gives rise to a wide-range of non-linear learning algorithms known as kernel methods; these are “kernelized” versions of linear methods. The core of the course then focuses on learning theory: VC Theory and PAC learning. We then consider more elaborate learning methods, such as boosting, from this perspective, and we also study neural networks (Perceptron, CNNs) from a mathematical point of view. Finally, depending on time and student interest, we will cover additional topics chosen from: Reinforcement Learning, Manifold Learning, Algorithmic Stability...
Text: Foundations of Machine Learning by Mohri et al (2012/2018) https://cs.nyu.edu/~mohri/mlbook/, plus handouts of mine and links to various online resources.
Course work: There will be 3 homework assignments, one in-person midterm, and a final project that is done in small teams over the space of 4 weeks. On homeworks and project there will be some flexibility in the amount of theory vs. implementation that you do, as well as the types of data you choose to work with. The hope is to tailor this course to your particular interests and background. No instruction in coding will be given but there is ample time to learn as you go and the demands will grow gradually and remain moderate. The midterm will include some basic questions to test familiarity with the material, as well as one or two proof-based questions. Submission of homework and project will be via the Fields Moodle.
Grading scheme: HW1 and HW2 will be graded and each count for 14% of your grade, HW3 will be pass-fail and count for 2% of your grade, the midterm will count for 30% and the final project (done in small groups) for 40% - it will serve as a take-home exam. Note: if we have ample TA support this scheme could be modified so each homework counts for 10% of the grade (making the same total of 30% for all homework).
Example of approximate schedule:
- Week #1: Syllabus; mathematical perspective on ML and its history; three ML paradigms - supervised, unsupervised, and semi- supervised learning (SSL) - plus reinforcement learning (RL)... we'll focus mainly on supervised learning. Formalization of learning, including ERM. Ordinary Least Squares (OLS) as ERM (using my handout).
- Week #2: OLS and RegLS and RecursiveLS - mathematical properties (see handout).
- Week #3: Various loss functions. Linear Discriminant Analysis.
- Week #4: Support Vector Machines (first 3 sections of Mohri et al's Chap 5, also my handout on SVMs).
- Week #5: Finish discussion of SVM’s using KKT theorem.
- THE MIDTERM COVERS MATERIAL UP TO HERE
- Week #6: In-class midterm. Definition of PAC learnability + proof that the concept class of axis-aligned rectangles is PAC learnable (Ex 2.4 in Mohri et al 2nd edition, also my handout).
- READING WEEK
- Week #7: Mohri et al's Chapter 2 giving learning bounds for the performance of algorithms with finite hypothesis set H, first for consistent algorithms (which only exist when C = H, i.e. realizability, holds) then for general.
- Week #8: Finish Chap 2 and start Rademacher complexity from Chap 3, plus additional exercises on Rademacher complexity for point clouds in Rn and spaces of functions. Also review Hoeffding’s and McDiarmid's Ineq's.
- Week #9: Finish discussing Rademacher complexity by obtaining learning bounds. Start VC dimension (which does not depend on distribution D).
- Week #10: Reproducing kernels and their associated Hilbert spaces (RKHS) as well as kernel methods.
- Week #11 (Mar 27th, Mar 29th): Online learning. Perceptron learning algorithm. Then MLPs (Multilayer Perceptrons). CNNs.
- Week #12 (April 3rd, April 5th): Boosting (Mohri et al Section 7.1 and part of 7.3).
(More logistical details are coming soon.)