Course syllabus
The goal of this course is to give an overview of mathematical structures that appear in modern deep neural networks and how these can be used to understand neural networks theoretically. The course is aimed at students familiar with advanced mathematics but not machine learning.
At the end of the course, the students should be able to understand current research literature in the field and carry out their own projects in deep learning.
The course starts with a concise introduction to deep learning, followed by deeper dives into the topics of equivariant neural networks, large width neural networks and geometrical aspects of explainable AI. Additional lectures cover the practical aspects of implementing neural network training on Google Cloud infrastructure. The lectures are accompanied by exercise sessions and homework assignments, in which students gain hands-on experience with training neural networks. The course is graded based on a project.
Program
The schedule of the course is available in TimeEdit.
Lectures
In the first part of the course, the lectures introduce the necessary deep learning background:
Lecture 1, Introduction to machine learning:
- Supervised learning, unsupervised learning, reinforcement learning
- Empirical risk minimization
- Linear classifier
- Maximum likelihood and negative log-likelihood loss
- Principal component analysis and singular value decomposition (see the sketch after this list)
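Since the hands-on work in this course is in Python, here is a minimal sketch of PCA computed via the SVD; the data is a random placeholder, not course material:

```python
import numpy as np

# Placeholder data: 100 samples with 5 features (random, not course data)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Center the data, then take the singular value decomposition
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Rows of Vt are the principal directions; project onto the top two
X_pca = X_centered @ Vt[:2].T

# The singular values give the variance explained by each component
explained_variance = S**2 / (X.shape[0] - 1)
```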
Lecture 2, More machine learning basics:
- Generalized linear models
- Generalization, underfitting, overfitting
- Regularization, weight decay (a sketch follows below)
- Hyperparameters and model evaluation
- Bias-variance tradeoff
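To make the regularization item concrete, a sketch of ridge regression, i.e. a linear model with weight decay, solved in closed form on made-up data; the decay strength `lam` is an arbitrary placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # made-up inputs
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

lam = 0.1  # weight-decay strength, a hyperparameter tuned on validation data
# Ridge solution: w = (X^T X + lam * I)^{-1} X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
```

Increasing `lam` shrinks the weights and trades variance for bias, which connects directly to the bias-variance tradeoff above.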
Lecture 3, Introduction to deep learning:
- Multi-layer perceptron and universal approximation theorems
- Backpropagation, forward- and reverse-mode automatic differentiation (a sketch follows below)
- Activation functions
- Gradient-based optimizers
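A minimal PyTorch sketch of a multi-layer perceptron together with one backpropagation (reverse-mode AD) and gradient-descent step; all sizes and data are placeholders:

```python
import torch
import torch.nn as nn

# Two-layer perceptron with ReLU activation; layer sizes are placeholders
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

x = torch.randn(8, 10)  # dummy input batch
y = torch.randn(8, 1)   # dummy targets

loss = nn.functional.mse_loss(model(x), y)
loss.backward()         # reverse-mode AD fills p.grad for every parameter

with torch.no_grad():   # one plain gradient-descent step
    for p in model.parameters():
        p -= 0.01 * p.grad
```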
Lecture 4, Deep learning, part 2:
- Exploding and vanishing gradients
- Xavier and Kaiming initialization (see the sketch after this list)
- Regularization of neural networks: weight decay, early stopping, batch normalization, ensembles, dropout, data augmentation, gradient clipping
- Transfer learning
- Adversarial attacks
- Learning random labels
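As one example from this list, Kaiming initialization as implemented in PyTorch; the layer shape is arbitrary:

```python
import torch.nn as nn

layer = nn.Linear(128, 64)  # arbitrary layer shape
# Kaiming (He) initialization is designed for ReLU networks: it scales the
# weight variance so activations neither explode nor vanish across layers
nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
nn.init.zeros_(layer.bias)
```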
Lecture 5, Important neural network architectures:
- Convolutional neural networks (see the sketch after this list)
- Residual neural networks
- Transformers
- Graph neural networks
- Generative adversarial networks
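As a preview of the first item, a minimal convolutional network in PyTorch; channel counts, image size and class count are placeholders, not an architecture from the course:

```python
import torch
import torch.nn as nn

# Small CNN for 32x32 RGB images; all sizes are placeholders
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                      # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),              # global average pooling
    nn.Flatten(),
    nn.Linear(32, 10),                    # 10 output classes
)

logits = cnn(torch.randn(4, 3, 32, 32))  # dummy batch, output shape (4, 10)
```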
In the second part of the course, the lectures discuss the theoretical basis for the advanced topics:
Lecture 6, Explainable AI and geometry:
- Explainable AI, saliency maps, clever Hans effect
- Counterfactual explanations and adversarial examples
- Normalizing flows (see the formula after this list)
- Diffeomorphic counterfactuals
- Normal coordinates on the data manifold
- Induced metric on the data manifold
- Results on a toy problem and various image datasets
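The key formula behind normalizing flows is the change of variables: for a diffeomorphism $f$ mapping data $x$ to latent variables $z = f(x)$ with a simple base density $p_z$,

$$\log p_x(x) = \log p_z\bigl(f(x)\bigr) + \log\left|\det\frac{\partial f(x)}{\partial x}\right|,$$

and the flow is trained by maximizing this log-likelihood over the parameters of $f$.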
Lecture 7: Equivariant neural networks, part 1:
- Symmetric learning problems, regular representation
- Equivariant models
- Ways to build equivariant models: data augmentation, averaging, canonicalization, manifest equivariance (a sketch of averaging follows below)
- Group convolutions, lifting, group pooling
- Group convolutional neural networks for roto-translations
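To illustrate the averaging construction: given any image classifier `model` (a hypothetical stand-in), averaging its outputs over the four 90° rotations yields a model that is exactly invariant under the rotation group $C_4$. A minimal PyTorch sketch:

```python
import torch

def c4_average(model, x):
    """Average model outputs over all four 90-degree rotations of the input.

    x has shape (batch, channels, height, width); the averaged output is
    invariant under 90-degree rotations of x by construction.
    """
    outputs = [model(torch.rot90(x, k, dims=(-2, -1))) for k in range(4)]
    return torch.stack(outputs).mean(dim=0)
```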
Lecture 8: Equivariant neural networks, part 2:
- Universality of group convolutions
- Group convolutions for SO(3) and spherical CNNs, Clebsch-Gordan nets
- General group convolutions in Fourier space, Peter-Weyl theorem (see the formula after this list)
- Equivariant graph neural networks
- Geometric deep learning
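The structural result behind the Fourier-space picture, stated here as a sketch since ordering and normalization conventions vary in the literature: for a compact group $G$ with Haar measure $\mathrm{d}h$, the Peter-Weyl theorem decomposes functions on $G$ into irreducible representations $\rho$, under which group convolution becomes matrix multiplication of the Fourier coefficients,

$$(f \ast \psi)(g) = \int_G f(h)\,\psi(h^{-1}g)\,\mathrm{d}h, \qquad \widehat{f \ast \psi}(\rho) = \hat{f}(\rho)\,\hat{\psi}(\rho).$$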
Lecture 9: Large width neural networks, part 1:
- Gaussian processes
- Empirical neural tangent kernel (see the sketch after this list)
- Standard and NTK parametrizations
- Neural network - Gaussian process correspondence
- NTK convergence at infinite width
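For a scalar-output network $f_\theta$, the empirical NTK is $\Theta(x_1, x_2) = \sum_p \partial_{\theta_p} f_\theta(x_1)\,\partial_{\theta_p} f_\theta(x_2)$. A minimal PyTorch sketch with a placeholder network and dummy inputs:

```python
import torch
import torch.nn as nn

# Placeholder scalar-output network
model = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))

def grad_vector(x):
    # Gradient of the scalar output with respect to all parameters, flattened
    grads = torch.autograd.grad(model(x).squeeze(), tuple(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

x1, x2 = torch.randn(1, 4), torch.randn(1, 4)  # two dummy inputs
theta_12 = grad_vector(x1) @ grad_vector(x2)   # empirical NTK value Theta(x1, x2)
```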
Lecture 10: Large width neural networks, part 2:
- Solving gradient descent dynamics at infinite width (see the formula after this list)
- Computing the NTK, forward equation
- Linear models and infinitely wide neural networks
- Criticality and neural network phase diagram
- Trainability and generalization
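A central result of this lecture block, quoted here as a sketch (conventions for the learning rate $\eta$ vary): at infinite width, gradient-flow training with MSE loss on training data $(\mathcal{X}, \mathcal{Y})$ can be solved in closed form in terms of the NTK $\Theta$,

$$f_t(x) = f_0(x) + \Theta(x, \mathcal{X})\,\Theta(\mathcal{X}, \mathcal{X})^{-1}\left(I - e^{-\eta\,\Theta(\mathcal{X}, \mathcal{X})\,t}\right)\bigl(\mathcal{Y} - f_0(\mathcal{X})\bigr),$$

so the infinitely wide network evolves like a linear model and, as $t \to \infty$, interpolates the training data.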
In parallel to the above lectures, which cover the theoretical background, additional lectures introduce the practical aspects of implementing neural network training with standard deep learning frameworks on Google Cloud computing infrastructure:
Lecture A: Python and the compute environment
Lecture B: Tensors in NumPy and PyTorch
Lecture C: Automatic differentiation and neural networks in PyTorch
Lecture D: Neural network training in PyTorch
Lecture E: PyTorch Lightning
Lecture F: Project proposal and normalizing flows
Assignments and exercise sessions
The assignments are available in this GitHub repository: https://github.com/JanEGerken/ms-in-dnns
In weekly homework assignments, you will implement neural network training related to the subjects covered in the lectures. This serves as preparation for the final project. The assignments are discussed in weekly exercise sessions, where technical problems can also be addressed. Assignments are due on Fridays at 18:00 and are submitted via Canvas.
In order to be admitted to the project, you need to reach at least 60% of the points in the assignments and submit a project proposal.
Project
The course is graded based on a project which you work on in the last two weeks of term. The project should be based on a paper from the research areas covered in the second part of the course. PhD students can (after talking to Jan) do a project related to their research. The project must include the implementation of a neural network training setup that goes beyond the architectures discussed in Lecture 5. You are responsible for developing your own project idea, summarized in a project proposal which is part of Assignment F. You will receive feedback on your proposal, and two exercise sessions are dedicated to supporting project work.
The project grade is based on a written report and an oral presentation. The report should be at most five pages long, excluding references and appendices. A LaTeX template for the report which also contains a suggested structure is available here.
The final grade is based in equal parts (25% each) on
- the project idea as stated in the project proposal, i.e. you will get a high grade if you have a creative idea which can be expected to be realizable with the time and compute resources available
- the execution of the project, i.e. you will get a high grade here if you reached interesting results, overcame difficulties etc.
- the clarity of the report, i.e. you will get a high grade here if your report is well structured, well written, has informative plots etc.
- the clarity of the oral presentation, i.e. you will get a high grade here if your presentation is clear and you are able to respond reasonably to some questions
You have to pass all four components to pass the course. The final grade is either U (underkänd/fail), G (godkänd/pass) or VG (väl godkänd/pass with distinction).
Detailed course plan
| | Monday 13:15 (MVF21) | Tuesday 13:15 (MVH12) | Thursday 15:15 (MVF21) | Friday 13:15 (MVF21) |
|---|---|---|---|---|
| Week 3 | Lecture 1 | Lecture A | Exercise Class 1 | Lecture B |
| Week 4 | Lecture 2 | Lecture 3 | Exercise Class 2 | Lecture C |
| Week 5 | Lecture 4 | Lecture 5 | Exercise Class 3 | Lecture D; Assignment C due |
| Week 6 | Lecture 6 | Lecture 7 | Exercise Class 4 | Lecture E; Assignment D due |
| Week 7 | Lecture 8 | Lecture 9 | Exercise Class 5 | Lecture F; Assignment E due |
| Week 8 | Lecture 10 | — | Exercise Class 6 | Assignment F due |
| Week 9 | — | — | Exercise Class 7 | — |
| Week 10 | — | — | Exercise Class 8 | — |
| Week 11 | Report due | — | Presentation | — |
Teachers
Examiner: Jan Gerken
Lecturer: Jan Gerken
Teaching assistant: Philipp Misof
Course literature
Part one: A good introduction to deep learning is given in
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press, freely available here: https://www.deeplearningbook.org/
For part two, references to the relevant literature are provided for each lecture separately.