Course syllabus

The goal of this course is to give an overview of mathematical structures that appear in modern deep neural networks and how these can be used to understand neural networks theoretically. The course is aimed at students familiar with advanced mathematics but not machine learning.

At the end of the course, the students should be able to understand current research literature in the field and carry out their own projects in deep learning.

The course starts with a concise introduction to deep learning, followed by deeper dives into the topics of equivariant neural networks, large width neural networks and geometrical aspects of explainable AI. Additional lectures cover the practical aspects of implementing neural network training on Google Cloud infrastructure. The lectures are accompanied by exercise sessions and homework assignments, where students gain hands-on experience with training neural networks. The course is graded based on a project.

Program

The schedule of the course is available in TimeEdit.

Lectures

In the first part of the course, the lectures introduce the necessary deep learning background.

Lecture 1, Introduction to machine learning:

  • Supervised learning, unsupervised learning, reinforcement learning
  • Empirical risk minimization
  • Linear classifier
  • Maximum likelihood and negative log-likelihood loss
  • Principal component analysis and singular value decomposition (sketched below)
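
As an illustration of the last point, the principal components can be read off from a singular value decomposition of the centered data matrix. A minimal NumPy sketch with toy data (not part of the course material):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))             # toy data: 100 samples, 5 features

    X_c = X - X.mean(axis=0)                  # center the data
    U, S, Vt = np.linalg.svd(X_c, full_matrices=False)

    components = Vt                           # principal directions (rows of Vt)
    explained_variance = S**2 / (len(X) - 1)  # eigenvalues of the covariance matrix
    X_2d = X_c @ Vt[:2].T                     # projection onto the first two PCs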

Lecture 2, More machine learning basics:

  • Generalized linear models
  • Generalization, underfitting, overfitting
  • Regularization, weight decay (sketched below)
  • Hyperparameters and model evaluation
  • Bias-variance tradeoff
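
To make the weight decay bullet concrete: weight decay adds an L2 penalty on the weights to the empirical risk. In PyTorch (the framework used in the exercises), it is typically passed directly to the optimizer; a minimal sketch with a placeholder model:

    import torch

    model = torch.nn.Linear(10, 1)

    # weight_decay adds the term weight_decay * w to each weight gradient,
    # which corresponds to an L2 penalty on the weights in the loss
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)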

Lecture 3, Introduction to deep learning:

  • Multi-layer perceptron and universal approximation theorems
  • Backpropagation, forward- and reverse-mode AD (sketched below)
  • Activation functions
  • Gradient-based optimizers
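
To illustrate backpropagation, a minimal PyTorch sketch with arbitrary toy shapes: a single call to .backward() runs reverse-mode AD and populates the gradients of all parameters at once:

    import torch

    x = torch.randn(4, 3)                       # batch of 4 inputs
    W1 = torch.randn(3, 5, requires_grad=True)  # first-layer weights
    W2 = torch.randn(5, 1, requires_grad=True)  # second-layer weights

    # two-layer network with tanh activation and a squared-error-style loss
    loss = (torch.tanh(x @ W1) @ W2).pow(2).mean()
    loss.backward()                             # reverse-mode AD

    print(W1.grad.shape, W2.grad.shape)         # gradients for both weight matrices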

Lecture 4, Deep learning, part 2:

  • Exploding and vanishing gradients
  • Xavier and Kaiming initialization (sketched below)
  • Regularization of neural networks: weight decay, early stopping, batch normalization, ensembles, dropout, data augmentation, gradient clipping
  • Transfer learning
  • Adversarial attacks
  • Learning random labels
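
As a concrete example of the initialization schemes, a minimal PyTorch sketch (the layer sizes are placeholders):

    import torch.nn as nn

    layer = nn.Linear(256, 256)

    # Kaiming (He) initialization scales the weight variance to keep
    # activations stable under ReLU; Xavier is the analogue for tanh
    nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
    nn.init.zeros_(layer.bias)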

Lecture 5, Important neural network architectures:

  • Convolutional neural networks (sketched below)
  • Residual neural networks
  • Transformers
  • Graph neural networks
  • Generative adversarial networks
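
For orientation, a minimal convolutional classifier in PyTorch; the sizes assume 28x28 grayscale inputs (e.g. MNIST) and are purely illustrative:

    import torch.nn as nn

    cnn = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),                  # 28x28 -> 14x14
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),                  # 14x14 -> 7x7
        nn.Flatten(),
        nn.Linear(32 * 7 * 7, 10),        # logits for 10 classes
    )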


In the second part of the course, the theoretical basis for the advanced topics is discussed.

Lecture 6, Explainable AI and geometry:

  • Explainable AI, saliency maps, Clever Hans effect (saliency sketched below)
  • Counterfactual explanations and adversarial examples
  • Normalizing flows
  • Diffeomorphic counterfactuals
  • Normal coordinates on the data manifold
  • Induced metric on the data manifold
  • Results on a toy problem and various image datasets
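
A gradient-based saliency map needs only one backward pass through the classifier. A minimal sketch, assuming a trained model and an input batch x (both placeholders):

    import torch

    def saliency_map(model, x, target_class):
        # gradient of the class score with respect to the input pixels;
        # large entries mark pixels the prediction is most sensitive to
        x = x.detach().clone().requires_grad_(True)
        score = model(x)[0, target_class]
        score.backward()
        return x.grad.abs()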

Lecture 7, Equivariant neural networks, part 1:

  • Symmetric learning problems, regular representation
  • Equivariant models
  • Ways to build equivariant models: data augmentation, averaging (sketched below), canonicalization, manifest equivariance
  • Group convolutions, lifting, group pooling
  • Group convolutional neural networks for roto-translations
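
The averaging strategy mentioned above can be made concrete in a few lines. A sketch for the rotation group C4 acting on image tensors (model is a placeholder):

    import torch

    def c4_averaged(model, x):
        # averaging the outputs over all four 90-degree rotations of the
        # input yields a model that is invariant under C4 rotations
        outputs = [model(torch.rot90(x, k, dims=(-2, -1))) for k in range(4)]
        return sum(outputs) / 4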

Lecture 8, Equivariant neural networks, part 2:

  • Universality of group convolutions
  • Group convolutions for SO(3) and spherical CNNs, Clebsch-Gordan nets
  • General group convolutions in Fourier space, Peter-Weyl theorem
  • Equivariant graph neural networks
  • Geometric deep learning

Lecture 9, Large width neural networks, part 1:

  • Gaussian processes
  • Empirical neural tangent kernel (sketched below)
  • Standard and NTK parametrizations
  • Neural network–Gaussian process correspondence
  • NTK convergence at infinite width
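
The empirical neural tangent kernel is just an inner product of parameter gradients. A minimal PyTorch sketch for a scalar-output network (architecture and inputs are placeholders):

    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(3, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
    )
    params = list(model.parameters())

    def grad_vector(x):
        # gradient of the network output with respect to all parameters,
        # flattened into a single vector
        grads = torch.autograd.grad(model(x).squeeze(), params)
        return torch.cat([g.reshape(-1) for g in grads])

    x1, x2 = torch.randn(1, 3), torch.randn(1, 3)
    ntk_value = grad_vector(x1) @ grad_vector(x2)  # Theta(x1, x2)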

Lecture 10, Large width neural networks, part 2:

  • Solving gradient descent dynamics at infinite width
  • Computing the NTK, forward equation
  • Linear models and infinitely wide neural networks
  • Criticality and neural network phase diagram
  • Trainability and generalization


In parallel to the lectures above, which cover the theoretical background, additional lectures introduce the practical aspects of implementing neural network training with standard deep learning frameworks on Google Cloud computing infrastructure.

Lecture A: Python and the compute environment

Lecture B: Tensors in NumPy and PyTorch

Lecture C: Automatic differentiation and neural networks in PyTorch

Lecture D: Neural network training in PyTorch

Lecture E: PyTorch Lightning

Lecture F: Project proposal and normalizing flows


Assignments and exercise sessions

The assignments are available in this GitHub repository: https://github.com/JanEGerken/ms-in-dnns

In weekly homework assignments, you will implement neural network training related to the subjects covered in the lectures; a minimal training loop is sketched below. This serves as preparation for the final project. The assignments are reviewed in weekly exercise sessions, where technical problems can also be discussed. Assignments are due on Fridays at 18:00 and are submitted via Canvas.
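
For orientation, the core of such a training setup fits in a few lines of PyTorch. A minimal sketch with toy data; everything here is a placeholder, the actual assignments are in the repository above:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    model = torch.nn.Sequential(
        torch.nn.Linear(3, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()

    X, y = torch.randn(256, 3), torch.randn(256, 1)  # toy regression data
    loader = DataLoader(TensorDataset(X, y), batch_size=32)

    for epoch in range(10):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()        # backpropagation
            optimizer.step()       # gradient step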

In order to be admitted to the project, you need to reach at least 60% of the points in the assignments and submit a project proposal.

Project

The course is graded based on a project which you work on in the last two weeks of term. The project should be based on a paper from the research areas covered in the second part of the course. PhD students can (after talking to Jan) do a project related to their research. The project must include the implementation of a neural network training setup which goes beyond the architectures discussed in lecture 5. You are responsible for developing your own project idea, summarized in a project proposal which is part of Assignment F. You will receive feedback on your proposal, and two exercise sessions are dedicated to project support.

The project grade is based on a written report and an oral presentation. The report should be at most five pages long, excluding references and appendices. A LaTeX template for the report, which also contains a suggested structure, is available here.

The final grade is based in equal parts (25% each) on

  • the project idea as stated in the project proposal, i.e. you will get a high grade if you have a creative idea which can be expected to be realizable with the time and compute resources available
  • the execution of the project, i.e. you will get a high grade here if you reached interesting results, overcame difficulties, etc.
  • the clarity of the report, i.e. you will get a high grade here if your report is well structured, well written, has informative plots, etc.
  • the clarity of the oral presentation, i.e. you will get a high grade here if your presentation is clear and you are able to respond reasonably to questions

You have to pass all four components to pass the course. The final grade will be either U (underkänd/fail), G (godkänd/pass) or VG (väl godkänd/pass with distinction).


Detailed course plan

The weekly time slots are:

  • Monday 13:15 in MVF21
  • Tuesday 13:15 in MVH12
  • Thursday 15:15 in MVF21
  • Friday 13:15 in MVF21

Week 3: Lecture 1 (Monday), Lecture A (Tuesday), Exercise class 1 and Assignment A due (Thursday), Lecture B (Friday)

Week 4: Lecture 2 (Monday), Lecture 3 (Tuesday), Exercise class 2 and Assignment B due (Thursday), Lecture C (Friday)

Week 5: Lecture 4 (Monday), Lecture 5 (Tuesday), Exercise class 3 (Thursday), Lecture D and Assignment C due (Friday)

Week 6: Lecture 6 (Monday), Lecture 7 (Tuesday), Exercise class 4 (Thursday), Lecture E and Assignment D due (Friday)

Week 7: Lecture 8 (Monday), Lecture 9 (Tuesday), Exercise class 5 (Thursday), Lecture F and Assignment E due (Friday)

Week 8: Lecture 10 (Monday), Exercise class 6 (Thursday), Assignment F due (Friday)

Week 9: Exercise class 7 (Thursday)

Week 10: Exercise class 8 (Thursday)

Week 11: Report due and project presentation

Teachers

Examiner: Jan Gerken

Lecturer: Jan Gerken

Teaching assistant: Philipp Misof


Course literature

Part one: A good introduction to deep learning is given in

For part two, references to the relevant literature are provided for each lecture separately.