Course syllabus

The goal of this course is to give an overview of mathematical structures that appear in modern deep neural networks and how these can be used to understand neural networks theoretically. The course is aimed at students familiar with advanced mathematics but not machine learning.

At the end of the course, the students should be able to understand current research literature in the field and carry out their own projects in deep learning.

The course starts with a concise introduction to deep learning, followed by deeper dives into the topics of equivariant neural networks, large width neural networks and geometrical aspects of explainable AI. Additional lectures cover the practical aspects of implementing neural network training on Google Cloud infrastructure. The lectures are accompanied by exercise sessions and homework assignments, where students gain hands-on experience with training neural networks. The course is graded based on a project.

Program

The schedule of the course is available in TimeEdit.

Lectures

In the first part of the course, the lectures introduce the necessary deep learning background.

Lecture 1, Introduction to machine learning:

  • Supervised learning, unsupervised learning, reinforcement learning
  • Empirical risk minimization
  • Linear classifier
  • Maximum likelihood and negative log-likelihood loss
  • Principal component analysis and singular value decomposition (sketched below)
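
As an illustration of the last point, the principal components can be read off from a singular value decomposition of the centered data matrix. A minimal NumPy sketch with toy data (not part of the course material):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))             # toy data: 100 samples, 5 features

    X_c = X - X.mean(axis=0)                  # center the data
    U, S, Vt = np.linalg.svd(X_c, full_matrices=False)

    components = Vt                           # principal directions (rows of Vt)
    explained_variance = S**2 / (len(X) - 1)  # eigenvalues of the covariance matrix
    X_2d = X_c @ Vt[:2].T                     # projection onto the first two PCs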

Lecture 2, More machine learning basics:

  • Generalized linear models
  • Generalization, underfitting, overfitting
  • Regularization, weight decay (sketched below)
  • Hyperparameters and model evaluation
  • Bias-variance tradeoff
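
To make the weight decay bullet concrete: weight decay adds an L2 penalty on the weights to the empirical risk. In PyTorch (the framework used in the exercises), it is typically passed directly to the optimizer; a minimal sketch with a placeholder model:

    import torch

    model = torch.nn.Linear(10, 1)

    # weight_decay adds the term weight_decay * w to each weight gradient,
    # which corresponds to an L2 penalty on the weights in the loss
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)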

Lecture 3, Introduction to deep learning:

  • Multi-layer perceptron and universal approximation theorems
  • Backpropagation, forward- and reverse-mode AD (sketched below)
  • Activation functions
  • Gradient-based optimizers
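
To illustrate backpropagation, a minimal PyTorch sketch with arbitrary toy shapes: a single call to .backward() runs reverse-mode AD and populates the gradients of all parameters at once:

    import torch

    x = torch.randn(4, 3)                       # batch of 4 inputs
    W1 = torch.randn(3, 5, requires_grad=True)  # first-layer weights
    W2 = torch.randn(5, 1, requires_grad=True)  # second-layer weights

    # two-layer network with tanh activation and a squared-error-style loss
    loss = (torch.tanh(x @ W1) @ W2).pow(2).mean()
    loss.backward()                             # reverse-mode AD

    print(W1.grad.shape, W2.grad.shape)         # gradients for both weight matrices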

Lecture 4, Deep learning, part 2:

  • Exploding and vanishing gradients
  • Xavier and Kaiming initialization (sketched below)
  • Regularization of neural networks: weight decay, early stopping, batch normalization, ensembles, dropout, data augmentation, gradient clipping
  • Transfer learning
  • Adversarial attacks
  • Learning random labels
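
As a concrete example of the initialization schemes, a minimal PyTorch sketch (the layer sizes are placeholders):

    import torch.nn as nn

    layer = nn.Linear(256, 256)

    # Kaiming (He) initialization scales the weight variance to keep
    # activations stable under ReLU; Xavier is the analogue for tanh
    nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
    nn.init.zeros_(layer.bias)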

Lecture 5, Important neural network architectures:

  • Convolutional neural networks (sketched below)
  • Residual neural networks
  • Transformers
  • Graph neural networks
  • Generative adversarial networks
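
For orientation, a minimal convolutional classifier in PyTorch; the sizes assume 28x28 grayscale inputs (e.g. MNIST) and are purely illustrative:

    import torch.nn as nn

    cnn = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),                  # 28x28 -> 14x14
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),                  # 14x14 -> 7x7
        nn.Flatten(),
        nn.Linear(32 * 7 * 7, 10),        # logits for 10 classes
    )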


In the second part of the course, the theoretical basis for the advanced topics is discussed.

Lecture 6, Explainable AI and geometry:

  • Explainable AI, saliency maps, Clever Hans effect (saliency sketched below)
  • Counterfactual explanations and adversarial examples
  • Normalizing flows
  • Diffeomorphic counterfactuals
  • Normal coordinates on the data manifold
  • Induced metric on the data manifold
  • Results on a toy problem and various image datasets
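
A gradient-based saliency map needs only one backward pass through the classifier. A minimal sketch, assuming a trained model and an input batch x (both placeholders):

    import torch

    def saliency_map(model, x, target_class):
        # gradient of the class score with respect to the input pixels;
        # large entries mark pixels the prediction is most sensitive to
        x = x.detach().clone().requires_grad_(True)
        score = model(x)[0, target_class]
        score.backward()
        return x.grad.abs()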

Lecture 7, Equivariant neural networks, part 1:

  • Symmetric learning problems, regular representation
  • Equivariant models
  • Ways to build equivariant models: data augmentation, averaging (sketched below), canonicalization, manifest equivariance
  • Group convolutions, lifting, group pooling
  • Group convolutional neural networks for roto-translations
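
The averaging strategy mentioned above can be made concrete in a few lines. A sketch for the rotation group C4 acting on image tensors (model is a placeholder):

    import torch

    def c4_averaged(model, x):
        # averaging the outputs over all four 90-degree rotations of the
        # input yields a model that is invariant under C4 rotations
        outputs = [model(torch.rot90(x, k, dims=(-2, -1))) for k in range(4)]
        return sum(outputs) / 4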

Lecture 8, Equivariant neural networks, part 2:

  • Universality of group convolutions
  • Group convolutions for SO(3) and spherical CNNs, Clebsch-Gordan nets
  • General group convolutions in Fourier space, Peter-Weyl theorem
  • Equivariant graph neural networks
  • Geometric deep learning

Lecture 9, Large width neural networks, part 1:

  • Gaussian processes
  • Empirical neural tangent kernel (sketched below)
  • Standard and NTK parametrizations
  • Neural network–Gaussian process correspondence
  • NTK convergence at infinite width
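
The empirical neural tangent kernel is just an inner product of parameter gradients. A minimal PyTorch sketch for a scalar-output network (architecture and inputs are placeholders):

    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(3, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
    )
    params = list(model.parameters())

    def grad_vector(x):
        # gradient of the network output with respect to all parameters,
        # flattened into a single vector
        grads = torch.autograd.grad(model(x).squeeze(), params)
        return torch.cat([g.reshape(-1) for g in grads])

    x1, x2 = torch.randn(1, 3), torch.randn(1, 3)
    ntk_value = grad_vector(x1) @ grad_vector(x2)  # Theta(x1, x2)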

Lecture 10, Large width neural networks, part 2:

  • Solving gradient descent dynamics at infinite width
  • Computing the NTK, forward equation
  • Linear models and infinitely wide neural networks
  • Criticality and neural network phase diagram
  • Trainability and generalization


In parallel to the lectures above, which cover the theoretical background, additional lectures introduce the practical aspects of implementing neural network training with standard deep learning frameworks on Google Cloud computing infrastructure.

Lecture A: Python and the compute environment

Lecture B: Tensors in NumPy and PyTorch

Lecture C: Automatic differentiation and neural networks in PyTorch

Lecture D: Neural network training in PyTorch

Lecture E: PyTorch Lightning

Lecture F: Project proposal and normalizing flows


Assignments and exercise sessions

The assignments are available in this GitHub repository: https://github.com/JanEGerken/ms-in-dnns

In weekly homework assignments, you will implement neural network training related to the subjects covered in the lectures; a minimal training loop is sketched below. This serves as preparation for the final project. The assignments are reviewed in weekly exercise sessions, where technical problems can also be discussed. Assignments are due on Fridays at 18:00 and are submitted via Canvas.
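
For orientation, the core of such a training setup fits in a few lines of PyTorch. A minimal sketch with toy data; everything here is a placeholder, the actual assignments are in the repository above:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    model = torch.nn.Sequential(
        torch.nn.Linear(3, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()

    X, y = torch.randn(256, 3), torch.randn(256, 1)  # toy regression data
    loader = DataLoader(TensorDataset(X, y), batch_size=32)

    for epoch in range(10):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()        # backpropagation
            optimizer.step()       # gradient step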

In order to be admitted to the project, you need to reach at least 60% of the points in the assignments and submit a project proposal.

Project

The course is graded based on a project which you work on in the last two weeks of term. The project should be based on a paper from the research areas covered in the second part of the course. PhD students can (after talking to Jan) do a project related to their research. The project must include the implementation of a neural network training setup which goes beyond the architectures discussed in lecture 5. You are responsible for developing your own project idea, summarized in a project proposal which is part of Assignment F. You will receive feedback on your proposal, and two exercise sessions are dedicated to project support.

The project grade is based on a written report and an oral presentation. The report should be at most five pages long, excluding references and appendices. A LaTeX template for the report, which also contains a suggested structure, is available here.

The final grade is based in equal parts (25% each) on

  • the project idea as stated in the project proposal, i.e. you will get a high grade if you have a creative idea which can be expected to be realizable with the time and compute resources available
  • the execution of the project, i.e. you will get a high grade here if you reached interesting results, overcame difficulties, etc.
  • the clarity of the report, i.e. you will get a high grade here if your report is well structured, well written, has informative plots, etc.
  • the clarity of the oral presentation, i.e. you will get a high grade here if your presentation is clear and you are able to respond reasonably to questions

You have to pass all four components to pass the course. The final grade will be either U (underkänd/fail), G (godkänd/pass) or VG (väl godkänd/pass with distinction).


Detailed course plan

The weekly time slots are:

  • Monday 13:15 in MVF21
  • Tuesday 13:15 in MVH12
  • Thursday 15:15 in MVF21
  • Friday 13:15 in MVF21

Week 3: Lecture 1 (Monday), Lecture A (Tuesday), Exercise class 1 and Assignment A due (Thursday), Lecture B (Friday)

Week 4: Lecture 2 (Monday), Lecture 3 (Tuesday), Exercise class 2 and Assignment B due (Thursday), Lecture C (Friday)

Week 5: Lecture 4 (Monday), Lecture 5 (Tuesday), Exercise class 3 (Thursday), Lecture D and Assignment C due (Friday)

Week 6: Lecture 6 (Monday), Lecture 7 (Tuesday), Exercise class 4 (Thursday), Lecture E and Assignment D due (Friday)

Week 7: Lecture 8 (Monday), Lecture 9 (Tuesday), Exercise class 5 (Thursday), Lecture F and Assignment E due (Friday)

Week 8: Lecture 10 (Monday), Exercise class 6 (Thursday), Assignment F due (Friday)

Week 9: Exercise class 7 (Thursday)

Week 10: Exercise class 8 (Thursday)

Week 11: Report due and project presentation

Teachers

Examiner: Jan Gerken

Lecturer: Jan Gerken

Teaching assistant: Philipp Misof


Course literature

Part one: A good introduction to deep learning is given in

For part two, references to the relevant literature are provided for each lecture separately.