Lectures#
Lecture 1: Introduction and a brief history of neural networks#
This lecture charts the three waves of neural network hype: the Perceptron era, the back-propagation era, and the deep learning era. It sketches the evolution from Jeff Elman’s recurrent neural nets [Elm90] via LSTMs [HS97] and GRUs [CVanMerrienboerG+14] to the Transformer [VSP+17].
Lecture 2: Transformers: Key concepts and building blocks#
This lecture is accompanied by Notebook 2: The Transformer Anatomy Lesson: miniGPT.
The Transformer [VSP+17] is the current best solution to the problem of predicting the next word. It combines autoregressive generation with attention, an impressive mechanism for keeping track of everything that matters in the input seen so far. This lecture introduces the key concepts and building blocks of the Transformer architecture.
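The two ideas named above can be sketched in a few lines. The following is a minimal, self-contained illustration of scaled dot-product attention with a causal mask (the mask is what makes generation autoregressive: a token can only attend to itself and earlier tokens); the random matrices stand in for learned projections.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V, causal=True):
    # Scaled dot-product attention: each position scores all key positions.
    T = Q.shape[0]
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    if causal:
        # Mask out future positions so token t only sees tokens <= t.
        future = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores = np.where(future, -1e9, scores)
    weights = softmax(scores)            # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d = 4, 8                              # 4 tokens, embedding size 8
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))
out, w = attention(Q, K, V)
```

The attention weights `w` form a lower-triangular matrix: the zeros above the diagonal are exactly the "cannot look ahead" constraint of autoregressive decoding.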
Lecture 3: Transformers recap, In-context learning, Post-training, Agents#
This lecture is accompanied by Notebook 3: The Transformer Anatomy Lesson: Visualizing attention and Notebook 4: Instruction tuning, Tools and Agents.
After a short recap of the Transformer architecture, we dive into aspects of training Transformers beyond pre-training: in-context learning and post-training. We also talk about putting LLMs to work as agents. We conclude with two frequently used Transformer techniques: the mixture-of-experts architecture and LoRA.
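To make the LoRA idea concrete, here is a minimal sketch (not the PEFT library's actual API): a frozen weight matrix `W` is augmented with a trainable low-rank update `B @ A` of rank `r`, so fine-tuning only touches the small `A` and `B` matrices. Initializing `B` to zero, as is standard, makes the adapter start as a no-op.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    # Frozen weight W plus a low-rank correction B @ A (rank r << d).
    return x @ W.T + alpha * (x @ A.T @ B.T)

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 16, 2
W = rng.normal(size=(d_out, d_in))       # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, init 0
x = rng.normal(size=(1, d_in))
y = lora_forward(x, W, A, B)
```

With `B = 0` the output equals the frozen layer's output, so training starts exactly from the pre-trained model; only `2 * r * d` parameters per layer are updated instead of `d * d`.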
Lecture 4: Efficient LLMs#
This lecture is accompanied by Notebook 5: Efficient LLM Inference and Notebook 6: Fine tuning of Transformers models using PEFT (Parameter Efficient Fine-Tuning) techniques.
LLM efficiency is a big issue; the size wars of the commercial LLM providers, predictable from the scaling laws of Transformers, are softened by smart countermeasures that optimize aspects of the LLM architecture and its training and inference processes. Since no overview can be complete, this lecture covers a mixture of methods, including flash attention, PEFT, KV caching, distillation, and speculative decoding.
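One of the listed methods, KV caching, is easy to demonstrate in isolation. In the toy sketch below (single head, random projections standing in for learned ones), each generation step computes keys and values only for the new token and reuses the cached ones, instead of re-encoding the whole prefix; the result is identical to full recomputation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def step_with_cache(x_t, Wq, Wk, Wv, cache):
    # Only the new token's key and value are computed; the rest is cached.
    q = x_t @ Wq
    cache["K"].append(x_t @ Wk)
    cache["V"].append(x_t @ Wv)
    K, V = np.stack(cache["K"]), np.stack(cache["V"])
    w = softmax(q @ K.T / np.sqrt(K.shape[-1]))
    return w @ V

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
xs = rng.normal(size=(5, d))             # 5 tokens, generated one by one
cache = {"K": [], "V": []}
outs = [step_with_cache(x, Wq, Wk, Wv, cache) for x in xs]
```

Per step this costs one projection per matrix instead of `t` of them, which is why KV caching is standard in autoregressive inference; the price is the memory held by the cache, which other methods in this lecture in turn try to shrink.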
Lecture 5: Benchmarking LLMs#
This lecture is accompanied by Notebook 8: Benchmarking LLMs.
Evaluating LLMs is not a simple matter. Many evaluation metrics and benchmarks that emerged from the field of natural language processing are still usable, but the capabilities of chatbots based on autoregressive decoder Transformers have spawned an entirely new type of benchmark task that increasingly looks like the tests we took at school or in college.
Lecture 6: Data, bias, alignment#
Data is a key ingredient in training and fine-tuning Transformers. What do we know about language data, and what does Zipf’s law predict? If commercial LLMs are trained on trillions of tokens, where does all that data come from? What types of bias occur in it, and can we detect and mitigate them? Chatbot developers typically want to mitigate bias because they want their models to be aligned with the values and expectations of users. How do we align a chatbot - or, how do we turn a next-token predictor into a helpful dialogue partner?
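Zipf's law, mentioned above, states that a word's frequency is roughly proportional to the inverse of its frequency rank, so a few very frequent types cover a large share of all tokens. The toy corpus below only hints at the effect (real corpora of millions of tokens show it far more clearly), but the counting machinery is the same at any scale.

```python
from collections import Counter

# Toy corpus; a real corpus would show the rank-frequency curve clearly.
tokens = ("the cat sat on the mat and the dog sat on the log "
          "and the cat saw the dog").split()
counts = Counter(tokens)
ranked = counts.most_common()            # (word, freq), most frequent first
# Zipf's law: freq(rank) is roughly proportional to 1 / rank, so the top
# few types account for a disproportionate share of the 19 tokens here.
top_share = ranked[0][1] / len(tokens)   # coverage of the single top word
```

Even in this tiny sample the single most frequent word ("the") covers almost a third of all tokens, which is the kind of skew that makes vocabulary and data-collection choices so consequential.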
Lecture 7: Reasoning in LLMs#
This lecture is accompanied by Notebook 7: Reasoning with LLMs.
LLMs are to some extent capable of reasoning. One way is by running an internal dialogue, a ‘chain of thought’, which builds up a reasoning chain over multiple hops. Within bounds they can even reason by propagating information through the internal graph of the Transformer architecture. In this lecture we talk about both these types of reasoning, implicit and explicit.
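The explicit variant is usually elicited through the prompt. The snippet below constructs a minimal chain-of-thought prompt (the wording is illustrative, not a prescribed template): instead of ending the prompt at "A:" and inviting an immediate answer, the added cue asks the model to spell out intermediate steps first.

```python
# Illustrative chain-of-thought prompt construction; the cue sentence is
# the classic zero-shot trigger, not the only possible phrasing.
question = "A train leaves at 9:15 and arrives at 11:40. How long is the trip?"
cot_prompt = (
    "Q: " + question + "\n"
    "A: Let's think step by step."
)
```

A model completing this prompt tends to produce the intermediate hops (minutes to 10:15, to 11:15, to 11:40) before the final answer, which is exactly the multi-hop chain the lecture discusses.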
Guest Lectures#
Lukas Edman: Tokenization#
Lukas Edman (TU Munich) talks about tokenization, the essential first step in the training process: inducing, from training text, a limited token vocabulary consisting of words, subwords and/or characters, and then tokenizing the raw textual training material according to this vocabulary, whose size is a major design factor in neural LLMs. Lukas covers BPE for token vocabulary induction and asks what the benefits could be of the most extreme tokenizer of all: the character- or byte-level tokenizer.
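The core of BPE vocabulary induction fits in a few lines: start from characters and repeatedly merge the most frequent adjacent pair into a new token. This is a bare sketch on a three-word toy corpus (real BPE operates on word-frequency tables over whole corpora and records the merge rules for later use).

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count all adjacent pairs and return the most frequent one.
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def merge(tokens, pair):
    # Replace every occurrence of the pair with a single merged token.
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

# Start from single characters ('_' marks word boundaries) and merge.
tokens = list("low lower lowest".replace(" ", "_"))
for _ in range(4):
    tokens = merge(tokens, most_frequent_pair(tokens))
```

After a few merges the frequent substring "low" has become a single token while rarer suffixes remain split, which is the subword behavior BPE is prized for; the character- and byte-level tokenizers discussed in the talk are the degenerate case of zero merges.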
Fabian Ferrari: Governance of Transformers#
Fabian Ferrari (Utrecht University) talks about the governance of Transformers. Taking a global perspective, he puts the EU in focus and discusses three challenges faced by the EU: foreign ownership of the AI stack, the tension between public funding and private interests, and the geopolitical situation.
Ruurd Kuiper: Diffusion-based LLMs#
Ruurd Kuiper introduces the topic of diffusion-based LLMs, an idea that deviates from the standard left-to-right autoregressive text generation framework of decoder Transformers. Instead, diffusion-based LLMs generate a full text by starting with a block of fully masked text and iteratively unmasking and generating sub-parts of it.
This lecture is accompanied by Notebook 10: Build your own Diffusion Language Model.
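The generation loop described above can be caricatured as follows. In this toy sketch the "model" is a stand-in lookup into a fixed target sentence and positions are picked at random, whereas a real diffusion LM predicts tokens and unmasks the positions it is most confident about; what the sketch preserves is the shape of the process: all positions start masked, and several are filled in per step rather than one token left to right.

```python
import random

random.seed(0)
target = "the cat sat on the mat".split()   # stand-in for model predictions
text = ["[MASK]"] * len(target)             # start fully masked

steps = 0
while "[MASK]" in text:
    # Unmask a couple of positions per step, in no particular order.
    masked = [i for i, t in enumerate(text) if t == "[MASK]"]
    for i in random.sample(masked, k=min(2, len(masked))):
        text[i] = target[i]                 # a real model would predict here
    steps += 1
```

Because multiple positions are filled per iteration, the text is complete in far fewer steps than its length, which is one of the efficiency arguments for diffusion-based generation.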
The Mixed Bag Lecture#
There are many more interesting topics than can fit in a single teaching block. In this lecture we scoop up some miscellaneous related topics, such as mixtures of experts, watermarking and fingerprinting, the predictive brain in cognitive neuroscience, and TESCREAL.