Lectures#
Lecture 1: Introduction and a brief history of neural networks#
This lecture charts the three waves of neural network hype: the Perceptron era, the back-propagation era, and the deep learning era. It sketches the evolution from Jeff Elman’s recurrent neural nets [Elm90] via LSTMs [HS97] and GRUs [CVanMerrienboerG+14] to the Transformer [VSP+17].
Lecture 2: Transformers: Key concepts and building blocks#
This lecture is accompanied by Notebook 2: The Transformer Anatomy Lesson: miniGPT.
The Transformer [VSP+17] is the current best solution to the problem of predicting the next word. It combines autoregressive generation with attention, an impressive mechanism for keeping track of everything that matters in the input seen so far. This lecture introduces the key concepts and building blocks of the Transformer architecture.
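The two ideas named above can be sketched in a few lines. The following is a minimal, self-contained illustration of scaled dot-product attention with a causal mask (the mask is what makes generation autoregressive: a token can only attend to itself and earlier tokens); the random matrices stand in for learned projections.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V, causal=True):
    # Scaled dot-product attention: each position scores all key positions.
    T = Q.shape[0]
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    if causal:
        # Mask out future positions so token t only sees tokens <= t.
        future = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores = np.where(future, -1e9, scores)
    weights = softmax(scores)            # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d = 4, 8                              # 4 tokens, embedding size 8
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))
out, w = attention(Q, K, V)
```

The attention weights `w` form a lower-triangular matrix: the zeros above the diagonal are exactly the "cannot look ahead" constraint of autoregressive decoding.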
Lecture 3: Transformers recap, In-context learning, Post-training, Agents#
This lecture is accompanied by Notebook 3: The Transformer Anatomy Lesson: Visualizing attention and Notebook 4: Instruction tuning, Tools and Agents.
After a short recap of the Transformer architecture, we dive into aspects of training Transformers beyond pre-training: in-context learning and post-training. We also talk about putting LLMs to work as agents. We conclude with two frequently used Transformer techniques: the mixture-of-experts architecture and LoRA.
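To make the LoRA idea concrete, here is a minimal sketch (not the PEFT library's actual API): a frozen weight matrix `W` is augmented with a trainable low-rank update `B @ A` of rank `r`, so fine-tuning only touches the small `A` and `B` matrices. Initializing `B` to zero, as is standard, makes the adapter start as a no-op.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    # Frozen weight W plus a low-rank correction B @ A (rank r << d).
    return x @ W.T + alpha * (x @ A.T @ B.T)

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 16, 2
W = rng.normal(size=(d_out, d_in))       # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, init 0
x = rng.normal(size=(1, d_in))
y = lora_forward(x, W, A, B)
```

With `B = 0` the output equals the frozen layer's output, so training starts exactly from the pre-trained model; only `2 * r * d` parameters per layer are updated instead of `d * d`.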
Lecture 4: Efficient LLMs#
This lecture is accompanied by Notebook 5: Efficient LLM Inference and Notebook 6: Fine tuning of Transformers models using PEFT (Parameter Efficient Fine-Tuning) techniques.
LLM efficiency is a big issue; the size wars of the commercial LLM providers, predictable from the scaling laws of Transformers, are softened by smart countermeasures that optimize aspects of the LLM architecture and its training and inference processes. Since no overview can be complete, this lecture covers a mixture of methods, including flash attention, PEFT, KV caching, distillation, and speculative decoding.
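One of the listed methods, KV caching, is easy to demonstrate in isolation. In the toy sketch below (single head, random projections standing in for learned ones), each generation step computes keys and values only for the new token and reuses the cached ones, instead of re-encoding the whole prefix; the result is identical to full recomputation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def step_with_cache(x_t, Wq, Wk, Wv, cache):
    # Only the new token's key and value are computed; the rest is cached.
    q = x_t @ Wq
    cache["K"].append(x_t @ Wk)
    cache["V"].append(x_t @ Wv)
    K, V = np.stack(cache["K"]), np.stack(cache["V"])
    w = softmax(q @ K.T / np.sqrt(K.shape[-1]))
    return w @ V

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
xs = rng.normal(size=(5, d))             # 5 tokens, generated one by one
cache = {"K": [], "V": []}
outs = [step_with_cache(x, Wq, Wk, Wv, cache) for x in xs]
```

Per step this costs one projection per matrix instead of `t` of them, which is why KV caching is standard in autoregressive inference; the price is the memory held by the cache, which other methods in this lecture in turn try to shrink.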
Lecture 5: Benchmarking LLMs#
This lecture is accompanied by Notebook 8: Benchmarking LLMs.
Evaluating LLMs is not a simple matter. Many evaluation metrics and benchmarks that emerged from the field of natural language processing are still usable, but the capabilities of chatbots based on autoregressive decoder Transformers have spawned an entirely new type of benchmark task that increasingly looks like the tests we took at school or in college.
Lecture 6: Data, bias, alignment#
Data is a key ingredient in training and fine-tuning Transformers. What do we know about language data, and what does Zipf’s law predict? If commercial LLMs are trained on trillions of tokens, where does all that data come from? What types of bias occur in it, and can we detect and mitigate them? Chatbot developers typically want to mitigate bias because they want their models to be aligned with the values and expectations of users. How do we align a chatbot - or, how do we turn a next-token predictor into a helpful dialogue partner?
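Zipf's law, mentioned above, states that a word's frequency is roughly proportional to the inverse of its frequency rank, so a few very frequent types cover a large share of all tokens. The toy corpus below only hints at the effect (real corpora of millions of tokens show it far more clearly), but the counting machinery is the same at any scale.

```python
from collections import Counter

# Toy corpus; a real corpus would show the rank-frequency curve clearly.
tokens = ("the cat sat on the mat and the dog sat on the log "
          "and the cat saw the dog").split()
counts = Counter(tokens)
ranked = counts.most_common()            # (word, freq), most frequent first
# Zipf's law: freq(rank) is roughly proportional to 1 / rank, so the top
# few types account for a disproportionate share of the 19 tokens here.
top_share = ranked[0][1] / len(tokens)   # coverage of the single top word
```

Even in this tiny sample the single most frequent word ("the") covers almost a third of all tokens, which is the kind of skew that makes vocabulary and data-collection choices so consequential.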
Lecture 7: Reasoning in LLMs#
This lecture is accompanied by Notebook 7: Reasoning with LLMs.
LLMs are to some extent capable of reasoning. One way is by running an internal dialogue, a ‘chain of thought’, which builds up a reasoning chain over multiple hops. Within bounds they can even reason by propagating information through the internal graph of the Transformer architecture. In this lecture we talk about both these types of reasoning, implicit and explicit.
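The explicit variant is usually elicited through the prompt. The snippet below constructs a minimal chain-of-thought prompt (the wording is illustrative, not a prescribed template): instead of ending the prompt at "A:" and inviting an immediate answer, the added cue asks the model to spell out intermediate steps first.

```python
# Illustrative chain-of-thought prompt construction; the cue sentence is
# the classic zero-shot trigger, not the only possible phrasing.
question = "A train leaves at 9:15 and arrives at 11:40. How long is the trip?"
cot_prompt = (
    "Q: " + question + "\n"
    "A: Let's think step by step."
)
```

A model completing this prompt tends to produce the intermediate hops (minutes to 10:15, to 11:15, to 11:40) before the final answer, which is exactly the multi-hop chain the lecture discusses.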
Guest Lectures#
Lukas Edman: Tokenization#
Lukas Edman (TU Munich) talks about tokenization, the essential first step in the training process: inducing, from training text, a limited token vocabulary consisting of words, subwords and/or characters, and then tokenizing the raw textual training material according to this vocabulary, whose size is a major design factor in neural LLMs. Lukas covers BPE for token vocabulary induction and asks what the benefits could be of the most extreme tokenizer of all: the character- or byte-level tokenizer.
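The core of BPE vocabulary induction fits in a few lines: start from characters and repeatedly merge the most frequent adjacent pair into a new token. This is a bare sketch on a three-word toy corpus (real BPE operates on word-frequency tables over whole corpora and records the merge rules for later use).

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count all adjacent pairs and return the most frequent one.
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def merge(tokens, pair):
    # Replace every occurrence of the pair with a single merged token.
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

# Start from single characters ('_' marks word boundaries) and merge.
tokens = list("low lower lowest".replace(" ", "_"))
for _ in range(4):
    tokens = merge(tokens, most_frequent_pair(tokens))
```

After a few merges the frequent substring "low" has become a single token while rarer suffixes remain split, which is the subword behavior BPE is prized for; the character- and byte-level tokenizers discussed in the talk are the degenerate case of zero merges.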
Fabian Ferrari: Governance of Transformers#
Fabian Ferrari (Utrecht University) talks about the governance of Transformers. Taking a global perspective, he puts the EU in focus and discusses three challenges faced by the EU: foreign ownership of the AI stack, the tension between public funding and private interests, and the geopolitical situation.
Ruurd Kuiper: Diffusion-based LLMs#
Ruurd Kuiper introduces the topic of diffusion-based LLMs, an idea that deviates from the standard left-to-right autoregressive text generation framework of decoder Transformers. Instead, diffusion-based LLMs generate a full text by starting with a block of fully masked text and iteratively unmasking and generating sub-parts of it.
This lecture is accompanied by Notebook 10: Build your own Diffusion Language Model.
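The generation loop described above can be caricatured as follows. In this toy sketch the "model" is a stand-in lookup into a fixed target sentence and positions are picked at random, whereas a real diffusion LM predicts tokens and unmasks the positions it is most confident about; what the sketch preserves is the shape of the process: all positions start masked, and several are filled in per step rather than one token left to right.

```python
import random

random.seed(0)
target = "the cat sat on the mat".split()   # stand-in for model predictions
text = ["[MASK]"] * len(target)             # start fully masked

steps = 0
while "[MASK]" in text:
    # Unmask a couple of positions per step, in no particular order.
    masked = [i for i, t in enumerate(text) if t == "[MASK]"]
    for i in random.sample(masked, k=min(2, len(masked))):
        text[i] = target[i]                 # a real model would predict here
    steps += 1
```

Because multiple positions are filled per iteration, the text is complete in far fewer steps than its length, which is one of the efficiency arguments for diffusion-based generation.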
The Mixed Bag Lecture#
There are many more interesting topics than can fit in a single teaching block. In this lecture we scoop up some miscellaneous related topics, such as mixtures of experts, watermarking and fingerprinting, the predictive brain in cognitive neuroscience, and TESCREAL.