Notebooks
Notebook 1: Basics, and how to fine-tune a text classifier
This course introduces you to Transformers, a class of deep neural networks built on the Transformer architecture. Although this architecture was introduced relatively recently [VSP+17], history did not start in 2017. Many of the components of, and ideas behind, the Transformer hark back to the first wave of neural network research [MP43, MP69, Ros58], as well as to the second wave [Elm90, RMtPRG86].
In some of the following notebooks we broadly follow the book Natural Language Processing with Transformers, by 🤗 staff members Lewis Tunstall, Leandro von Werra, and Thomas Wolf. See the full notebook collection for this book.
Notebook 2: Text generation with GPT
In Lecture 1: Introduction and a brief history of neural networks we talk about different types of transformer-based models. But before we dive in, we want to take some time to look at one particular type of model in more detail: models trained for text generation. These models have become the most prominent recently because they lie at the core of the popular large language models that have taken the world by storm over the last couple of years (think ChatGPT, Claude, LLaMA, DeepSeek, etc.). Later, we will discuss in some detail how these modern interactive models relate to basic text-generating models like the one we look at in this notebook.
We are going to explore text generation with GPT-style Transformers. This notebook loosely follows Chapter 5 of the Natural Language Processing with Transformers book and its associated notebook. It introduces you to some concepts related to generating text and features two exercises:
The first exercise focuses on hyperparameters for text generation (in particular, temperature).
The second exercise is about testing the capabilities of GPT-2, the predecessor of GPT-3 and GPT-4, in your native language.
We are using the GPT-2 model on Hugging Face. If your runtime engine allows it, you may also try the larger variants GPT-2 Medium, GPT-2 Large, or GPT-2 XL. Generation quality tends to improve with model size.
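To give a flavour of the first exercise, here is a minimal sketch of sampling from GPT-2 at different temperatures with the Hugging Face pipeline API; the prompt and generation settings are illustrative, not the notebook's exact setup.

```python
# Sample continuations from GPT-2 at several temperatures.
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")
set_seed(42)  # make the sampled continuations reproducible

prompt = "The future of natural language processing"
for temperature in (0.3, 0.7, 1.2):
    out = generator(
        prompt,
        do_sample=True,           # sample instead of greedy decoding
        temperature=temperature,  # lower = more conservative, higher = more diverse
        max_new_tokens=40,
    )
    print(f"--- temperature={temperature} ---")
    print(out[0]["generated_text"])
```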
Notebook 3: The Transformer Anatomy Lesson
This notebook consists of two parts:
PART I: Implement a full transformer encoder in PyTorch. Note that we will not train it. In this implementation we closely follow Chapter 3 of the Natural Language Processing with Transformers book (a minimal sketch of the core attention operation appears after the two parts below).
PART II: Visualizing attention, in which we use the bertviz library to visualize the internals of bert-base-uncased, a pretrained BERT model.
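To give a flavour of PART I, here is a minimal sketch of scaled dot-product attention, the core operation inside every encoder layer; the toy tensor sizes are illustrative.

```python
# A minimal sketch of scaled dot-product attention (self-attention) in PyTorch.
from math import sqrt

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value):
    """query, key, value: tensors of shape (batch, seq_len, head_dim)."""
    dim_k = query.size(-1)
    scores = torch.bmm(query, key.transpose(1, 2)) / sqrt(dim_k)  # (batch, seq, seq)
    weights = F.softmax(scores, dim=-1)  # attention weights sum to 1 over the keys
    return torch.bmm(weights, value)     # weighted sum of the value vectors

# Toy usage: batch of 1, sequence of 5 tokens, head dimension 64
x = torch.randn(1, 5, 64)
out = scaled_dot_product_attention(x, x, x)  # self-attention: query = key = value
print(out.shape)  # torch.Size([1, 5, 64])
```

In the notebook, this operation is wrapped in multi-head attention, a feed-forward layer, and layer normalization to build a complete encoder block.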
Notebook 4: Tokenizers and Tools
This notebook consists of two parts, corresponding to the two topics covered in Lecture 4: Benchmarking LLMs and in Lukas Edman’s Guest Lectures (2025) on tokenization:
PART I: Tokenization and character-level information. In Lecture 4: Benchmarking LLMs we discussed standard practices for tokenizing text for recent transformer language models. In particular, we introduced BPE, the most common subword tokenization algorithm. We also discussed character-level tokenization as an alternative to subword tokenization (see the tokenization sketch after the two parts below).
PART II: Tools and agents. LLM output can be used to trigger external tools, such as web search or a calculator. The output of these tools can then be fed back to the LLM and further condition its text generation. We will experiment with agents and tools using the smolagents library (a small sketch follows below).
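As a quick illustration of PART I, the sketch below runs a sentence through GPT-2's byte-level BPE tokenizer; the example sentence is purely illustrative.

```python
# Inspect how a BPE tokenizer splits text into subword tokens.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # GPT-2 uses byte-level BPE
tokens = tokenizer.tokenize("Tokenization of uncommonwords splits them into subword pieces")
print(tokens)
# Frequent words usually stay whole ("Ġ" marks a word boundary),
# while rare or misspelled words are broken into several subword pieces.
```

And as a sketch of PART II, the snippet below shows the basic smolagents pattern: an agent equipped with a web-search tool. The class names (CodeAgent, DuckDuckGoSearchTool, HfApiModel) follow early releases of the library and may differ in the version used in the notebook.

```python
# An agent that can decide to call a web-search tool while answering.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],  # the tool the LLM may invoke
    model=HfApiModel(),              # a hosted LLM backend on the Hugging Face Hub
)
agent.run("How many studio albums has Radiohead released?")
```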
Notebook 5: Transformers for NLP
In this notebook we show how to use Transformers for NLP tasks such as multilingual named-entity recognition.
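A minimal sketch of such a pipeline call is shown below; note that the default checkpoint it loads is English-only, whereas the notebook works with a multilingual model, so treat this as an illustrative stand-in.

```python
# Named-entity recognition with the pipeline API (default English checkpoint).
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")  # group subword tokens into entities
for entity in ner("Hugging Face was founded in New York by Clément Delangue."):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 2))
```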
Notebook 6: Efficient LLM Inference
In this notebook we exemplify two efficient inference approaches: int8 quantization and speculative decoding, as discussed in Lecture 6: Efficient LLMs.
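The sketch below illustrates both techniques with small, illustrative checkpoints (gpt2-large as the target model, gpt2 as the draft model); the notebook's actual models and settings may differ, and 8-bit loading requires a GPU with the bitsandbytes package installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

target_id = "gpt2-large"  # the model whose output quality we want
draft_id = "gpt2"         # a small draft model that shares the same tokenizer
tokenizer = AutoTokenizer.from_pretrained(target_id)

# 1) int8 quantization: load the weights in 8-bit to roughly halve memory use.
model = AutoModelForCausalLM.from_pretrained(
    target_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# 2) Speculative (assisted) decoding: the draft model proposes several tokens
#    at a time and the larger model only has to verify them.
assistant = AutoModelForCausalLM.from_pretrained(draft_id, device_map="auto")
inputs = tokenizer("Efficient inference matters because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, assistant_model=assistant, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```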
Notebook 7: Reasoning with LLMs
Accompanying Lecture 7: Reasoning in LLMs, this notebook walks you through inference with Qwen-2.5-1.5B and DeepSeek-R1-Distill-Qwen-1.5B. You will compare direct prompting with chain-of-thought reasoning, using math problem solving as the test domain.
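As a rough sketch of that comparison, the snippet below prompts a small instruction-tuned Qwen model once directly and once with a step-by-step cue; the model id (Qwen/Qwen2.5-1.5B-Instruct), the question, and the prompt wording are illustrative assumptions rather than the notebook's exact setup.

```python
# Compare direct prompting with chain-of-thought prompting on a math question.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-1.5B-Instruct"  # illustrative instruction-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
prompts = {
    "direct": f"{question} Answer with just the number.",
    "chain-of-thought": f"{question} Let's think step by step.",
}

for style, user_prompt in prompts.items():
    messages = [{"role": "user", "content": user_prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=200, do_sample=False)
    print(f"--- {style} ---")
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```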