Large Language Models

Goals of the course:

Explain how the models work
Teach basic usage of the models
Help students critically assess what you read about them
Encourage thinking about the broader context of using the models

Syllabus from SIS:

Basics of neural networks for language modeling
Language model typology
Data acquisition and curation, downstream tasks
Training (self-supervised learning, reinforcement learning with human feedback)
Finetuning & Inference
Multilinguality and cross-lingual transfer
Large Language Model Applications (e.g., conversational systems, robotics, code generation)
Multimodality (CLIP, diffusion models)
Societal impacts
Interpretability

About

SIS code: NPFL140
Semester: summer
E-credits: 3
Examination: 0/2 C
Guarantors: Jindřich Helcl, Jindřich Libovický

Timespace Coordinates

The course is held on Thrusdays at 15:40 in S3. The first lecture took place on 22 February.

Lectures

1. Introductory notes and discussion on large language models Slides

2. The Transformer Architecture Lecture notes Slides

3. LLM Training Slides Recording

4. LLM Inference Slides Code Recording

5. Generating Weather Reports Assignment

6. Data and Evaluation Lecture notes

7. Evaluation, Working with the Models MCQA Evaluation Speech Translation LLMs for Machine Translation Chain-of-thought Prompting; RAG Generation; Evaluation; Web navigation Experience with LLMs Recording

8. LLM Efficiency Assignment review Efficiency Recording

9. Multilinguality Slides Recording Assignment

10. LLMs for Speech-to-Text Slides

License

Unless otherwise stated, teaching materials for this course are available under CC BY-SA 4.0.

1. Introductory notes and discussion on large language models

Feb 22 Slides

Covered topics: aims of the course, passing requirements. We discussed what are (large) language models, what are they for, what are their benefits and downsides. We concluded with a rough analysis of ChatGPT performance in different languages.

2. The Transformer Architecture

Mar 7 Lecture notes Slides

After the class, you should be able to:

Explain the building blocks of the Transformer architecture to a non-technical person
Describe the Transformer architecture using equations, especially the self-attention block
Implement the Transformer architecture (in PyTorch or another framework with automated differentiation)

Class outline:

Transformer architecture in equations (Lecture notes)
Coding session based on the NanoGPT code base
Recent architecture tweaks (Slides)

Additional materials:

The Illustrated Transformer
Let's build GPT: from scratch, in code, spelled out., a YouTube video by Andrej Karpathy

3. LLM Training

Mar 14 Slides Recording

After the class, you should be able to:

Give a high-level description of how neural networks are trained
Read and understand a neural training library documentation
Explain the differences between various training techniques used in LLMs today

Class outline:

Rest of the discussion on Transformers, see above
General introduction into neural network & transformer model training, pretrained models, RLHF, DPO

Additional materials:

How Neural Networks are Trained, a detailed overview of NN training
Tensorflow Playground, interactive website for NN training (train your own teeny tiny NN)
Illustrating Reinforcement Learning from Human Feedback (RLHF), a Huggingface blogpost
ChatGPT: This AI has a JAILBREAK?! (Unbelievable AI Progress), a Yannic Kilcher video that includes a quick explanation of RLHF (don't mind the hype, the video was from the time when ChatGPT was really new)
Direct Preference Optimization, a video explanation on AI Coffee Break with Letitia

4. LLM Inference

Mar 21 Slides Code Recording

After the class, you should be able to:

Give a high-level description of how a transformer predicts a probability distribution for the next token in the sequence
Select the appropriate decoding algorithm for your use-case and understand its parameters
Write a Python code snippet for generating text with an open language model using the transformers library

Class outline:

Discussion, LLM zoo
3D visualization of transformer inference
Decoding algorithms - exact inference (MAP), greedy search, beam search, top-k, top-p, Mirostat, locally typical sampling
Hands-on demonstration of text generation with the transformers library
Bonus: non-autoregressive decoding, reverse-engineering decoding algorithms

Additional materials:

HuggingFace Models, the repository of open (L)LMs
Awesome LLM, a curated list of resources for LLMs
LLM Visualization, a 3D visualization of transformer inference
How to generate text, the Huggingface decoding algorithms overview
Generation with LLMs, common pitfalls when generating text with LLMs

5. Generating Weather Reports

Mar 28 Assignment

Assignment #1

Deadline: Friday 12 April, 2024.
Assignment description + code
Submission form

After the class, you should be able to:

Write a basic Python code querying a LLM through an OpenAI-like API.
Set up a suitable prompt and parameters to get the expected output.
Describe what are the opportunities and limits of recent open LLMs.

Class outline:

Introduction
Working on the assignment

Additional materials:

OpenWeather API docs: https://openweathermap.org/api/
Prompting guide: https://www.promptingguide.ai
Best prompting practices from Huggingface

6. Data and Evaluation

Apr 4 Lecture notes

After the class, you should be able to:

Look for a dataset for a specified NLP task and find one (given the task is reasonably common)
Roughly assess the usefulness of the dataset based on its statistics
Pick an evaluation method that suits the task
Have a sense of what a "reasonable" score in that task might look like

Class outline:

Data for language modeling
NLP tasks and data (introduction + team work)
Evaluation (introduction + team work)

Additional materials:

A fresh survey paper on datasets for LLMs

7. Evaluation, Working with the Models

Apr 11 MCQA Evaluation Speech Translation LLMs for Machine Translation Chain-of-thought Prompting; RAG Generation; Evaluation; Web navigation Experience with LLMs Recording

Class outline:

Remarks on LLM evaluation on multiple-choice question answering task
Speech translation challenges
Using LLMs for machine translation
Chain-of-thought prompting, retrieval-augmented generation
Generation, evaluation and Web navigation using LLMs
Experience with using LLMs within the EDU-AI project, Task-oriented Dialogue

8. LLM Efficiency

Apr 18 Assignment review Efficiency Recording

After the class, you should be able to:

Identify technical bottlenecks constraining inference and training with LLMs
Know methods enabling the usage LLMs under computational restrictions:
- parameter efficient fine-tuning,
- quantization,
- picking the right model scale for your data.

Class outline:

Assignment 1 review
Time and space requirements of LLMs
Low-rank adaptation
Quantization
Scaling

9. Multilinguality

Apr 25 Slides Recording Assignment

Assignment #2

Deadline: Thursday 2 May, 2024.
Assignment description + code

After the class, you should be able to:

Name benefits of multilingual language models and cross-lingual transfer.
Pick the multilingual model suitable for a specific language based on training data, similar languages covered and tokenizer properties.

Class outline:

Guided discussions: why do we train multilingual LMs? How to train multilingual LMs?
Availability of data throughout languages, resourcefulness levels.
Variability of languages: typology and writing systems
Multilingual tokenization
Application of LLMs for machine translation

10. LLMs for Speech-to-Text

May 2 Slides

After the class, you should know:

Motivation for speech in LLMs
The basic and example speech-to-text methods
Real-time methods

Class outline:

Speech NLP tasks (ASR, translation, emotion recognition, …)
Speech in NNs (sound representation, MFCC, raw audio) and in LLMs (Wav2vec, HuBERT, Whisper)
Simultaneous methods: re-translation vs. incremental
Streaming policies wait-k and LocalAgreement
Whisper-Streaming and ELITR demo

Active participation

There will be two or three tasks during the semester; we will work on them mainly during classes but they might turn into a (small) homework.

Reading assignments

You will be asked at least once to read a paper before the class.

Final written test

You need to take part in a final written test that will not be graded.

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics

Search form

Large Language Models

About

Timespace Coordinates

Lectures

License

1. Introductory notes and discussion on large language models

2. The Transformer Architecture

3. LLM Training

4. LLM Inference

5. Generating Weather Reports

6. Data and Evaluation

7. Evaluation, Working with the Models

8. LLM Efficiency

9. Multilinguality

10. LLMs for Speech-to-Text

Active participation

Reading assignments

Final written test