Machine Learning Primer -- Python Tutorial

Claudius Gros, WS 2025/26

Institut für theoretische Physik
Goethe-University Frankfurt a.M.

Generative Finetuning

base model → chatbot

AI generated illustrations

base model can only predict next word
:: text completion, instead of responding to questions

fine tune of base model in several steps
- concept question -- answer
- value alignment
- thinking -- chain of thought
normally two steps
:: supervised fine tuning (from specific examples)
:: RF fine-tuning (human preferences are not absolute)
reinforcement learning from human feedback (RLHF)
:: humans train the reward model
→ reward model trains base model
:: circumvents behavioral cloning

chain of thought (CoT)

AI generated illustration

chatbot creates two sequences of words
-- a chain of thought
-- the response
consecutively, the thoughts are added to the
user prompt (self prompting)
training via examples / RLHF
CoT can be induced also via suitable prompting techniques
(without fine-tunning model parameters)

 
  user input  

 
  self-generated prompt (thinking) 

 
  response 

Revisiting LLM Reasoning via Information Bottleneck (2025)

autoencoder analogy
:: thinking ⇆ latent space
information bottleneck:
thinking corresponds to an effective latent representation,
that should
- promote generalization by minimizing the mutual information between the input and the latent features, (discarding irrelevant information)
- maximize the mutual information between the latent code and the classifying label (retaining predictive information)
idem for humans?

robot laws

Isaac Asimov
- do not harm humans
- obey orders
- protect yourself
preceeding orders take precedence
fantasy or reality?

 
  system  prompt 

 
  user input 

 
  chain of thought 

 
  response 

customized AIs

tailored for a specific company
system prompt precedes user input
:: sets context & identity
:: defines scope & rules
- purpose: tasks the AI should focus on
- tone & style: formal, informal, technical, or empathetic responses
- knowledge sources: internal documents, databases, or policies
- limitations: what the AI should not do
- safety & compliance: ethical guidelines, data privacy rules, etc.
the system prompt acts like a 'robot law',
overriding user commands

AI psychology

[Yao et.al, 2023]

Tree of Thoughts:
Deliberate Problem Solving with Large Language Models

advanced prompting technique
helping large language models to 'think'
:: substantial performance boost

modern chatbots use fine-tuned CoT (chain of thought) automatically

foundation models

GPT - generative pretrained transformer

number of adaptable parameters (matrix elements)
:: $10^8$ / $10^9$ / $10^{11}$ / $10^{12}$
:: GPT 1 / 2 / 3 / 4
$10^3-10^4\ $ context tokens, $\ \sim 80\ $ layers

automatic pretraining by predicting next word

$ \begin{array}{rl} {\color{red}\equiv} & \mathrm{base\ mode\ (GPT\!-\!3)} \\[0.0ex] {\color{red}+} & \mathrm{human\ supervised\ finetuning\ (SFT\ model)} \\[0.0ex] {\color{red}+} & \mathrm{human\ supervised\ reinforement\ learning} \\[0.0ex] {\color{red}\Rightarrow} & \mathrm{chat\ assistent\ (foundation\ model)} \end{array} {\color{red}+} & \mathrm{chain-of-thoughts training} \\[0.0ex] $

[Foundation Models for Decision Making]

SFT: $\ 10^4\!-\!10^5\ $ hand-crafted (prompt, response) pairs
:: learns to answer; ¬(text completion)

applications / tasks on top of foundation model

LLM fact sheet

transformer visualization
LLAMA Meta; currently open source
- model size $7\mbox{B}-65\mbox{B}\ $ parameters
- context length $2^{11}=2048\ $ token
- embedding dimension $2^{12}-2^{13}=4096-8192$
- attention heads $32-64$
- transformer layers $32-80$
- positional encoding rotary (relative)
GPT OpenAi; no data for GPT4, and on, published
- model size up to $\ 175\mbox{B}\ $ parameters
- context length $2^{11}=2048\ $ token
- embedding dimension up to $3\cdot2^{12}=12288$ (mostly smaller)
- attention heads up to $\ 96$
- transformer layers up to $\ 96$
- positional encoding classical (abolute)

a short history of the future

univeral computing [Turing, 1936]
artificial neurons [McCulloch and Pitts, 1943]
Dartmouth conference 1956)
term "artificial intelligence" (AI)
McCarthy, Minsky, Shannon, ...
$\to \ $ first AI hype (human-level AI within 10 years)
perceptron [Rosenblatt, 1958]
receptive fields in the brain [Hubel and Wiesel, 1959]
single layer perceptron as linear classifiers [Minsky and Papert, 1969]
:: XOR problem
$\to \ $ AI winter
multi-layer perceptrons; back-propagation [Rumelhart et al., 1986]
$\to \ $ ice slowly melting
SVMs, Bayesian networks (1990s - ...)
Long Short-Term Memory (1991 on; Schmidhuber et.al.)
deep learning [Hinton et al., 2006]
$\to \ $ revival
successes in applications (2010-12)
$\to \ $ deep-learning hype
attention [Bahdanau, Cho, Bengio 2014]
:: Dzmitry Bahdanau (Belarus)
- 2013-2015: master in computer science, Jabobs University Bremen
- 2014: 3 months internship with Yosua Bengio (Montreal)
$\to \ $ generative AI revolution starts quitely
'All you need is attention'
:: transformer [Vaswani et al. 2017; Google]
AlphaGo zero univeral game framework [DeepMind 2017]
massive transfomer models; ChatGPT [OpenAI 2022]
$\to \ $ generative AI revolution goes public
Meta-AI LLaMA leaked [2023]
:: generative AI open-source explosion
GPT4o [OpenAi 2024]
:: first end-to-end multimodal chatbot
GPT4 o1 [OpenAi 2024]
:: 'thinking' via automatic chain-of-though
2024 physics Nobel price dedicated to ML
:: John Hopfield & Geoffrey Hinton
2025 DeepSeek with mixture of experts (MoE)
2025 GPT-5 (agentic, PhD-level)