Machine Learning Primer -- Part III: Advanced Topics




Claudius Gros, WS 2025/26

Institut für theoretische Physik
Goethe-University Frankfurt a.M.

ML Trends / Scaling / Varia

efficient learning




[figure, source: Cornell]

performance vs. complexity barrier
old/new ML: hard/soft

scaling

$$ \def\arraystretch{1.4} \begin{array}{rcl} \mathrm{N} &:& \mathrm{number\ of\ model\ parameters} \\ \mathrm{C} &:& \mathrm{computing\ resources} \\ \mathrm{D} &:& \mathrm{dataset\ size} \end{array} $$
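Empirical scaling laws describe how the test loss falls off as a power law in each of these resources. A minimal sketch for the dependence on model size $N$; the constants `N_c` and `alpha` below are illustrative placeholders, not fitted values:

```python
#!/usr/bin/env python3

# power-law scaling sketch: test loss as a function of model size N
# L(N) = (N_c/N)**alpha  -- constants are illustrative placeholders

N_c   = 8.8e13      # hypothetical reference scale
alpha = 0.076       # hypothetical scaling exponent

def loss(N):
    """power-law test loss for a model with N parameters"""
    return (N_c / N)**alpha

for N in (1e6, 1e8, 1e10, 1e12):
    print(f"N = {N:.0e}:  L = {loss(N):.3f}")
```

Each factor of 100 in $N$ lowers the loss by the same multiplicative factor, the hallmark of a power law.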

competitive games

two-player knapsack problem $\hspace{10ex} P_{A\to B} = \frac{1}{1+10^{(r_B-r_A)/400}} = \frac{N_A}{N_A+N_B}, \qquad N_i = 10^{r_i/400} $
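The Elo win probability above can be evaluated directly; a small sketch with made-up example ratings, checking also the equivalent form $N_A/(N_A+N_B)$ with $N_i = 10^{r_i/400}$:

```python
#!/usr/bin/env python3

# Elo win probability for two-player games
# ratings below are made-up example values

def elo_win_prob(r_A, r_B):
    """probability that player A beats player B"""
    return 1.0 / (1.0 + 10.0**((r_B - r_A)/400.0))

def elo_win_prob_N(r_A, r_B):
    """equivalent form N_A/(N_A+N_B) with N_i = 10**(r_i/400)"""
    N_A = 10.0**(r_A/400.0)
    N_B = 10.0**(r_B/400.0)
    return N_A / (N_A + N_B)

print(elo_win_prob(1600, 1400))   # ~0.76: 200 rating points ahead
print(elo_win_prob(1500, 1500))   # 0.5: equal ratings
```

A 200-point rating gap translates into roughly a 76% win probability, independently of the absolute rating level.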





Connect 4 (and other games)

double descent




fitting vs. generalizing

given enough parameters,
one can fit an elephant
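A minimal numpy sketch of this point (illustrative, not from the lecture): with six data points, a degree-5 polynomial has enough parameters to interpolate them exactly, so the training error drops to zero regardless of the noise:

```python
#!/usr/bin/env python3

# fitting vs. generalizing: with enough parameters, training error vanishes

import numpy as np

rng = np.random.default_rng(0)

x_train = np.linspace(0.0, 1.0, 6)                                 # 6 data points
y_train = np.sin(2.0*np.pi*x_train) + 0.3*rng.standard_normal(6)   # noisy targets

errors = {}
for degree in (1, 3, 5):
    coeffs = np.polyfit(x_train, y_train, degree)                  # least-squares fit
    errors[degree] = np.mean((np.polyval(coeffs, x_train) - y_train)**2)
    print(f"degree {degree}: train MSE = {errors[degree]:.2e}")
```

The degree-5 fit reproduces the noise perfectly; it typically generalizes worse than a low-degree fit, which is the classical picture that double descent revises.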

constrained overfitting

modern deep learning models work
in the regime of massive overfitting

dropout

#!/usr/bin/env python3

# dropout illustration

import torch
import torch.nn as nn

#torch.manual_seed(42)           # manual seeding

dropout = nn.Dropout(p=0.5)     # dropout layer instantiation

input_tensor = torch.arange(12.0).view((3,4)) + 1.0

dropout.train()                 # training mode (surviving units scaled by 1/(1-p))
output_train = dropout(input_tensor)

dropout.eval()                  # evaluation mode (no scaling)
output_eval = dropout(input_tensor)

print("\n# original tensor")
print(input_tensor)

print("\n# after Dropout in training mode (with scaling)")
print(output_train)

print("\n# after Dropout in evaluation mode")
print(output_eval)

mixture of experts (MoE)

[AI-generated illustration]



faster: during inference, only a subset of the experts is active
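A minimal PyTorch sketch of sparse MoE routing (layer sizes and the top-k choice are arbitrary): a learned gate scores all experts, but only the top-k are evaluated, so most expert parameters stay untouched per input:

```python
#!/usr/bin/env python3

# mixture-of-experts sketch: a gate routes each input to the top-k experts,
# so inference evaluates only a subset of the expert parameters

import torch
import torch.nn as nn

torch.manual_seed(0)

n_experts, top_k, d = 4, 2, 8

experts = nn.ModuleList([nn.Linear(d, d) for _ in range(n_experts)])
gate    = nn.Linear(d, n_experts)

x = torch.randn(d)

probs        = torch.softmax(gate(x), dim=-1)   # gating distribution over experts
weights, idx = torch.topk(probs, top_k)         # keep only the top-k experts
weights      = weights / weights.sum()          # renormalize over the chosen ones

y = sum(w * experts[i](x) for w, i in zip(weights, idx.tolist()))

print("selected experts:", idx.tolist())
print("output shape:   ", tuple(y.shape))
```

Here 2 of 4 experts are evaluated per input; in large deployed MoE models the ratio is far more extreme, which is where the inference savings come from.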