Advanced Introduction to C++, Scientific Computing and Machine Learning

Claudius Gros, SS 2024

Institute for Theoretical Physics
Goethe-University Frankfurt a.M.

Neural Networks

neurons in the brain

a brain full of neurons

binary information transmission

synapses are chemical

all constituent proteins recycled
(days, weeks), functional stationarity

artificial neurons

rate encoding

$\qquad\quad y_i = \sigma(x_i-b_i),\qquad\quad x_i=\sum_j w_{ij}y_j $

synaptic weights

linear classifier
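
The rate-encoding neuron above can be sketched in a few lines of C++. This is a minimal illustration, assuming the logistic function for $\sigma$; the function names (`sigmoid`, `membranePotential`, `neuronOutput`, `classify`) are hypothetical and not part of the lecture material.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed transfer function: sigma(z) = 1/(1+exp(-z)).
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Membrane potential x_i = sum_j w_ij y_j of a single neuron.
double membranePotential(const std::vector<double>& w,
                         const std::vector<double>& y)
{
  assert(w.size() == y.size());
  double x = 0.0;
  for (std::size_t j = 0; j < w.size(); ++j) x += w[j] * y[j];
  return x;
}

// Firing rate y_i = sigma(x_i - b_i), with threshold b.
double neuronOutput(const std::vector<double>& w,
                    const std::vector<double>& y, double b)
{
  return sigmoid(membranePotential(w, y) - b);
}

// Linear classifier: the neuron is active when x_i > b_i, that is,
// when the input lies on the positive side of the hyperplane w.y = b.
bool classify(const std::vector<double>& w,
              const std::vector<double>& y, double b)
{
  return neuronOutput(w, y, b) > 0.5;
}
```

With weights `{1.0, 1.0}` and threshold `1.5` the neuron acts as a logical AND of two binary inputs, since only the input `{1.0, 1.0}` crosses the separating line.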

unsupervised learning

what fires together,
wires together

Hebbian learning

$$ \frac{d}{dt} w_{ij} \sim y_i y_j $$
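
A discrete-time version of the Hebbian rule might look as follows. The sketch assumes a simple Euler step with a learning rate `eps`; the names `Matrix` and `hebbianStep` are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// One Hebbian update  w_ij += eps * y_i * y_j,
// where i labels the post-synaptic and j the pre-synaptic neuron.
void hebbianStep(Matrix& w, const std::vector<double>& yPost,
                 const std::vector<double>& yPre, double eps)
{
  assert(w.size() == yPost.size());
  for (std::size_t i = 0; i < w.size(); ++i)
  {
    assert(w[i].size() == yPre.size());
    for (std::size_t j = 0; j < yPre.size(); ++j)
      w[i][j] += eps * yPost[i] * yPre[j];
  }
}
```

Only synapses between simultaneously active neurons grow. Without an additional normalization the weights increase without bound.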

linear model

$\qquad\quad \fbox{$\phantom{\big|} \left\langle \frac{d}{dt} w_{ij}\right\rangle \sim \sum_k w_{ik}\big\langle y_k y_j\big\rangle \phantom{\big|}$}\,, \qquad\quad y_i\sim x_i = \sum_k w_{ik}y_k $

$$ S_{kj} = \big\langle (y_k-\overline{y}_k) (y_j-\overline{y}_j)\big\rangle = \big\langle y_k y_j\big\rangle, \qquad\quad \overline{y}_k\to0 $$

principal component analysis

competitive growth of components

$$ \hat{S} = \sum_{\{\lambda\}} \lambda \, \mathbf{e}_\lambda^{\phantom{T}} \mathbf{e}_\lambda^T, \qquad\quad \hat{S}\, \mathbf{e}_\gamma= \gamma\, \mathbf{e}_\gamma, \qquad\quad \mathbf{e}_\lambda\cdot\mathbf{e}_\gamma= \mathbf{e}_\lambda^T\mathbf{e}_\gamma^{\phantom{T}} = \delta_{\lambda,\gamma} $$

$$ \tau_w\frac{d}{dt}\big( \hat{w}\cdot\mathbf{e}_\gamma^{\phantom{T}}\big) = \hat{w}\cdot\hat{S}\cdot\mathbf{e}_\gamma^{\phantom{T}} = \sum_{\{\lambda\}}\lambda\, \hat{w}\cdot \mathbf{e}_\lambda^{\phantom{T}} \underbrace{ \mathbf{e}_\lambda^T \cdot\mathbf{e}_\gamma^{\phantom{T}} }_{\delta_{\lambda,\gamma}}, \qquad\quad \fbox{$\phantom{\big|}\displaystyle \tau_w\frac{d}{dt}\big( \hat{w}\cdot\mathbf{e}_\gamma^{\phantom{T}}\big) = \gamma\,\big( \hat{w}\cdot\mathbf{e}_\gamma^{\phantom{T}}\big) \phantom{\big|}$} $$
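
The competitive growth can be checked numerically. The sketch below integrates $\tau_w\,\dot{w}_j = \sum_k w_k S_{kj}$ for a single linear neuron with an Euler scheme. The $2\times2$ covariance matrix, the step size and all names (`Vec2`, `Mat2`, `hebbStep`, `hebbDirection`) are assumptions chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

// One Euler step of  tau_w dw_j/dt = sum_k w_k S_kj.
Vec2 hebbStep(const Vec2& w, const Mat2& S, double dt, double tauW)
{
  Vec2 wNew = w;
  for (int j = 0; j < 2; ++j)
    for (int k = 0; k < 2; ++k)
      wNew[j] += dt / tauW * w[k] * S[k][j];
  return wNew;
}

// Normalized direction of the weight vector after nSteps Euler steps.
Vec2 hebbDirection(Vec2 w, const Mat2& S, int nSteps, double dt, double tauW)
{
  for (int n = 0; n < nSteps; ++n) w = hebbStep(w, S, dt, tauW);
  const double norm = std::sqrt(w[0] * w[0] + w[1] * w[1]);
  assert(norm > 0.0);
  return {w[0] / norm, w[1] / norm};
}
```

For `S = {{ {2.0, 1.0}, {1.0, 2.0} }}` the eigenvalues are 3 and 1, with eigenvectors along $(1,1)$ and $(1,-1)$. Starting from `{1.0, 0.0}`, the component along $(1,1)$ grows fastest, so the normalized weight vector approaches $(1,1)/\sqrt{2}$, the principal component.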

neural networks




supervised learning

steepest descent

$$ \frac{d}{dt}w_{ij} \sim -\frac{\partial E}{\partial w_{ij}} = \left[ (\mathbf{y}_\alpha)_i-(\mathbf{y})_i \right]\,\sigma'(.) \,(\mathbf{I}_\alpha)_j $$
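
A possible implementation of this update rule for a single output neuron is sketched below. It assumes the quadratic error $E = \frac{1}{2}(\mathbf{y}_\alpha-\mathbf{y})^2$, a logistic transfer function with $\sigma'=\sigma(1-\sigma)$, and a threshold $b$ that is trained along with the weights. The names `Perceptron`, `output`, `descentStep` and `train` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Single output neuron: w holds the weights w_j, b is the threshold.
struct Perceptron
{
  std::vector<double> w;
  double b = 0.0;
};

// y = sigma(sum_j w_j I_j - b)
double output(const Perceptron& p, const std::vector<double>& input)
{
  assert(p.w.size() == input.size());
  double x = -p.b;
  for (std::size_t j = 0; j < input.size(); ++j) x += p.w[j] * input[j];
  return sigmoid(x);
}

// One steepest-descent step for E = (target - y)^2 / 2:
//   w_j += eps * (target - y) * sigma'(x) * I_j,  sigma' = y (1 - y)
void descentStep(Perceptron& p, const std::vector<double>& input,
                 double target, double eps)
{
  const double y = output(p, input);
  const double delta = (target - y) * y * (1.0 - y);
  for (std::size_t j = 0; j < input.size(); ++j)
    p.w[j] += eps * delta * input[j];
  p.b -= eps * delta;  // the threshold enters x with a minus sign
}

// Repeated sweeps over all training pairs (I_alpha, y_alpha).
void train(Perceptron& p, const std::vector<std::vector<double>>& inputs,
           const std::vector<double>& targets, double eps, int nEpochs)
{
  assert(inputs.size() == targets.size());
  for (int epoch = 0; epoch < nEpochs; ++epoch)
    for (std::size_t a = 0; a < inputs.size(); ++a)
      descentStep(p, inputs[a], targets[a], eps);
}
```

Trained on the four input pairs of the logical OR, a linearly separable task, the neuron ends up with outputs on the correct side of 0.5. The same procedure cannot succeed for XOR, which is not linearly separable.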

the XOR problem

the neural-network winter

universality of multilayer perceptrons

superpositions of linear functions
are still linear
$\qquad\quad \begin{array}{rcl} y_5 &=& \sigma(w_{5,3}y_3+w_{5,4}y_4) \\ &=& \sigma\Big( w_{5,3}\sigma(w_{3,1}I_1+w_{3,2}I_2) \\ & & \phantom{\sigma}+ w_{5,4}\sigma(w_{4,1}I_1+w_{4,2}I_2) \Big) \end{array} $

linear neurons

non-linear neurons

given enough hidden-layer neurons, non-linear
neurons can approximate any smooth function
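
The difference between linear and non-linear neurons can be illustrated with the 2-2-1 network written out above. In this sketch the transfer function is passed as a function pointer; `Net221`, `linearTransfer`, `forward` and `collapsedLinear` are names made up for the example.

```cpp
#include <cassert>
#include <cmath>

// Weights of the 2-2-1 network: inputs 1,2; hidden neurons 3,4; output 5.
struct Net221 { double w31, w32, w41, w42, w53, w54; };

double linearTransfer(double x) { return x; }
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// y_5 = sigma( w53 sigma(w31 I1 + w32 I2) + w54 sigma(w41 I1 + w42 I2) )
double forward(const Net221& n, double I1, double I2,
               double (*sigma)(double))
{
  assert(sigma != nullptr);
  const double y3 = sigma(n.w31 * I1 + n.w32 * I2);
  const double y4 = sigma(n.w41 * I1 + n.w42 * I2);
  return sigma(n.w53 * y3 + n.w54 * y4);
}

// For linear neurons the hidden layer can be eliminated:
// y_5 = W1 I1 + W2 I2 with the effective weights computed below.
double collapsedLinear(const Net221& n, double I1, double I2)
{
  const double W1 = n.w53 * n.w31 + n.w54 * n.w41;
  const double W2 = n.w53 * n.w32 + n.w54 * n.w42;
  return W1 * I1 + W2 * I2;
}
```

Calling `forward` with `linearTransfer` reproduces `collapsedLinear` for every input, so the hidden layer adds nothing. With `sigmoid` the output is no longer additive in the inputs, which is what gives the hidden layer its representational power.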

derivative of sigmoidal

$$\fbox{$\phantom{\big|}\displaystyle \sigma' = \sigma(1-\sigma) \phantom{\big|}$}\,, \qquad\quad \frac{d}{dx_i} y_i= y_i(1-y_i) $$
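
The identity holds for the logistic function $\sigma(x)=1/(1+e^{-x})$ and is easy to verify numerically. The sketch compares the closed form with a central finite difference; the helper names are illustrative.

```cpp
#include <cassert>
#include <cmath>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Closed form: sigma' = sigma (1 - sigma).
double sigmoidPrime(double x)
{
  const double s = sigmoid(x);
  return s * (1.0 - s);
}

// Central finite difference (f(x+h) - f(x-h)) / (2h).
double numericalDerivative(double (*f)(double), double x, double h)
{
  assert(f != nullptr && h > 0.0);
  return (f(x + h) - f(x - h)) / (2.0 * h);
}
```

The practical benefit is that the derivative needed for training is obtained from the already computed output $y_i$, without evaluating another exponential.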


training multilayer perceptrons

$\qquad\quad \frac{\partial y_i}{\partial w_{\alpha\beta}} = y_i(1-y_i)\sum_j w_{ij} \frac{\partial y_j}{\partial w_{\alpha\beta}} $

recursive derivatives

$$ \frac{\partial E}{\partial w_{\alpha\beta}} = \sum_i\big(y_i-\tilde{y}_i\big)\, y_i(1-y_i)\sum_j w_{ij} \frac{\partial y_j}{\partial w_{\alpha\beta}} $$

and hence

$$ \frac{\partial E}{\partial w_{\alpha\beta}} = \sum_j \Delta E_j \frac{\partial y_j}{\partial w_{\alpha\beta}}, \quad\qquad \fbox{$\phantom{\big|}\displaystyle \Delta E_j = \sum_i\Delta E_i\, y_i(1-y_i)\, w_{ij} \phantom{\big|}$} $$
supervised learning via gradient descent is
equivalent to linear backpropagation of errors
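
Backpropagation for a small network can be sketched as follows, here for two inputs, two hidden neurons (3, 4) and one output neuron (5), without thresholds. The code assumes $E=\frac{1}{2}(y_5-\tilde{y})^2$, so that $\Delta E_5 = y_5-\tilde{y}$ at the output. The weight ordering and all names (`Weights`, `Activities`, `forward`, `backprop`, `numericalGradient`) are choices made for this example.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Weights ordered as {w31, w32, w41, w42, w53, w54}.
using Weights = std::array<double, 6>;

struct Activities { double y3, y4, y5; };

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

Activities forward(const Weights& w, double I1, double I2)
{
  Activities a;
  a.y3 = sigmoid(w[0] * I1 + w[1] * I2);
  a.y4 = sigmoid(w[2] * I1 + w[3] * I2);
  a.y5 = sigmoid(w[4] * a.y3 + w[5] * a.y4);
  return a;
}

double error(const Weights& w, double I1, double I2, double target)
{
  const double d = forward(w, I1, I2).y5 - target;
  return 0.5 * d * d;
}

// Gradient dE/dw via backpropagation of the errors Delta E_j.
Weights backprop(const Weights& w, double I1, double I2, double target)
{
  const Activities a = forward(w, I1, I2);
  const double dE5 = a.y5 - target;              // Delta E_5 (output)
  const double g5 = dE5 * a.y5 * (1.0 - a.y5);
  const double dE3 = g5 * w[4];                  // Delta E_3 = Delta E_5 y5(1-y5) w53
  const double dE4 = g5 * w[5];                  // Delta E_4 = Delta E_5 y5(1-y5) w54
  const double g3 = dE3 * a.y3 * (1.0 - a.y3);
  const double g4 = dE4 * a.y4 * (1.0 - a.y4);
  return {g3 * I1, g3 * I2, g4 * I1, g4 * I2, g5 * a.y3, g5 * a.y4};
}

// Reference gradient from central finite differences.
Weights numericalGradient(Weights w, double I1, double I2,
                          double target, double h)
{
  assert(h > 0.0);
  Weights grad{};
  for (std::size_t n = 0; n < w.size(); ++n)
  {
    const double w0 = w[n];
    w[n] = w0 + h;
    const double ePlus = error(w, I1, I2, target);
    w[n] = w0 - h;
    const double eMinus = error(w, I1, I2, target);
    w[n] = w0;
    grad[n] = (ePlus - eMinus) / (2.0 * h);
  }
  return grad;
}
```

Comparing `backprop` with `numericalGradient` is a common sanity check. The backpropagated result needs a single forward and backward sweep, whereas the finite-difference version needs two forward passes per weight.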

long short-term memory

neurons with internal states

internal state manipulation
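
How an internal state can be manipulated is sketched below for a single scalar memory cell. The gate structure (forget, input and output gates plus a candidate value) follows the commonly used LSTM formulation and is an assumption, as the slides do not spell it out. All names (`Gate`, `LstmCell`, `LstmState`, `lstmStep`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Parameters of one gate: pre-activation = wx * x + wh * h + b.
struct Gate { double wx, wh, b; };

double preActivation(const Gate& g, double x, double h)
{
  return g.wx * x + g.wh * h + g.b;
}

struct LstmCell { Gate forget, input, output, candidate; };

// c: internal (cell) state, h: output visible to other neurons.
struct LstmState { double c, h; };

// One time step for the scalar input x.
LstmState lstmStep(const LstmCell& cell, const LstmState& s, double x)
{
  const double f = sigmoid(preActivation(cell.forget, x, s.h));  // keep old state
  const double i = sigmoid(preActivation(cell.input, x, s.h));   // write new content
  const double o = sigmoid(preActivation(cell.output, x, s.h));  // expose the state
  const double g = std::tanh(preActivation(cell.candidate, x, s.h));
  LstmState next;
  next.c = f * s.c + i * g;         // gated update of the internal state
  next.h = o * std::tanh(next.c);   // gated read-out
  return next;
}
```

With a strongly positive forget bias and a strongly negative input bias the cell keeps its internal state unchanged, whatever the input. This is how such a unit can store information over many time steps.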