Machine Learning Primer -- Part I: Foundations 
 
 
Claudius Gros, WS 2025/26
Institut für theoretische Physik
Goethe-University Frankfurt a.M.
Machine Learning - Linear Classifier
linear classifier
-  A method using a linear combination of features 
 for the characterization or the separation of
 two or more classes of objects or events.
-  The resulting combination may be used as a linear 
 classifier, or,  more commonly, for dimensionality
 reduction before later classification.
$\qquad\quad
y= f(\mathbf{w}\cdot\mathbf{x}-b),
\qquad\quad
\mathbf{w}\cdot\mathbf{x}=\sum_{j=0}^{d-1} w_j x_j
$
-  feature vector: $\ \ \mathbf{x}=(x_0,\,..,\,x_{d-1})$ 
 weight vector:   $\ \ \mathbf{w}=(w_0,\,..,\,w_{d-1})$
 offset (threshold): $\ \ b$
-  (non-linear) transfer function: $\ \ f(z)$
     
hyperplane
$$ \mathbf{x}\qquad \mathrm{such\ that}
   \qquad 
   \fbox{$\phantom{\big|}\mathbf{w}\cdot\mathbf{x}=b\phantom{\big|}$}
$$
-  a $(d-1)$-dimensional hyperplane divides a $d$-dimensional space into two half-spaces
     
$$
\begin{array}{rcl}
\mathrm{right-side} &:&  \mathbf{w}\cdot\mathbf{x}> b \\
\mathrm{left-side}  &:&  \mathbf{w}\cdot\mathbf{x}< b
\end{array}
$$
-  use transfer function $\ \ f(z)=y(z) \ \ $ to map to feature class
     
$$
y(z<0)\ \to\  -1,\quad\qquad 
y(z>0)\ \to\  1,\quad\qquad 
z=\mathbf{w}\cdot\mathbf{x}-b
$$
-  also called 'perceptron' $\ \ \to \ \ $ neural networks
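 
a minimal sketch (not part of the lecture code) of the perceptron decision rule
$y=f(\mathbf{w}\cdot\mathbf{x}-b)$ with a sign-type transfer function; the
vectors and the offset are illustrative values

import numpy as np                                   # numerics

def linearClassify(x, w, b):                         # perceptron decision rule
  z = np.dot(w, x) - b                               # z = w.x - b
  return 1 if z > 0 else -1                          # class +1 (right) / -1 (left)

w = np.array([1.0, -2.0])                            # illustrative weight vector
b = 0.5                                              # illustrative offset
print(linearClassify(np.array([2.0, 0.0]), w, b))    # w.x = 2.0 > b  ->  1
print(linearClassify(np.array([0.0, 1.0]), w, b))    # w.x = -2.0 < b -> -1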
     
task
-  determine "optimal" 
 weights $\ \ \mathbf{w}$
 offset $\ \ b$
 
training / testing sets
supervised learning
-  labeled data: $\ \ N=N_1+N_2$ 
$
\qquad
\begin{array}{rcl}
\mathrm{first\ class}   &:&  \{\mathbf{x}_\alpha\}, \qquad \alpha\in[1,N_1] \\
\mathrm{second\ class}  &:&  \{\mathbf{x}_\beta\}, \qquad \beta\in[N_1+1,N]
\end{array}
$
-  divide randomly into training / test data
 : typically 70/30, 80/20, ...
-  model parameters not optimized during training (hyperparameters) 
 may be fine-tuned using a separate validation data set, if needed
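 
a sketch of a random training / test split (80/20 here); the data array is
illustrative, not from the lecture

import numpy as np                                   # numerics

data   = np.random.rand(100, 2)                      # 100 illustrative data points
perm   = np.random.permutation(len(data))            # random ordering
nTrain = int(0.8*len(data))                          # 80/20 split
trainSet = data[perm[:nTrain]]                       # training data
testSet  = data[perm[nTrain:]]                       # test data
print(len(trainSet), len(testSet))                   # 80 20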
center of mass of individual classes
$$
\mathbf{m}_1 = \frac{1}{N_1}\sum_\alpha \mathbf{x}_\alpha,
\quad\qquad
\mathbf{m}_2 = \frac{1}{N_2}\sum_\beta \mathbf{x}_\beta
$$
-  implicit: sum is over training set
     
-  intuition: $\ \ \mathbf{w}\ \propto \ \mathbf{m}_1-\mathbf{m}_2$ 
 : optimal only for isotropically distributed classes (combined covariance $\propto$ identity)
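 
a sketch computing the class centers of mass $\mathbf{m}_1$, $\mathbf{m}_2$ over
the training set; the two Gaussian clouds are illustrative

import numpy as np                                   # numerics

class1 = np.random.multivariate_normal([ 2.0, 0.0], np.eye(2), 50)  # N_1 points
class2 = np.random.multivariate_normal([-2.0, 0.0], np.eye(2), 50)  # N_2 points
m1 = class1.mean(axis=0)                             # center of mass, class 1
m2 = class2.mean(axis=0)                             # center of mass, class 2
print(m1, m2, m1 - m2)                               # m1 - m2: naive direction for w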
 
covariance matrix
probability distribution
$\qquad\quad
p(\mathbf{x})\ge0, 
     \qquad\qquad 
\int p(\mathbf{x})\,d\mathbf{x}=1
$
-  generating the data 
 
$
p(\mathbf{x}) \quad\longrightarrow
              \quad \mathbf{x}_\alpha,
$ with mean (generic)
 
$
\mathbf{m} = \frac{1}{N}\sum_\alpha \mathbf{x}_\alpha
 = \int \mathbf{x}\,p(\mathbf{x})\,d\mathbf{x}
 = \langle \mathbf{x}\rangle
$ 
-  Cartesian components 
$$
\mathbf{x}=\left(\begin{array}{c}
x_0\\ \vdots \\ x_{d-1}\end{array}\right),\qquad\quad
\mathbf{m}=\left(\begin{array}{c}
m_0\\ \vdots \\ m_{d-1}\end{array}\right),\qquad\quad
\mathbf{x}^T=(x_0,\,\dots,x_{d-1})
$$
note: distinguish here between the (column) vector $\ \ \mathbf{x}\ \ $ and the
transposed (row) vector $\ \ \mathbf{x}^T\ \ $ 
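 
sketch: drawing samples $\mathbf{x}_\alpha$ from a (here Gaussian, illustrative)
$p(\mathbf{x})$ and comparing the sample mean with $\langle\mathbf{x}\rangle$

import numpy as np                                   # numerics

trueMean = np.array([1.0, -2.0])                     # <x> of the generating p(x)
samples  = np.random.multivariate_normal(trueMean, np.eye(2), 10000)
sampleMean = samples.mean(axis=0)                    # m = (1/N) sum_alpha x_alpha
print(trueMean, sampleMean)                          # agree up to ~1/sqrt(N)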
covariance matrix
$$
S_{ij} = \big\langle (x_i-m_i)(x_j-m_j)\big\rangle
$$
-  symmetric 
 : eigenvalues are real
 : their square roots give the half-widths (semi-axes) of the covariance ellipse
-  matrix notation
     
$$
\hat S =
\big\langle (\mathbf{x}-\mathbf{m})\,(\mathbf{x}-\mathbf{m})^T\big\rangle,
\qquad\quad
S_{ij} = (\hat{S})_{ij},
\qquad\quad
\mathbf{x}=\left(\begin{array}{c}
x_0\\ \vdots \\ x_{d-1}
\end{array}\right)
$$
-  outer product 
$$ \mathbf{x}\,\mathbf{x}^T= \mathbf{x}\otimes\mathbf{x},
     \qquad\quad \big(\mathbf{x}\,\mathbf{x}^T\big)_{ij}=x_ix_j
$$
note
$$ \mathbf{x}^T\mathbf{x} = \mathbf{x}\cdot\mathbf{x}
   = \sum_i x_i x_i
$$
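 
sketch contrasting the outer product $\mathbf{x}\mathbf{x}^T$ with the scalar
product $\mathbf{x}^T\mathbf{x}$ in numpy (illustrative vector)

import numpy as np                                   # numerics

x = np.array([1.0, 2.0, 3.0])
print(np.outer(x, x))                                # (x x^T)_{ij} = x_i x_j, 3x3 matrix
print(np.dot(x, x))                                  # x^T x = sum_i x_i x_i = 14.0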
     
 
covariance ellipse
$$
\frac{x^2}{a^2}+\frac{y^2}{b^2} = 1,
\quad\qquad
\left\{\begin{array}{rcl}
x(t) &=& \sqrt{\lambda_1}\cos\Theta\cos t 
-\sqrt{\lambda_2}\sin\Theta\sin t
\\[0.1ex]
y(t) &=& \sqrt{\lambda_1}\sin\Theta\cos t 
+\sqrt{\lambda_2}\cos\Theta\sin t
\end{array}\right.
$$
-  angle with x-axis: $\ \Theta$  
-  eigenvalues $\ \lambda_i$  
$$
\mathbf{x} = 
\hat{V} \cdot \hat{\Lambda} \cdot 
\mathbf{T},
\quad\qquad
\mathbf{x} = {x\choose y},
\quad\qquad
\mathbf{T} = {\cos t\choose \sin t},
$$
-  eigen-vector matrix $\ \hat{V}$
     
$$
\hat{V}=\left(\begin{array}{cc} 
\cos\Theta & -\sin\Theta \\ 
\sin\Theta & \cos\Theta 
\end{array}\right)
\quad\qquad
\hat{\Lambda}=\left(\begin{array}{cc} 
\sqrt{\lambda_1} & 0 \\ 0 & \sqrt{\lambda_2}
\end{array}\right)
$$
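 
sketch checking numerically that $\mathbf{x}=\hat{V}\hat{\Lambda}\mathbf{T}$
reproduces the parametric form $x(t)$, $y(t)$ given above; angle and
eigenvalues are illustrative

import numpy as np                                   # numerics

theta, lam1, lam2 = 0.3, 2.0, 0.5                    # illustrative values
V = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])      # eigenvector matrix
L = np.diag([np.sqrt(lam1), np.sqrt(lam2)])          # Lambda
t = 1.2                                              # some curve parameter
T = np.array([np.cos(t), np.sin(t)])
x_matrix = V @ L @ T                                 # V . Lambda . T
x_param  = np.sqrt(lam1)*np.cos(theta)*np.cos(t) - np.sqrt(lam2)*np.sin(theta)*np.sin(t)
y_param  = np.sqrt(lam1)*np.sin(theta)*np.cos(t) + np.sqrt(lam2)*np.cos(theta)*np.sin(t)
print(x_matrix, [x_param, y_param])                  # identical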
 
 
 
plotting a covariance ellipse
-  if (__name__ == "__main__"):  
 not executed when imported as a module
 allows test code to be included in the module
-  $A=\sum_i\lambda_i |\lambda_i\rangle\langle\lambda_i|$
 matrix decomposition in terms of eigensystem
 projection operators   $|\lambda\rangle\langle\lambda|$  
     are outer products
-  np.random.multivariate_normal(mean, 
           coVar, N)
 draws  N  samples from a
 multi-variate Gaussian with covariance matrix 
      coVar
-  two covariance matrices: 
 one used for data generation, one measured from the generated data
 
 
 
#!/usr/bin/env python3
import math                      # math
import numpy as np               # numerics
import matplotlib.pyplot as plt  # plotting
from numpy import linalg as LA   # linear algebra
# ***
# *** covariance matrix from data
# ***
def covarianceMatrix(data):
  '''normalized covariance matrix of input data'''
  nRow = len(data)                     # number of data points
  nCol = len(data[0])                  # dimension
  mean = np.zeros(nCol)                # vector mean (accumulated below)
  for thisRow in data:                 # loop over data points
    mean += thisRow*1.0/nRow           # vector-mean
#
  coVar = np.zeros((nCol,nCol))        # empty matrix
  for thisRow in data:
    dataPoint = thisRow - mean         # shifted data point
    coVar += np.outer(dataPoint,dataPoint)/nRow    # outer product
#
  if (1==2):
    print("\ninput data\n", data)
    print("nRow, nCol\n", nRow, nCol)
    print("mean\n", mean)
  print("\n covariance matrix: measured \n",coVar)
#
  return mean, coVar                   # returning both
# ***
# *** generate data for plotting an ellipse
# ***
def plotEllipseData(data):
  "generates ellipse from symmetric 2x2 slice of the input matrix"
#
  slice22 = [inner[:2] for inner in data[:2]]   # slice
  slice22[0][1] = slice22[1][0]                 # symmetrize
                                                # ^ should not be necessary
  eigenValues, eigenVectors = LA.eig(slice22)   # eigensystem
#
  if (eigenValues[0]<0.0) or (eigenValues[1]<0.0):
    print("# plotEllipseData: only positive eigenvalues (variance)")
    return [], []                               # avoid unpacking None in the caller
#
  if (1==2):
    print("\n# coVar     \n", coVar  )
    print("\n# slice22     ", slice22)
    print("\n# eigenValues ", eigenValues)
    print("\n# eigenVectors\n",eigenVectors)
#
  a = math.sqrt(eigenValues[0])
  b = math.sqrt(eigenValues[1])
  cTheta = eigenVectors[0][0]
  sTheta = eigenVectors[1][0]
  x = []
  y = []
  for i in range(nPoints:=101):          # walrus assignment
    tt = i*2.0*math.pi/(nPoints-1)       # full loop
    cc = math.cos(tt)
    ss = math.sin(tt)
    xx = a*cTheta*cc - b*sTheta*ss
    yy = a*sTheta*cc + b*cTheta*ss
    x.append(xx)
    y.append(yy)
#   print(xx,yy)
  return x, y
# ***
# *** generate 2D test data
# ***
def testData(angle, var1, var2, nData, startMean=[0.0,0.0]):
  "2D Gaussian for a given angle and main variances.\
   A = \sum_i \lambda_i |lambda_i><lambda_i|"
#
  eigen1 = [math.cos(angle),-math.sin(angle)]
  eigen2 = [math.sin(angle), math.cos(angle)]
  startCoVar  = var1*np.outer(eigen1,eigen1)
  startCoVar += var2*np.outer(eigen2,eigen2)
  print("\n covariance matrix: data generation \n",startCoVar)
  return np.random.multivariate_normal(startMean, startCoVar, nData)
# ***
# *** main (with a block)
# ***
if (__name__ == "__main__"):                 # blocking out-of-use access
#
  dataMatrix = testData(0.3*math.pi, 1.0, 9.0, 100)
  mean, coVar = covarianceMatrix(dataMatrix)
  if (1==2):
    print("\n#main: data\n", dataMatrix)
    print("#main: mean\n", mean)
    print("\n#main: coVar matrix \n",coVar)
  xEllipse, yEllipse = plotEllipseData(coVar)  # coVar-ellipse
  xData = [thisRow[0] for thisRow in dataMatrix]
  yData = [thisRow[1] for thisRow in dataMatrix]
  Z_90 = math.sqrt(4.605)                      # 90% confidence
  Z_95 = math.sqrt(5.991)                      # 95% confidence
  Z_99 = math.sqrt(9.210)                      # 99% confidence
#
  xE_90 = [Z_90*xx + mean[0] for xx in xEllipse]
  yE_90 = [Z_90*yy + mean[1] for yy in yEllipse]
  xE_99 = [Z_99*xx + mean[0] for xx in xEllipse]
  yE_99 = [Z_99*yy + mean[1] for yy in yEllipse]
#
  plt.plot(xE_90, yE_90, label="90%")
  plt.plot(xE_99, yE_99, label="99%")
  plt.plot(xData, yData, "ob", markersize=5)
#
  plt.legend(loc="upper left")
  plt.axis('square')                           # square plot
  plt.savefig('foo.svg')                       # export figure
  plt.show()
  
 
python intermezzo
-  loading a private (user-defined) module 
 :: plotting the covariance ellipse
-  __pycache__
 :: directory created when importing private files
-  imports are not transferred 
 :: modules imported inside the loaded file do not enter the importing namespace
     
#!/usr/bin/env python3
import math                        # math
import matplotlib.pyplot as plt    # plotting
import ML_covarianceMatrix as CM   # loading user-defined module
dataMatrix  = CM.testData(0.3*math.pi, 1.0, 9.0, 100)
mean, coVar = CM.covarianceMatrix(dataMatrix)
xEllipse, yEllipse = CM.plotEllipseData(coVar)  # coVar-ellipse
xData = [thisRow[0] for thisRow in dataMatrix]
yData = [thisRow[1] for thisRow in dataMatrix]
Z_90 = math.sqrt(4.605)                      # 90% confidence
Z_95 = math.sqrt(5.991)                      # 95% confidence
Z_99 = math.sqrt(9.210)                      # 99% confidence
xE_90 = [Z_90*xx + mean[0] for xx in xEllipse]
yE_90 = [Z_90*yy + mean[1] for yy in yEllipse]
xE_99 = [Z_99*xx + mean[0] for xx in xEllipse]
yE_99 = [Z_99*yy + mean[1] for yy in yEllipse]
plt.plot(xE_90, yE_90, label="90%")
plt.plot(xE_99, yE_99, label="99%")
plt.plot(xData, yData, "ob", markersize=5)
plt.legend(loc="upper left")
plt.axis('square')                           # square plot
plt.savefig('foo.svg')                       # export figure
plt.show()
  
 
 multivariate Gaussians 
-  uni/multi-variate : one-/multi-dimensional
     
-  one-dimensional Gaussian
$$
N(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/(2\sigma^2)}
$$
     
-  multivariate
$$
N(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^d|\hat{S}|}}
\exp\left(\frac{-1}{2}({\mathbf x}-{\mathbf m})^T{\hat{S}}^{-1}
           ({\mathbf x}-{\mathbf m})\right)
$$
     
diagonal case
$$
\hat{S}=\left(\begin{array}{ccc}
\sigma_0^2 & \dots & 0 \\
\vdots     & \ddots & \vdots \\
0          & \dots  & \sigma_{d-1}^2
\end{array}\right),
\qquad\quad
\sigma_k^2 = \big\langle (x_k-m_k)^2\big\rangle
$$
-  if $\ \ \big\langle (x_i-m_i)(x_j-m_j)\big\rangle=0\quad \mbox{for}\quad i\ne j$
 : components with $\ \ i\ne j\ \ $ are pairwise uncorrelated
$$
|\hat{S}| = \prod_k \sigma_k^2,
\qquad\quad
\hat{S}^{-1}=\left(\begin{array}{ccc}
1/\sigma_0^2 & \dots & 0 \\
\vdots     & \ddots & \vdots \\
0          & \dots  & 1/\sigma_{d-1}^2
\end{array}\right)
$$
-  multivariate Gaussian factorizes
     
$$
N(\mathbf{x}) = \prod_k
 \frac{\exp(-(x_k-m_k)^2/(2\sigma_k^2))}{\sqrt{2\pi\sigma_k^2}}
$$
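 
numerical check (sketch) that the multivariate density with a diagonal
$\hat{S}$ equals the product of one-dimensional Gaussians; all values are
illustrative

import numpy as np                                   # numerics

m     = np.array([1.0, -1.0])                        # mean
sigma = np.array([0.5,  2.0])                        # standard deviations
S     = np.diag(sigma**2)                            # diagonal covariance matrix
x     = np.array([0.3,  0.7])                        # test point
d     = len(m)
diff  = x - m
N_multi = np.exp(-0.5*diff @ np.linalg.inv(S) @ diff) \
          / np.sqrt((2.0*np.pi)**d*np.linalg.det(S)) # multivariate formula
N_prod  = np.prod(np.exp(-diff**2/(2.0*sigma**2))
                  / np.sqrt(2.0*np.pi*sigma**2))     # product of 1D Gaussians
print(N_multi, N_prod)                               # identical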
 
statistics of features
-  a linear classifier combines the features of the data via
     $\ \ \mathbf{w}\cdot\mathbf{x}_\alpha,\qquad \alpha=1,2\ \ $ (class label)
     
-  mean $\ \ \mu_\alpha=\mathbf{w}\cdot\mathbf{m}_\alpha$
     
-  variance
$$
\sigma_\alpha^2=\left\langle
     (\mathbf{w}\cdot\mathbf{x}_\alpha
     -\mathbf{w}\cdot\mathbf{m}_\alpha)^2
\right\rangle = \left\langle
     (\mathbf{w}\cdot(\mathbf{x}_\alpha -\mathbf{m}_\alpha))^2
\right\rangle = \left\langle
     (\mathbf{w}^T(\mathbf{x}_\alpha -\mathbf{m}_\alpha))^2
\right\rangle 
$$
or
$$
\sigma_\alpha^2=\left\langle
\mathbf{w}^T
(\mathbf{x}_\alpha -\mathbf{m}_\alpha)\,
(\mathbf{x}_\alpha -\mathbf{m}_\alpha)^T
\mathbf{w}
\right\rangle =
\mathbf{w}^T \left\langle
(\mathbf{x}_\alpha -\mathbf{m}_\alpha)\,
(\mathbf{x}_\alpha -\mathbf{m}_\alpha)^T
\right\rangle \mathbf{w}
$$
     
in terms of the covariance matrix
$$
\fbox{$\phantom{\big|}\sigma_\alpha^2=
\mathbf{w}^T \hat S_\alpha \mathbf{w}\phantom{\big|}$}
\qquad\quad
\hat S_\alpha =
\big\langle (\mathbf{x}_\alpha-\mathbf{m}_\alpha)\,
            (\mathbf{x}_\alpha-\mathbf{m}_\alpha)^T\big\rangle
$$
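 
sketch verifying empirically that the variance of the projection
$\mathbf{w}\cdot\mathbf{x}$ equals $\mathbf{w}^T\hat{S}\mathbf{w}$; covariance
matrix and weight vector are illustrative

import numpy as np                                   # numerics

S_true = np.array([[4.0, 1.5], [1.5, 1.0]])          # illustrative covariance
x = np.random.multivariate_normal([0.0, 0.0], S_true, 100000)  # one class
w = np.array([0.6, 0.8])                             # some weight vector, |w| = 1
z = x @ w                                            # projections w.x_alpha
print(z.var(), w @ S_true @ w)                       # agree up to sampling noise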
 
linear discriminant analysis (LDA)
-  linear discriminant, mean, variance
$$
     \mathbf{w}\cdot\mathbf{x}_\alpha,\qquad\quad
\mu_\alpha=\mathbf{w}\cdot\mathbf{m}_\alpha,\qquad\quad
\sigma_\alpha^2=\mathbf{w}^T \hat S_\alpha \mathbf{w}
$$
     
-  many linear classifiers exist
     
Fisher objective function
-  maximize $\ \ (\mu_1-\mu_2)^2\ \ $ 
 keeping the combined variance $\ \ \sigma_1^2+\sigma_2^2\ \ $ constant
task
-  find feature vector $\ \ \mathbf{w}\ \ $ maximizing
     
$$
S = \frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2}
  = \frac{(\mathbf{w}\cdot(\mathbf{m}_1-\mathbf{m}_2))^2}
         {\mathbf{w}^T (\hat{S}_1+\hat{S}_2) \mathbf{w}}
  = \frac{\mathbf{w}^T(\mathbf{m}_1-\mathbf{m}_2)
                      (\mathbf{m}_1-\mathbf{m}_2)^T\mathbf{w}}
         {\mathbf{w}^T (\hat{S}_1+\hat{S}_2) \mathbf{w}}
$$
$$
S = \frac{\mathbf{w}^T\hat{A}\mathbf{w}}
         {\mathbf{w}^T \hat{B}\mathbf{w}}
\qquad\quad
\hat{A}=(\mathbf{m}_1-\mathbf{m}_2)(\mathbf{m}_1-\mathbf{m}_2)^T
\qquad\quad
\hat{B}=\hat{S}_1+\hat{S}_2
$$
-  quantum mechanics:
 $\mathbf{w}\ \ $ wavefunction
 $\hat{A}\ \ $ Hamilton operator
 $\hat{B}\ \ $ identity matrix
 $S\ \ $ expectation value of energy
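 
sketch evaluating the Fisher objective $S=\mathbf{w}^T\hat{A}\mathbf{w}/
\mathbf{w}^T\hat{B}\mathbf{w}$ for the naive choice
$\mathbf{w}\propto\mathbf{m}_1-\mathbf{m}_2$ and for the LDA solution derived
below; the two Gaussian classes are illustrative

import numpy as np                                   # numerics

np.random.seed(0)                                    # reproducibility
x1 = np.random.multivariate_normal([ 2.0, 0.0], [[3.0, 1.0], [ 1.0, 1.0]], 500)
x2 = np.random.multivariate_normal([-2.0, 0.0], [[2.0,-0.8], [-0.8, 1.5]], 500)
m1, m2 = x1.mean(axis=0), x2.mean(axis=0)            # class means
A = np.outer(m1-m2, m1-m2)                           # A = (m1-m2)(m1-m2)^T
B = np.cov(x1, rowvar=False) + np.cov(x2, rowvar=False)   # B = S_1 + S_2

def fisherScore(w):                                  # S = w^T A w / w^T B w
  return (w @ A @ w)/(w @ B @ w)

w_naive = m1 - m2                                    # intuition only
w_lda   = np.linalg.solve(B, m1 - m2)                # LDA solution (see below)
print(fisherScore(w_naive), fisherScore(w_lda))      # LDA score is at least as large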
 
 
LDA - maximization
$$
S = \frac{\mathbf{w}^T\hat{A}\mathbf{w}}
         {\mathbf{w}^T \hat{B}\mathbf{w}}
\qquad\quad
\hat{A}=(\mathbf{m}_1-\mathbf{m}_2)(\mathbf{m}_1-\mathbf{m}_2)^T
\qquad\quad
\hat{B}=\hat{S}_1+\hat{S}_2
$$
-  variation
$$
\delta S=
\frac{\mathbf{w}^T\hat{A}\,\delta\mathbf{w}}{\mathbf{w}^T \hat{B}\mathbf{w}}
-
\frac{\mathbf{w}^T\hat{A}\mathbf{w}}{(\mathbf{w}^T \hat{B}\mathbf{w})^2}
\mathbf{w}^T \hat{B}\,\delta\mathbf{w}
$$
plus terms in $\ \ (\delta\mathbf{w})^T$
     
-  stationarity
$$
\delta S=0\qquad\quad
\big(\mathbf{w}^T \hat{B}\mathbf{w}\big)\big(\mathbf{w}^T\hat{A}\big)=
\big(\mathbf{w}^T \hat{A}\mathbf{w}\big)\big(\mathbf{w}^T\hat{B}\big)
$$
     
proportionality factors
$$
\mathbf{w}^T\hat{A}=\mathbf{w}^T
\big((\mathbf{m}_1-\mathbf{m}_2)(\mathbf{m}_1-\mathbf{m}_2)^T\big)
\propto
(\mathbf{m}_1-\mathbf{m}_2)^T
$$
-  scalar quantities
$$
\mathbf{w}^T(\mathbf{m}_1-\mathbf{m}_2)\qquad\quad
\mathbf{w}^T \hat{A}\mathbf{w}\qquad\quad
\mathbf{w}^T \hat{B}\mathbf{w}
$$
     
-  proportionality 
$$
(\mathbf{m}_1-\mathbf{m}_2)^T\propto \mathbf{w}^T\hat{B}
\qquad\quad
\mathbf{m}_1-\mathbf{m}_2\propto 
\hat{B}^T \mathbf{w}= \hat{B} \mathbf{w}
\qquad\quad
\hat{B}^T = \hat{B} 
$$
     
-  the solution for the feature vector
$$
\fbox{$\phantom{\big|}
\mathbf{w}\propto \big(\hat{S}_1+\hat{S}_2\big)^{-1}
\big(\mathbf{m}_1-\mathbf{m}_2\big)
      \phantom{\big|}$}
\qquad
\hat{B}=\hat{S}_1+\hat{S}_2
$$
may be normalized to unity
     
regularization
-  the system of linear equations
$$
\big(\hat{S}_1+\hat{S}_2\big)\mathbf{w}=
\mathbf{m}_1-\mathbf{m}_2
\qquad\quad
\hat{S}_1+\hat{S}_2\ \to\ (1-\lambda)\big(\hat{S}_1+\hat{S}_2\big)
                            +\lambda \hat{I}
$$
    may be regularized for a singular combined covariance matrix
    $\ \ \hat{S}_1+\hat{S}_2$
     
-  shrinking intensity (regularization parameter): $\ \ \lambda$
     
-  identity matrix: $\ \ \hat{I}$
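 
sketch of the shrinkage regularization before solving for $\mathbf{w}$; the
covariance matrices are chosen such that $\hat{S}_1+\hat{S}_2$ is singular, and
$\lambda$ is an illustrative value

import numpy as np                                   # numerics

def ldaWeights(S1, S2, m1, m2, lam=0.1):             # lam: shrinking intensity
  B = S1 + S2                                        # combined covariance
  B_reg = (1.0-lam)*B + lam*np.eye(len(B))           # regularized matrix
  w = np.linalg.solve(B_reg, m1 - m2)                # (S1+S2) w = m1 - m2
  return w/np.sqrt(np.sum(w**2))                     # normalize to unity

S1 = np.array([[1.0, 1.0], [1.0, 1.0]])              # singular
S2 = np.array([[1.0, 1.0], [1.0, 1.0]])              # S1 + S2 singular as well
m1 = np.array([ 1.0, 0.0])
m2 = np.array([-1.0, 0.0])
print(ldaWeights(S1, S2, m1, m2))                    # without lam>0, solve would fail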
     
 
LDA threshold
-  already determined: $\ \ \mathbf{w}$
$$
     z_\alpha=\mathbf{w}\cdot\mathbf{x}_\alpha\qquad\quad
\mu_\alpha=\mathbf{w}\cdot\mathbf{m}_\alpha\qquad\quad
\sigma_\alpha^2=\mathbf{w}^T \hat S_\alpha \mathbf{w}\qquad\quad
|\mathbf{w}|=1
$$
     
-  now determine the threshold $\ \ b:\quad\quad \mathbf{w}\cdot\mathbf{x}=b$
     
 
 
misclassifications
$$
E= \int_{-\infty}^b p_2(z)\,dz + \int_b^{\infty} p_1(z)\,dz 
$$
$$
\frac{\partial E}{\partial b} = 0
\qquad\quad
\fbox{$\phantom{\big|} p_1(b) = p_2(b)\phantom{\big|}$}
$$
-  principle of equal probabilities 
     
-  threshold close to $\ \ (\mu_1+\mu_2)/2\ \ $ for comparable variances
 otherwise: use the Gaussian approximation for $\ p_1(z),\ p_2(z)$
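 
sketch determining the threshold $b$ from $p_1(b)=p_2(b)$ in the Gaussian
approximation of the projected class distributions; the means and variances
are illustrative

import numpy as np                                   # numerics

def gauss(z, mu, sig):                               # 1D Gaussian density
  return np.exp(-(z-mu)**2/(2.0*sig**2))/np.sqrt(2.0*np.pi*sig**2)

mu1, sig1 = -2.0, 1.0                                # class 1: projected mean/std
mu2, sig2 =  3.0, 2.0                                # class 2: projected mean/std
b_grid = np.linspace(mu1, mu2, 100001)               # search between the means
diff   = gauss(b_grid, mu1, sig1) - gauss(b_grid, mu2, sig2)
b      = b_grid[np.argmin(np.abs(diff))]             # p_1(b) = p_2(b)
print(b, 0.5*(mu1+mu2))                              # compare with the midpoint (mu1+mu2)/2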
 
basic LDA code
-  solving 
 $\displaystyle\qquad\qquad
\big(\hat{S}_1+\hat{S}_2\big)\mathbf{w}= \mathbf{m}_1-\mathbf{m}_2
$
-  numpy.linalg.inv
 ::  matrix inversion
-  numpy.matmul
 ::  matrix multiplication
-  np.sqrt(np.sum(vector**2))
 ::  vector norm
 
 
#!/usr/bin/env python3
import math                        # math
import matplotlib.pyplot as plt    # plotting
import numpy as np 
from numpy.linalg import inv       # inverse matrix
import ML_covarianceMatrix as CM   # loading user-defined module
#
# --- data generation
#
dataMatrix_A  = CM.testData( 0.4*math.pi, 1.0, 9.0, 30, [ 6.0,0.0])
dataMatrix_B  = CM.testData(-0.4*math.pi, 1.0, 9.0, 30, [-6.0,0.0])
mean_A, coVar_A = CM.covarianceMatrix(dataMatrix_A)  
mean_B, coVar_B = CM.covarianceMatrix(dataMatrix_B)  
#
# --- LDA
#
S_inverse    = inv(coVar_A+coVar_B)                     # built-in inverse
w_vector     = np.matmul(S_inverse,mean_B-mean_A)       # LDA
w_normalized = w_vector / np.sqrt(np.sum(w_vector**2))  # normalization
w_orthogonal = [w_normalized[1], -w_normalized[0]]      # \perp vector
LL = 5.0
midPoint = 0.5*(mean_B+mean_A)
x_lower  = midPoint[0] - LL*w_orthogonal[0]
y_lower  = midPoint[1] - LL*w_orthogonal[1]
x_upper  = midPoint[0] + LL*w_orthogonal[0]
y_upper  = midPoint[1] + LL*w_orthogonal[1]
x_LDA_plane = [x_lower, x_upper]
y_LDA_plane = [y_lower, y_upper]
if (1==2):
  print("\n#main w_orthogonal\n", w_orthogonal)
  print("\n#main midPoint    \n", midPoint    )
#
# --- printing
#
xEllipse_A, yEllipse_A = CM.plotEllipseData(coVar_A)  # coVar-ellipse
xEllipse_B, yEllipse_B = CM.plotEllipseData(coVar_B)  # coVar-ellipse
xData_A = [thisRow[0] for thisRow in dataMatrix_A]
yData_A = [thisRow[1] for thisRow in dataMatrix_A]
xData_B = [thisRow[0] for thisRow in dataMatrix_B]
yData_B = [thisRow[1] for thisRow in dataMatrix_B]
Z_95 = math.sqrt(5.991)                      # 95% confidence  
 
xE_95_A = [Z_95*xx + mean_A[0] for xx in xEllipse_A]
yE_95_A = [Z_95*yy + mean_A[1] for yy in yEllipse_A]
xE_95_B = [Z_95*xx + mean_B[0] for xx in xEllipse_B]
yE_95_B = [Z_95*yy + mean_B[1] for yy in yEllipse_B]
x_connectMean = [ mean_A[0], mean_B[0] ]
y_connectMean = [ mean_A[1], mean_B[1] ]
 
plt.plot(xE_95_A, yE_95_A, "k", linewidth=2.0, label="95%")
plt.plot(xE_95_B, yE_95_B, "b", linewidth=2.0, label="95%")
plt.plot(xData_A, yData_A, "ok", markersize=5)
plt.plot(xData_B, yData_B, "ob", markersize=5)
plt.plot(x_connectMean, y_connectMean, "r", linewidth=3.0)
plt.plot(x_connectMean, y_connectMean, "or", markersize=9.0)
plt.plot(x_LDA_plane, y_LDA_plane, "g", linewidth=4.0, label="LDA plane")
 
plt.legend(loc="upper left")
#plt.axis('square')                           # square plot
plt.savefig('foo.svg')                       # export figure
plt.show()