Advanced Introduction to C++, Scientific Computing and Machine Learning

Claudius Gros, WS 2019/20

Institut für theoretische Physik
Goethe-University Frankfurt a.M.

machine learning vs. AI

• ML (machine learning) is considered a subset of AI (artificial intelligence)
• algorithms and programs
• model & generalize data
• optimize utility functions
• Chess, Go, autonomous driving, ...
• human level performance possible
• AI: "intelligent stuff"
• not well defined
• machine learning plus non learning procedures
• human level AI
• not yet possible
• ML plus unknown principles

what is intelligence?

• on an operational level intelligence is a measure of the
number of behavioral options a cognitive system provides

learning

• modifying a behavior based on experience

sample problem settings

unsupervised learning

• learn a new representation of the data

clustering

• group similar data points together

dimensionality reduction

• find a lower dimensional (simpler) representation

supervised learning

• make predictions

classification

• make discrete predictions

regression

• make continuous predictions

linear classifier

• A method using a linear combination of features
for the characterization or the separation of
two or more classes of objects or events.
• The resulting combination may be used as a linear
classifier, or, more commonly, for dimensionality
reduction before later classification.

$\displaystyle\qquad\quad y= f(\mathbf{w}\cdot\mathbf{x}-b), \qquad\quad \mathbf{w}\cdot\mathbf{x}=\sum_{j=0}^{d-1} w_j x_j$
• feature vector: $\ \ \mathbf{x}=(x_0,\,..,\,x_{d-1})$
weight vector: $\ \ \mathbf{w}=(w_0,\,..,\,w_{d-1})$
offset (threshold): $\ \ b$
• (non-linear) transfer function: $\ \ f(z)$

hyperplane

$$\mathbf{x}\qquad \mathrm{such\ that} \qquad \fbox{\phantom{\big|}\mathbf{w}\cdot\mathbf{x}=b\phantom{\big|}}$$
• a $(d-1)$ dimensional hyperplane divides a $d$ dimensional space into two halves
$$\begin{array}{rcl} \mathrm{right-side} &:& \mathbf{w}\cdot\mathbf{x}> b \\ \mathrm{left-side} &:& \mathbf{w}\cdot\mathbf{x}< b \end{array}$$
• use transfer function $\ \ f(z)=y(z) \ \$ to map to feature class
$$y(z<0)\ \to\ -1,\quad\qquad y(z>0)\ \to\ 1,\quad\qquad z=\mathbf{w}\cdot\mathbf{x}-b$$
• also called 'perceptron' $\ \ \to \ \$ neural networks

• determine "optimal"
weights $\ \ \mathbf{w}$
offset $\ \ b$

training / testing sets

supervised learning

• labeled data: $\ \ N=N_1+N_2$

$\displaystyle \qquad \begin{array}{rcl} \mathrm{first\ feature} &:& \{\mathbf{x}_\alpha\}, \qquad \alpha\in[1,N_1] \\ \mathrm{second\ feature} &:& \{\mathbf{x}_\beta\}, \qquad \beta\in[N_1+1,N] \end{array}$

• divide randomly into training / test data
: typically 70/30, 80/20, ...
• model parameters not optimized during training (hyperparameters)
may be fine-tuned using a validation data set, if needed
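The random split can be sketched as follows (the function name `trainTestSplit` is chosen here for illustration; any uniform shuffle of the sample indices does the job):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// split the indices [0,N) randomly into a training and a test set
void trainTestSplit(int N, double trainFraction,
                    std::vector<int>& train, std::vector<int>& test)
{
  std::vector<int> idx(N);
  for (int i=0; i<N; i++)
    idx[i] = i;
  std::mt19937 gen(std::random_device{}());            // shuffle the indices
  std::shuffle(idx.begin(), idx.end(), gen);
  int nTrain = static_cast<int>(std::lround(trainFraction*N));  // e.g. 0.8
  train.assign(idx.begin(), idx.begin()+nTrain);
  test.assign(idx.begin()+nTrain, idx.end());
}
```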

center of mass of individual classes

$$\mathbf{m}_1 = \frac{1}{N_1}\sum_\alpha \mathbf{x}_\alpha, \quad\qquad \mathbf{m}_2 = \frac{1}{N_2}\sum_\beta \mathbf{x}_\beta$$
• implicit: sum is over training set
• intuition: $\ \ \mathbf{w}\ \propto \ \mathbf{m}_1-\mathbf{m}_2$
: only true for uniformly distributed data

information theory - covariance matrix

probability distribution

$\displaystyle\qquad\quad p(\mathbf{x})\ge0, \qquad\qquad \int p(\mathbf{x})\,d\mathbf{x}=1$
• generating the data
$\displaystyle p(\mathbf{x}) \quad\longrightarrow \quad \mathbf{x}_\alpha,$
with mean (generic)
$\displaystyle \mathbf{m} = \frac{1}{N}\sum_\alpha \mathbf{x}_\alpha = \int \mathbf{x}\,p(\mathbf{x})\,d\mathbf{x} = \langle \mathbf{x}\rangle$
• Cartesian components $$\mathbf{x}=\left(\begin{array}{c} x_0\\ \vdots \\ x_{d-1}\end{array}\right),\qquad\quad \mathbf{m}=\left(\begin{array}{c} m_0\\ \vdots \\ m_{d-1}\end{array}\right),\qquad\quad \mathbf{x}^T=(x_0,\,\dots,x_{d-1})$$ note: distinguish here between the vector $\ \ \mathbf{x}\ \$ and the transposed vector $\ \ \mathbf{x}^T\ \$

covariance matrix

$$S_{ij} = \big\langle (x_i-m_i)(x_j-m_j)\big\rangle$$
• symmetric
: eigenvalues are real
: their square roots correspond to the half-axes of the covariance ellipse
• matrix notation
$$\hat S = \big\langle (\mathbf{x}-\mathbf{m})\,(\mathbf{x}-\mathbf{m})^T\big\rangle, \qquad\quad S_{ij} = (\hat{S})_{ij}, \qquad\quad \mathbf{x}=\left(\begin{array}{c} x_0\\ \vdots \\ x_{d-1} \end{array}\right)$$
• external product $$\mathbf{x}\,\mathbf{x}^T= \mathbf{x}\otimes\mathbf{x}, \qquad\quad \big(\mathbf{x}\,\mathbf{x}^T\big)_{ij}=x_ix_j$$ note $$\mathbf{x}^T\mathbf{x} = \mathbf{x}\cdot\mathbf{x} = \sum_i x_i x_i$$
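The definition $S_{ij} = \langle (x_i-m_i)(x_j-m_j)\rangle$ translates directly into code; a sketch for the sample covariance matrix (the name `covarianceMatrix` and the `data[n][i]` layout are choices made here):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// sample covariance matrix S_ij = < (x_i-m_i)(x_j-m_j) >,
// with data[n][i] holding component i of sample n
std::vector<std::vector<double>>
covarianceMatrix(const std::vector<std::vector<double>>& data)
{
  size_t N = data.size(), d = data[0].size();
  std::vector<double> m(d, 0.0);                     // mean vector
  for (const auto& x : data)
    for (size_t i=0; i<d; i++)
      m[i] += x[i]/N;
  std::vector<std::vector<double>> S(d, std::vector<double>(d, 0.0));
  for (const auto& x : data)                         // averaged outer product
    for (size_t i=0; i<d; i++)
      for (size_t j=0; j<d; j++)
        S[i][j] += (x[i]-m[i])*(x[j]-m[j])/N;
  return S;
}
```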

information theory - multivariate Gaussian

• uni-/multi-variate : one-/multi-dimensional
• one-dimensional Gaussian $$N(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/(2\sigma^2)}$$
• multivariate $$N(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^d|\hat{S}|}} \exp\left(\frac{-1}{2}({\mathbf x}-{\mathbf m})^T{\hat{S}}^{-1} ({\mathbf x}-{\mathbf m})\right)$$

diagonal case

$$\hat{S}=\left(\begin{array}{ccc} \sigma_0^2 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \sigma_{d-1}^2 \end{array}\right), \qquad\quad \sigma_k^2 = \big\langle (x_k-m_k)^2\big\rangle$$
• the components are uncorrelated if $\ \ \big\langle (x_i-m_i)(x_j-m_j)\big\rangle=0\quad \mbox{for}\quad i\ne j$
$$|\hat{S}| = \prod_k \sigma_k^2, \qquad\quad \hat{S}^{-1}=\left(\begin{array}{ccc} 1/\sigma_0^2 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & 1/\sigma_{d-1}^2 \end{array}\right)$$
• multivariate Gaussian factorizes
$$N(\mathbf{x}) = \prod_k \frac{\exp(-(x_k-m_k)^2/(2\sigma_k^2))}{\sqrt{2\pi\sigma_k^2}}$$
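The factorized density can be evaluated term by term; a sketch (the function name is illustrative, with `sigma2` holding the diagonal variances $\sigma_k^2$):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// multivariate Gaussian with diagonal covariance: product of 1d Gaussians
double diagonalGaussian(const std::vector<double>& x,
                        const std::vector<double>& m,
                        const std::vector<double>& sigma2)
{
  double p = 1.0;
  for (size_t k=0; k<x.size(); k++)
    p *= std::exp(-(x[k]-m[k])*(x[k]-m[k])/(2.0*sigma2[k]))
         / std::sqrt(2.0*M_PI*sigma2[k]);
  return p;
}
```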

statistics of the features

• a linear classifier combines the data of the two classes via $\ \ \mathbf{w}\cdot\mathbf{x}_\alpha,\qquad \alpha=1,2$
• mean $\ \ \mu_\alpha=\mathbf{w}\cdot\mathbf{m}_\alpha$
• variance $$\sigma_\alpha^2=\left\langle (\mathbf{w}\cdot\mathbf{x}_\alpha -\mathbf{w}\cdot\mathbf{m}_\alpha)^2 \right\rangle = \left\langle (\mathbf{w}\cdot(\mathbf{x}_\alpha -\mathbf{m}_\alpha))^2 \right\rangle = \left\langle (\mathbf{w}^T(\mathbf{x}_\alpha -\mathbf{m}_\alpha))^2 \right\rangle$$ or $$\sigma_\alpha^2=\left\langle \mathbf{w}^T (\mathbf{x}_\alpha -\mathbf{m}_\alpha)\, (\mathbf{x}_\alpha -\mathbf{m}_\alpha)^T \mathbf{w} \right\rangle = \mathbf{w}^T \left\langle (\mathbf{x}_\alpha -\mathbf{m}_\alpha)\, (\mathbf{x}_\alpha -\mathbf{m}_\alpha)^T \right\rangle \mathbf{w}$$

in terms of the covariance matrix

$$\fbox{\phantom{\big|}\displaystyle\sigma_\alpha^2= \mathbf{w}^T \hat S_\alpha \mathbf{w}\phantom{\big|}}\,, \qquad\quad \hat S_\alpha = \big\langle (\mathbf{x}_\alpha-\mathbf{m}_\alpha)\, (\mathbf{x}_\alpha-\mathbf{m}_\alpha)^T\big\rangle$$
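The boxed relation $\sigma_\alpha^2 = \mathbf{w}^T\hat S_\alpha\mathbf{w}$ is a plain double sum; a sketch (names illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// projected variance sigma^2 = w^T S w of the linear discriminant w.x
double projectedVariance(const std::vector<double>& w,
                         const std::vector<std::vector<double>>& S)
{
  double sigma2 = 0.0;
  for (size_t i=0; i<w.size(); i++)
    for (size_t j=0; j<w.size(); j++)
      sigma2 += w[i]*S[i][j]*w[j];
  return sigma2;
}
```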

linear discriminant analysis (LDA)

• linear discriminant, mean, variance $$\mathbf{w}\cdot\mathbf{x}_\alpha,\qquad\quad \mu_\alpha=\mathbf{w}\cdot\mathbf{m}_\alpha,\qquad\quad \sigma_\alpha^2=\mathbf{w}^T \hat S_\alpha \mathbf{w}$$
• many linear classifiers exist

Fisher objective function

• maximize $\ \ (\mu_1-\mu_2)^2\ \$
keeping the combined variance $\ \ \sigma_1^2+\sigma_2^2\ \$ constant

• find feature vector $\ \ \mathbf{w}\ \$ maximizing
$$S = \frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2} = \frac{(\mathbf{w}\cdot(\mathbf{m}_1-\mathbf{m}_2))^2} {\mathbf{w}^T (\hat{S}_1+\hat{S}_2) \mathbf{w}} = \frac{\mathbf{w}^T(\mathbf{m}_1-\mathbf{m}_2) (\mathbf{m}_1-\mathbf{m}_2)^T\mathbf{w}} {\mathbf{w}^T (\hat{S}_1+\hat{S}_2) \mathbf{w}}$$
• generic
$$S = \frac{\mathbf{w}^T\hat{A}\mathbf{w}} {\mathbf{w}^T \hat{B}\mathbf{w}}, \qquad\quad \hat{A}=(\mathbf{m}_1-\mathbf{m}_2)(\mathbf{m}_1-\mathbf{m}_2)^T, \qquad\quad \hat{B}=\hat{S}_1+\hat{S}_2$$
• quantum mechanics:
$\mathbf{w}\ \$ wavefunction
$\hat{A}\ \$ Hamilton operator
$\hat{B}\ \$ identity matrix
$S\ \$ expectation value of energy

LDA - maximization

$$S = \frac{\mathbf{w}^T\hat{A}\mathbf{w}} {\mathbf{w}^T \hat{B}\mathbf{w}}, \qquad\quad \hat{A}=(\mathbf{m}_1-\mathbf{m}_2)(\mathbf{m}_1-\mathbf{m}_2)^T, \qquad\quad \hat{B}=\hat{S}_1+\hat{S}_2$$
• variation $$\delta S= \frac{\mathbf{w}^T\hat{A}\,\delta\mathbf{w}}{\mathbf{w}^T \hat{B}\mathbf{w}} - \frac{\mathbf{w}^T\hat{A}\mathbf{w}}{(\mathbf{w}^T \hat{B}\mathbf{w})^2} \mathbf{w}^T \hat{B}\,\delta\mathbf{w}$$ plus terms in $\ \ (\delta\mathbf{w})^T$
• stationarity $$\delta S=0,\qquad\quad \big(\mathbf{w}^T \hat{B}\mathbf{w}\big)\big(\mathbf{w}^T\hat{A}\big)= \big(\mathbf{w}^T \hat{A}\mathbf{w}\big)\big(\mathbf{w}^T\hat{B}\big)$$

proportionality factors

$$\mathbf{w}^T\hat{A}=\mathbf{w}^T \big((\mathbf{m}_1-\mathbf{m}_2)(\mathbf{m}_1-\mathbf{m}_2)^T\big) \propto (\mathbf{m}_1-\mathbf{m}_2)^T$$
• scalar quantities $$\mathbf{w}^T(\mathbf{m}_1-\mathbf{m}_2),\qquad\quad \mathbf{w}^T \hat{A}\mathbf{w},\qquad\quad \mathbf{w}^T \hat{B}\mathbf{w}$$
• proportionality $$(\mathbf{m}_1-\mathbf{m}_2)^T\propto \mathbf{w}^T\hat{B}, \qquad\quad \mathbf{m}_1-\mathbf{m}_2\propto \hat{B}^T \mathbf{w}= \hat{B} \mathbf{w}, \qquad\quad \hat{B}^T = \hat{B}$$
• the solution for the feature vector $$\fbox{\phantom{\big|}\displaystyle \mathbf{w}\propto \big(\hat{S}_1+\hat{S}_2\big)^{-1} \big(\mathbf{m}_1-\mathbf{m}_2\big) \phantom{\big|}}\,, \qquad \hat{B}=\hat{S}_1+\hat{S}_2$$ may be normalized to unity

regularization

• the system of linear equations $$\big(\hat{S}_1+\hat{S}_2\big)\mathbf{w}= \mathbf{m}_1-\mathbf{m}_2, \qquad\quad \hat{S}_1+\hat{S}_2\ \to\ (1-\lambda)\big(\hat{S}_1+\hat{S}_2\big) +\lambda \hat{I}$$ may be regularized for a singular (or nearly singular) combined covariance matrix $\ \ \hat{S}_1+\hat{S}_2$
• shrinking intensity (regularization parameter): $\ \ \lambda$
• identity matrix: $\ \ \hat{I}$
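The shrinkage step is one line per matrix element; a sketch (the in-place function `regularize` is illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// shrinkage regularization  S -> (1-lambda) S + lambda I,  in place
void regularize(std::vector<std::vector<double>>& S, double lambda)
{
  for (size_t i=0; i<S.size(); i++)
    for (size_t j=0; j<S[i].size(); j++)
      S[i][j] = (1.0-lambda)*S[i][j] + (i==j ? lambda : 0.0);
}
```

For $\lambda\to1$ the matrix approaches the identity, which is always invertible.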

LDA threshold

• already determined: $\ \ \mathbf{w}$ $$z_\alpha=\mathbf{w}\cdot\mathbf{x}_\alpha,\qquad\quad \mu_\alpha=\mathbf{w}\cdot\mathbf{m}_\alpha,\qquad\quad \sigma_\alpha^2=\mathbf{w}^T \hat S_\alpha \mathbf{w},\qquad\quad |\mathbf{w}|=1$$
• determine now threshold $\ \ b:\quad\quad \mathbf{w}\cdot\mathbf{x}=b$

misclassifications

• case $\ \ \mu_1<\mu_2$
$$E= \int_{-\infty}^b p_2(z)\,dz + \int_b^{\infty} p_1(z)\,dz$$
• error minimization
$$\frac{\partial E}{\partial b} = 0, \qquad\quad \fbox{\displaystyle\phantom{\big|} p_1(b) = p_2(b)\phantom{\big|}}$$
• principle of equal probabilities
• threshold close to $\ \ (\mu_1+\mu_2)/2\ \$ for comparable variances
otherwise: use a Gaussian approximation for $\ \ p_{1,2}(z)$
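Under the Gaussian approximation for $p_{1,2}(z)$, the condition $p_1(b)=p_2(b)$ can be solved in closed form: equating the log-densities gives a quadratic in $b$. A sketch (the function name and argument convention, projected means and standard deviations, are chosen here):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// solve p1(b) = p2(b) for two 1d Gaussians N(mu_a, s_a^2);
// for equal widths this reduces to the midpoint (mu1+mu2)/2
double ldaThreshold(double mu1, double s1, double mu2, double s2)
{
  if (std::fabs(s1-s2) < 1e-12)
    return 0.5*(mu1+mu2);
  // equating the log-densities yields a quadratic  a b^2 + q b + c = 0
  double a = 1.0/(2.0*s2*s2) - 1.0/(2.0*s1*s1);
  double q = mu1/(s1*s1) - mu2/(s2*s2);
  double c = mu2*mu2/(2.0*s2*s2) - mu1*mu1/(2.0*s1*s1) + std::log(s2/s1);
  double disc = std::sqrt(q*q - 4.0*a*c);
  double r1 = (-q+disc)/(2.0*a), r2 = (-q-disc)/(2.0*a);
  double lo = std::min(mu1,mu2), hi = std::max(mu1,mu2);
  return (r1 >= lo && r1 <= hi) ? r1 : r2;   // take the root between the means
}
```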

LDA standalone code

• solve $$\big(\hat{S}_1+\hat{S}_2\big)\mathbf{w}= \mathbf{m}_1-\mathbf{m}_2$$ using Gauss elimination code
• visualize data and separating plane via piping to gnuplot
#include <iostream>
#include <sstream>     // for stringstream
#include <stdio.h>     // for printf
#include <string>      // c_str()
#include <cmath>       // for abs
#include <fstream>     // file streams
#include <stdlib.h>    // srand, rand
#include <ctime>       // time(), to seed srand
using namespace std;

/* contains
*
* double nextRandom()
* string xyValuesGnu()
* void plotViaGnuplot()
* void gaussElimination()
* void generateClassificationData()
* void calcCoVarMatrices()
* void evaluatePlane()
*/

// ***
// *** random number in [0,1]
// ***
double nextRandom()
{
  static bool firstTime = true;
  if (firstTime)
  {
    srand( (unsigned)time( NULL ) );
    firstTime = false;
  }
  return (rand()/(double)RAND_MAX);
} // end of nextRandom()

// ***
// *** (x,y) string for inline data input for gnuplot
// ***

template<int N>
string xyValuesGnu(double (&xyValues)[2][N], string dataName)
{
  stringstream ss;
  ss << dataName << " << " << "EOD\n";       // defining data block
  for (int i=0; i<N; i++)
    ss << xyValues[0][i] << " " << xyValues[1][i] << endl;
  ss << "EOD\n";                             // terminating data block
  //
  return ss.str();
} // end of xyValuesGnu()

// ***
// *** plot data via gnuplot
// ***

void plotViaGnuplot(string dataNameOne,  string dataToPlotOne,
                    string dataNameTwo,  string dataToPlotTwo,
                    string dataNamePlane,string dataToPlotPlane,
                    string dataNameBasis,string dataToPlotBasis)
{
  FILE *pipeGnu = popen("gnuplot", "w");            // streaming to gnuplot
  fprintf(pipeGnu, "set term qt persist size 1000,800\n");
  fprintf(pipeGnu, "set nokey\n");                  // no legend
  //
  fprintf(pipeGnu, "%s\n",dataToPlotOne.c_str());   // send data blocks
  fprintf(pipeGnu, "%s\n",dataToPlotTwo.c_str());
  fprintf(pipeGnu, "%s\n",dataToPlotPlane.c_str());
  fprintf(pipeGnu, "%s\n",dataToPlotBasis.c_str());
  //                                                // plot data
  fprintf(pipeGnu,   "plot \"%s\" w points pt 5 ps 2 \n", dataNameOne.c_str());
  fprintf(pipeGnu, "replot \"%s\" w points pt 5 ps 2 \n", dataNameTwo.c_str());
  fprintf(pipeGnu, "replot \"%s\" w lines lw 7 \n", dataNamePlane.c_str());
  fprintf(pipeGnu, "replot \"%s\" w linesp pt 7 ps 4 lw 4 \n", dataNameBasis.c_str());
  //
  pclose(pipeGnu);    // closing the pipe opened with popen()
} // end of plotViaGnuplot()

// *** Gauss elimination of Ax=b with pivoting.
// *** On input: quadratic matrix A and vector b.
// *** On output: solution x.

template<int N>
void gaussElimination(double (&A)[N][N], double (&b)[N], double (&x)[N])
{
  double temp;                       // temp variable
  const double smallNumber = 1e-10;  // a small number
  //
  //--- loop over all columns A[][col]
  for (int col=0; col<N; col++)
  {
    //--- find pivot row; swap pivot row with actual row
    int pivotRow = col;
    for (int row=col+1;row<N;row++)
      if ( abs(A[row][col]) > abs(A[pivotRow][col]) )
        pivotRow = row;

    //--- swap row A[col][] with A[pivotRow][]
    for (int i=0;i<N;i++)
    {
      temp = A[col][i];
      A[col][i] = A[pivotRow][i];
      A[pivotRow][i] = temp;
    }

    //--- swap elements b[col] with b[pivotRow]
    temp = b[col];
    b[col] = b[pivotRow];
    b[pivotRow] = temp;

    //--- throw a warning if matrix is singular or nearly singular
    if (abs(A[col][col]) <= smallNumber)
      cerr << "Matrix is singular or nearly singular" << endl;

    //--- eliminate below the pivot within A and b
    for (int row=col+1;row<N;row++)
    {
      double alpha = A[row][col] / A[col][col];  // divide by pivot
      b[row] -= alpha*b[col];
      for (int j=col;j<N;j++)
        A[row][j] -= alpha * A[col][j];
    }
  }  // end of loop over all columns

  //--- back substitution
  for (int i=N-1;i>=0;i--)
  {
    double sum = 0.0;
    for (int j=i+1;j<N;j++)
      sum += A[i][j] * x[j];
    x[i] = (b[i] - sum) / A[i][i];   // pivots on diagonal after swapping
  }
}  // end of gaussElimination()

// ***
// *** generate data to be classified; only for (dim==2)
// ***
template<int nOne, int nTwo>
void generateClassificationData(double dataClassOne[2][nOne], double dataClassTwo[2][nTwo])
{
  double angleOne =  0.75*M_PI;          // angles of the large axis of the data with
  double angleTwo =  0.75*M_PI;          // respect to the x-axis
  double L1One[2], L1Two[2];             // large eigen-directions of data distribution
  double L2One[2], L2Two[2];             // minor eigen-directions of data distribution
  //
  double L1OneLength = 1.5;              // respective lengths
  double L2OneLength = 0.2;
  double L1TwoLength = 1.0;
  double L2TwoLength = 0.2;
  //
  double rr, ss;
  //
  L1One[0] =  L1OneLength*cos(angleOne);
  L1One[1] =  L1OneLength*sin(angleOne);
  L2One[0] =  L2OneLength*sin(angleOne);   // L2 orthogonal to L1
  L2One[1] = -L2OneLength*cos(angleOne);
  //
  L1Two[0] =  L1TwoLength*cos(angleTwo);
  L1Two[1] =  L1TwoLength*sin(angleTwo);
  L2Two[0] =  L2TwoLength*sin(angleTwo);
  L2Two[1] = -L2TwoLength*cos(angleTwo);
  //
  for (int iOne=0; iOne<nOne; iOne++)     // generate data for first class
  {
    rr = 2.0*nextRandom()-1.0;
    ss = 2.0*nextRandom()-1.0;
    dataClassOne[0][iOne] =  1.0 + rr*L1One[0] + ss*L2One[0];  // x component
    dataClassOne[1][iOne] =  0.0 + rr*L1One[1] + ss*L2One[1];  // y component
  }
  //
  for (int iTwo=0; iTwo<nTwo; iTwo++)     // generate data for second class
  {
    rr = 2.0*nextRandom()-1.0;
    ss = 2.0*nextRandom()-1.0;
    dataClassTwo[0][iTwo] = -1.0 + rr*L1Two[0] + ss*L2Two[0];
    dataClassTwo[1][iTwo] =  0.0 + rr*L1Two[1] + ss*L2Two[1];
  }
  // test printing (disabled)
  if (1==2)
  {
    for (int iOne=0; iOne<nOne; iOne++)
      cout <<  dataClassOne[0][iOne] << " " << dataClassOne[1][iOne] << endl;
    cout << endl;
    for (int iTwo=0; iTwo<nTwo; iTwo++)
      cout <<  dataClassTwo[0][iTwo] << " " << dataClassTwo[1][iTwo] << endl;
  }
}    // end of generateClassificationData()

// ***
// *** calculate the covariance matrices
// ***
template<int nOne, int nTwo>
void calcCoVarMatrices(double dataClassOne[2][nOne], double dataClassTwo[2][nTwo],
                       double coVarOne[2][2], double coVarTwo[2][2],
                       double mOne[2], double mTwo[2],
                       double S[2][2], double deltaM[2])
{
  mOne[0] = 0;
  mOne[1] = 0;
  mTwo[0] = 0;
  mTwo[1] = 0;
  for (int i=0; i<2; i++)        // zero the accumulators; the caller
    for (int j=0; j<2; j++)      // passes uninitialized arrays
    {
      coVarOne[i][j] = 0.0;
      coVarTwo[i][j] = 0.0;
    }
  //
  for (int iOne=0; iOne<nOne; iOne++)
  {
    mOne[0] += dataClassOne[0][iOne]/nOne;
    mOne[1] += dataClassOne[1][iOne]/nOne;
  }
  for (int iTwo=0; iTwo<nTwo; iTwo++)
  {
    mTwo[0] += dataClassTwo[0][iTwo]/nTwo;
    mTwo[1] += dataClassTwo[1][iTwo]/nTwo;
  }
  //
  deltaM[0] = mTwo[0] - mOne[0];
  deltaM[1] = mTwo[1] - mOne[1];
  //
  for (int iOne=0; iOne<nOne; iOne++)
    for (int i=0; i<2; i++)
      for (int j=0; j<2; j++)
        coVarOne[i][j] += (dataClassOne[i][iOne]-mOne[i])*
                          (dataClassOne[j][iOne]-mOne[j])/nOne;
  for (int iTwo=0; iTwo<nTwo; iTwo++)
    for (int i=0; i<2; i++)
      for (int j=0; j<2; j++)
        coVarTwo[i][j] += (dataClassTwo[i][iTwo]-mTwo[i])*
                          (dataClassTwo[j][iTwo]-mTwo[j])/nTwo;
  //
  for (int i=0; i<2; i++)
    for (int j=0; j<2; j++)
      S[i][j] = coVarOne[i][j] + coVarTwo[i][j];
  //
} // end of calcCoVarMatrices()

// ***
// *** evaluate the plane
// ***
void evaluatePlane(double w[2], double dataPlane[2][2])
{
  double rr = sqrt(w[0]*w[0]+w[1]*w[1]);
  w[0] = w[0]/rr;        // normalization |w| = 1
  w[1] = w[1]/rr;
  //
  dataPlane[0][0] =  1.8*w[1];      // orthogonal to w
  dataPlane[1][0] = -1.8*w[0];
  dataPlane[0][1] = -dataPlane[0][0];
  dataPlane[1][1] = -dataPlane[1][0];
  //
  // test printing
  if (1==1)
    cout <<  w[0] << " " << w[1] << endl;
} // end of evaluatePlane()

// ***
// *** main
// ***
int main()
{
  double dataClassOne[2][900];        // the data points
  double dataClassTwo[2][800];
  double dataBasis[2][2];             // connecting the centers of mass
  //
  dataBasis[0][0] = -1.0;             // (-1,0)
  dataBasis[1][0] =  0.0;
  dataBasis[0][1] =  1.0;             // (1,0)
  dataBasis[1][1] =  0.0;
  //
  double mOne[2];                     // centers of mass
  double mTwo[2];
  double deltaM[2];                   // mTwo-mOne
  double w[2];                        // feature vector
  double dataPlane[2][2];             // orthogonal to the feature vector
  //
  double coVarOne[2][2];              // covariance matrices
  double coVarTwo[2][2];
  double        S[2][2];              // sum of covariance matrices
  //
  generateClassificationData(dataClassOne,dataClassTwo);
  //
  calcCoVarMatrices(dataClassOne, dataClassTwo, coVarOne, coVarTwo,
                    mOne, mTwo, S, deltaM);
  //
  gaussElimination(S,deltaM,w);             // solve    "S w = deltaM"
  evaluatePlane(w, dataPlane);              // w defines the plane
  //
  string dataNameOne   = "$dataOne";   string dataNameTwo   = "$dataTwo";
  string dataNamePlane = "$dataPlane"; string dataNameBasis = "$dataBasis";
  string dataToPlotOne   = xyValuesGnu(dataClassOne,dataNameOne);  // data to string
  string dataToPlotTwo   = xyValuesGnu(dataClassTwo,dataNameTwo);
  string dataToPlotPlane = xyValuesGnu(dataPlane,dataNamePlane);
  string dataToPlotBasis = xyValuesGnu(dataBasis,dataNameBasis);
  //
  plotViaGnuplot(dataNameOne  ,dataToPlotOne,
                 dataNameTwo  ,dataToPlotTwo,
                 dataNamePlane,dataToPlotPlane,
                 dataNameBasis,dataToPlotBasis);
  return 0;
}  // end of main()