Advanced Introduction to C++, Scientific Computing and Machine Learning

Claudius Gros, SS 2024

Institut für theoretische Physik
Goethe-University Frankfurt a.M.

Support Vector Machines - Basics

maximal margins

find seperating hyperplane with maximal margins
: distance to the features
most likely to classify test data correctly
combined training set

$\displaystyle\qquad\quad \big\{\mathbf{x}_i,l_i\big\}, \qquad l_i=\pm1 $

class identifier

$\displaystyle\qquad\quad \mathbf{L} = \big(l_0,\,\dots,\,l_{N-1}\big) $

hyperplane

$\displaystyle\qquad\quad \mathbf{w}\cdot\mathbf{x}=b $

margin condition

$$ \begin{array}{rclcrcl} \mathbf{w}\cdot\mathbf{x}_i&\ge& b+\tilde d &\quad\mbox{for}\quad & y_i&=&+1 \\ \mathbf{w}\cdot\mathbf{x}_i&\le& b-\tilde d &\quad\mbox{for}\quad & y_i&=&-1 \\ \end{array} $$

shift $\ b\ $ left/right
plane geometry dependent only on $ \ \ \mathbf{w}/b$
: may select $\ \ \tilde d \to 1$

$$ \pm \tilde{d}\ \to\ l_i, \qquad\quad \fbox{$\displaystyle \phantom{\big|}l_i\big(\mathbf{w}\cdot\mathbf{x}_i-b\big)- 1 \ge 0\phantom{\big|}$}\,, \qquad\quad \forall i $$

maximal margin condition

distance to orgin

$$ \mathbf{x}= x\frac{\mathbf{w}}{|\mathbf{w}|}, \qquad\quad \mathbf{w}\cdot\mathbf{x}=b, \qquad\quad x=\frac{b}{|\mathbf{w}|} $$

support vector: on margin
euklidian margin $\ \ d=\frac{1}{|\mathbf{w}|}$
maximal margin $\ \leftrightarrow\ $ minimal length of $\ \mathbf{w}$

optimization conditions

$$ \mathrm{min}\,|\mathbf{w}|,\qquad\quad l_i\big(\mathbf{w}\cdot\mathbf{x}_i-b\big)- 1 \ge 0, \qquad\quad \forall i $$

$\mathrm{min}\,|\mathbf{w}|\ \ $ equivalent to $\ \ \mathrm{min}\,\mathbf{w}^2/2\ $
enforce constraints using Lagrange multipliers

support vectors

objective function to optimize

$$ L_P = \frac{\mathbf{w}^2}{2} -\sum_{i=0}^{N-1} a_i\,\Big[ l_i\big(\mathbf{w}\cdot\mathbf{x}_i-b\big)- 1\big] $$

$N\ \ $ Lagrange parameters $\ \ \mathbf{a}=\big(a_0,\,\dots,\,a_{N-1}\big)$

minimisation vs. maximisation

minimization of $\ L_P\ $ with respect to $\ \mathbf{w},\, b$
maximization of $\ L_P\ $ with respect to $\ \mathbf{a}$

support vectors

$$ l_i\big(\mathbf{w}\cdot\mathbf{x}_i-b\big)- 1 \ge 0, \qquad\quad l_s\big(\mathbf{w}\cdot\mathbf{x}_s-b\big)- 1 = 0 $$

equality for support vector $\ \ \mathbf{x}_s,\, l_s$
maximal for support vector

$$ -a_i\,\Big[ l_i\big(\mathbf{w}\cdot\mathbf{x}_i-b\big)- 1\big], \qquad\quad a_i\ge0 $$

support vector machine

$$ \fbox{$\displaystyle\phantom{\big|} a_i\to0 \phantom{\big|}$} \qquad\mathrm{for\ non-support\ vectors} $$

non-support vectors irrelevant

dual representation

$$ L_P = \frac{\mathbf{w}^2}{2} -\sum_i a_i l_i\big(\mathbf{w}\cdot\mathbf{x}_i-b\big) +\sum_i a_i $$

stationarity: weight vector

$$ \frac{\partial L_P}{\partial\mathbf{w}} = 0 \qquad\quad \Rightarrow \qquad\quad \fbox{$\displaystyle\phantom{\big|} \mathbf{w} = \sum_i a_i l_i\mathbf{x}_i \phantom{\big|}$} $$

stationarity: hyperplane offset

$$ \frac{\partial L_P}{\partial b} = 0 \qquad\quad \Rightarrow \qquad\quad \fbox{$\displaystyle\phantom{\big|} \sum_i a_i l_i=0 \phantom{\big|}$} $$

substitution 'transformation primal to dual': $\ \ L_P \to L_D$

$$ L_D = \frac{\mathbf{w}^2}{2}-\mathbf{w}^2+\sum_i a_i = \sum_i a_i -\frac{1}{2}\sum_{i,j} \big( a_i l_i \mathbf{x}_i\big)\cdot \big( a_j l_j \mathbf{x}_j\big) $$

dual form

$$ \underset{\mathbf{a}}{\mathrm{max}}\left[ \sum_i a_i-\frac{1}{2}\sum_{ij} a_iH_{ij}a_j\right]\,, \qquad a_i\ge 0, \qquad \sum_i a_il_i=0 $$

variables: Lagrange parameters $\ \ a_i$
quadratic form

$$ \fbox{$\displaystyle\phantom{\big|} H_{ij} = l_i\,\mathbf{x}_i\cdot\mathbf{x}_j\,l_j \phantom{\big|}$} $$

quadratic part is always negative

$$ -\frac{1}{2}\sum_{ij} a_iH_{ij}a_j = -\frac{1}{2}\mathbf{c}\cdot\mathbf{c}, \qquad\quad \mathbf{c} = \sum_j a_j l_j \mathbf{x}_j $$

hyperplane

$$ \underset{\{a_l\}}{\mathrm{max}}\left[ \sum_i a_i-\frac{1}{2}\sum_{ij} a_iH_{ij}a_j\right]\,, \qquad a_i\ge 0, \qquad \sum_i a_il_i=0 $$

find $\ \{a_l\}$

$$ \mathbf{w}=\sum_ia_il_i\mathbf{x}_i =\sum_{m\in S} a_ml_m\mathbf{x}_m $$

set of support vectors $\ \ S$
let $\ \mathbf{x}_s,\, l_s$ be a support vector, with $\ \ l_s^2=1$

$$ l_s\big(\mathbf{w}\cdot\mathbf{x}_s-b\big) = 1, \qquad\quad \mathbf{w}\cdot\mathbf{x}_s-b = l_s $$

for a given support vector $\ \ s$

$$ b = \sum_{m\in S} a_ml_m\,\mathbf{x}_m\cdot\mathbf{x}_s -l_s $$

may be averaged of the set of support vectors $\ \ \mathbf{x}_s,\,l_s$

numerics

methods to maximize quadratic functions ..
: 'quadratic program' (QP)
here: steepest ascent

$$ \frac{\partial L_D}{\partial a_k} =1-l_k\mathbf{x}_k\cdot\mathbf{w}, \qquad\quad L_D = \sum_i a_i -\frac{1}{2} \mathbf{w}\cdot\mathbf{w}, \qquad\quad \mathbf{w} = \sum_j a_j l_j \mathbf{x}_j $$

(1) update Lagrange parameters

stepsize: $\ \ \epsilon_a\ll1$

$$ a_k \ \ \to\ \ a_k+\epsilon_a\big( 1-l_k\mathbf{x}_k\cdot\mathbf{w}) $$

(2) orthogonalize Lagrange parameters

fullfill

$$ \frac{\partial L_P}{\partial b} = 0, \qquad\quad \sum_i a_i l_i=0, \qquad\quad \fbox{$\displaystyle\phantom{\big|} \mathbf{a}\cdot\mathbf{L}=0 \phantom{\big|}$}\,, \qquad\quad \mathbf{L}=\big(l_0,\,\dots,\,l_{N-1}\big) $$

$\mathbf{a}\ $ orthogonal to $\ \mathbf{L}$

$$ \fbox{$\displaystyle\phantom{\big|} \mathbf{a}\ \ \to\ \ \mathbf{a} - \mathbf{a}\cdot\mathbf{L}\displaystyle\frac{\mathbf{L}}{\mathbf{L}\cdot\mathbf{L}} \phantom{\big|}$}\,, \qquad\quad \mathbf{L}\cdot\left[\mathbf{a} - \mathbf{a}\cdot\mathbf{L}\frac{\mathbf{L}}{\mathbf{L}\cdot\mathbf{L}} \right]=0, \qquad\quad \mathbf{L}\cdot\mathbf{L} = N $$

(3) special treatment for non-support vectors

fullfill $\ \ a_l\ge 0$
set $\ a_l\,\to\, 0\ $ if negative

steepest ascent SVM code

linear, separable case
based on linear classifier percepton code
here: no control of convergence
note: visualization costs some time, saving however more

#include <iostream>
#include <sstream>     // for stringstream 
#include <stdio.h>     // for printf
#include <string>      // c_str()
#include <cmath>       // for abs
#include <fstream>     // file streams
#include <stdlib.h>    // srand, rand
using namespace std;

/* contains
 *
 * double nextRandom()
 * string xyValuesGnu()
 * void plotViaGnuplot()
 * void gaussElimination()
 * void generateClassificationData()
 * void calcCoVarMatrices()
 * void evaluatePlane()
 * void svmGradientAscent() 
 */


// ***
// *** random number in [0,1] 
// ***
double nextRandom()
{ static bool firstTime = true;
 if (firstTime)
   {
   srand( (unsigned)time( NULL ) );
   firstTime = false;
   }
 return (rand()/(double)RAND_MAX); 
} // end of nextRandom()


// ***
// *** (x,y) string for inline data input for gnuplot
// ***

template<int N>
string xyValuesGnu(double (&xyValues)[2][N], string dataName)
{
 stringstream ss;
 ss << dataName << " << " << "EOD\n";       // defining data block
 for (int i=0; i<N; i++)
   ss << xyValues[0][i] << " " << xyValues[1][i] << endl;
 ss << "EOD\n";                             // terminating data block
//
 return ss.str();
} // end of xyValuesGnu()


// *** 
// *** plot data via gnuplot
// *** 

void plotViaGnuplot(string dataNameOne,     string dataToPlotOne,
                    string dataNameTwo,     string dataToPlotTwo,
                    string dataNamePlane,   string dataToPlotPlane,
                    string dataNamePlaneSVM,string dataToPlotPlaneSVM,
                    string dataNameP_m1_SVM,string dataToPlotP_m1_SVM,
                    string dataNameP_m2_SVM,string dataToPlotP_m2_SVM,
                    string dataNameBasis,   string dataToPlotBasis)
{
 FILE *pipeGnu = popen("gnuplot", "w");            // streaming to gnuplot
 fprintf(pipeGnu, "set term qt persist size 1000,800\n");
 fprintf(pipeGnu, "set nokey\n");                  // no legend
//
 fprintf(pipeGnu, "%s\n",dataToPlotOne.c_str());   
 fprintf(pipeGnu, "%s\n",dataToPlotTwo.c_str());   
 fprintf(pipeGnu, "%s\n",dataToPlotPlane.c_str()); 
 fprintf(pipeGnu, "%s\n",dataToPlotPlaneSVM.c_str());
 fprintf(pipeGnu, "%s\n",dataToPlotP_m1_SVM.c_str());
 fprintf(pipeGnu, "%s\n",dataToPlotP_m2_SVM.c_str());
 fprintf(pipeGnu, "%s\n",dataToPlotBasis.c_str());   
//                                                 // plot data
 fprintf(pipeGnu,   "plot \"%s\" w points pt 5 ps 4 \n", dataNameOne.c_str());
 fprintf(pipeGnu, "replot \"%s\" w points pt 5 ps 4 \n", dataNameTwo.c_str());
 fprintf(pipeGnu, "replot \"%s\" w lines lw 7 lt rgb \"grey\" dt 2 \n", dataNamePlane.c_str());
 fprintf(pipeGnu, "replot \"%s\" w lines lw 7 lt rgb \"gold\" \n", dataNamePlaneSVM.c_str());
 fprintf(pipeGnu, "replot \"%s\" w lines lw 2 lt rgb \"gold\" \n", dataNameP_m1_SVM.c_str());
 fprintf(pipeGnu, "replot \"%s\" w lines lw 2 lt rgb \"gold\" \n", dataNameP_m2_SVM.c_str());
 fprintf(pipeGnu, "replot \"%s\" w linesp pt 7 ps 4 lw 4 \n", dataNameBasis.c_str());
//
 fclose(pipeGnu);    // closing the pipe to gnuplot
} // end of plotViaGnuplot


// *** Gauss elimination of Ax=b with pivoting.
// *** On input quadratic matrix A and vector b.
// *** On ouput solution x.
 
template<int N>
void gaussElimination(double (&A)[N][N], double (&b)[N], double (&x)[N]) 
{
 double temp;                       // temp variable
 const double smallNumber = 1e-10;  // a small number
//
//--- loop over all columns A[][col];  A[][col]
 for (int col=0; col<N; col++) 
   {             
//--- find pivot row; swap pivot with actualy row
   int pivotRow = col; 
   for (int row=col+1;row<N;row++) 
     if ( abs(A[row][col]) > abs(A[pivotRow][col]) ) 
       pivotRow = row;
 
//--- swap row A[col][] with A[pivotRow][]
   for (int i=0;i<N;i++)
     {
     temp = A[col][i];
     A[col][i] = A[pivotRow][i];
     A[pivotRow][i] = temp;
     }
 
//--- swap elements b[col] with b[pivotRow]
   temp = b[col];  
   b[col] = b[pivotRow]; 
   b[pivotRow] = temp;
 
//--- throw a warning if matrix is singular or nearly singular
   if (abs(A[col][col]) <= smallNumber) 
     cerr << "Matrix is singular or nearly singular" << endl;
 
//--- Gauss with pivot within A and b
   for (int row=col+1;row<N;row++) 
     {
     double alpha = A[row][col] / A[col][col];  // divide by pivot
     b[row] -= alpha*b[col];
     for (int j=col;j<N;j++) 
       A[row][j] -= alpha * A[col][j];
     }
      }  // end of loop over all columns
 
//--- back substitution
 for (int i=N-1;i>=0;i--) 
   {
   double sum = 0.0;
   for (int j=i+1;j<N;j++) 
      sum += A[i][j] * x[j];
   x[i] = (b[i] - sum) / A[i][i];   // pivots on diagonal after swapping
   }
}  // end of gaussElimination()
 
 
// ***
// *** generate data to be classified; only for (dim==2)
// ***
template<int nOne, int nTwo>
void generateClassificationData(double dataClassOne[2][nOne], double dataClassTwo[2][nTwo]) 
{
 double angleOne =  0.75*M_PI;          // angles of the large axis of the data with
 double angleTwo =  0.75*M_PI;          // respect to the x-axis
 double L1One[2], L1Two[2];             // large eigen-directions of data distribution
 double L2One[2], L2Two[2];             // minor eigen-directions of data distribution
//
 double L1OneLength = 1.5;              // respective lengths
 double L2OneLength = 0.2;              // respective lengths
 double L1TwoLength = 1.0;         
 double L2TwoLength = 0.2;        
//
 double rr, ss;
//
 L1One[0] =  L1OneLength*cos(angleOne);
 L1One[1] =  L1OneLength*sin(angleOne);
 L2One[0] =  L2OneLength*sin(angleOne);   // L2 orthognal to L1
 L2One[1] = -L2OneLength*cos(angleOne);
//
 L1Two[0] =  L1TwoLength*cos(angleTwo);
 L1Two[1] =  L1TwoLength*sin(angleTwo);
 L2Two[0] =  L2TwoLength*sin(angleTwo);   
 L2Two[1] = -L2TwoLength*cos(angleTwo);
//
 for (int iOne=0; iOne<nOne; iOne++)     // generate data for first class
  {
  rr = 2.0*nextRandom()-1.0;
  ss = 2.0*nextRandom()-1.0;
  dataClassOne[0][iOne] =  1.0 + rr*L1One[0] + ss*L2One[0];  // x component
  dataClassOne[1][iOne] =  0.0 + rr*L1One[1] + ss*L2One[1];  // y component
  }
//
 for (int iTwo=0; iTwo<nTwo; iTwo++)     // generate data for second class
  {
  rr = 2.0*nextRandom()-1.0;
  ss = 2.0*nextRandom()-1.0;
  dataClassTwo[0][iTwo] = -1.0 + rr*L1Two[0] + ss*L2Two[0];
  dataClassTwo[1][iTwo] =  0.0 + rr*L1Two[1] + ss*L2Two[1]; 
  }
// test printing
 if (1==2)
   {
   for (int iOne=0; iOne<nOne; iOne++)   
     cout <<  dataClassOne[0][iOne] << " " << dataClassOne[1][iOne] << endl;
   cout << endl;
   for (int iTwo=0; iTwo<nTwo; iTwo++)   
     cout <<  dataClassTwo[0][iTwo] << " " << dataClassTwo[1][iTwo] << endl;
   }
}    // end of generateClassificationData()
 

// ***
// *** calculate the covariance matrices
// ***
template<int nOne, int nTwo>
void calcCoVarMatrices(double dataClassOne[2][nOne], double dataClassTwo[2][nTwo],
                       double coVarOne[2][2], double coVarTwo[2][2],
                       double mOne[2], double mTwo[2], 
                       double S[2][2], double deltaM[2])
{
 mOne[0] = 0;
 mOne[1] = 0;
 mTwo[0] = 0;
 mTwo[1] = 0;
//
 for (int iOne=0; iOne<nOne; iOne++)   
   {
   mOne[0] += dataClassOne[0][iOne]/nOne;
   mOne[1] += dataClassOne[1][iOne]/nOne;
   }
 for (int iTwo=0; iTwo<nTwo; iTwo++)   
   {
   mTwo[0] += dataClassTwo[0][iTwo]/nTwo;
   mTwo[1] += dataClassTwo[1][iTwo]/nTwo;
   }
//
   deltaM[0] = mTwo[0] - mOne[0];
   deltaM[1] = mTwo[1] - mOne[1];
//
 for (int iOne=0; iOne<nOne; iOne++)   
   for (int i=0; i<2; i++) 
     for (int j=0; j<2; j++) 
       coVarOne[i][j] += (dataClassOne[i][iOne]-mOne[i])*
                         (dataClassOne[j][iOne]-mOne[j])/nOne;
 for (int iTwo=0; iTwo<nTwo; iTwo++)   
   for (int i=0; i<2; i++) 
     for (int j=0; j<2; j++) 
       coVarTwo[i][j] += (dataClassTwo[i][iTwo]-mTwo[i])*
                         (dataClassTwo[j][iTwo]-mTwo[j])/nTwo;
//
 for (int i=0; i<2; i++) 
   for (int j=0; j<2; j++) 
     S[i][j] = coVarOne[i][j] + coVarTwo[i][j];
//
} // end of calcCoVarMatrices()


// ***
// *** evaluate the plane
// ***
void  evaluatePlane(double w[2], double dataPlane[2][2])
{
 double rr = sqrt(w[0]*w[0]+w[1]*w[1]);
 w[0] = w[0]/rr;
 w[1] = w[1]/rr;                   // normalization
//
 dataPlane[0][0] =  1.8*w[1];      // orthogonal
 dataPlane[1][0] = -1.8*w[0];
 dataPlane[0][1] = -dataPlane[0][0]; 
 dataPlane[1][1] = -dataPlane[1][0];
//
// test printing
 if (1==2)
   cout <<  "# w[0], w[1] : " << w[0] << " " << w[1] << endl;
} // end of evaluatePlane()


// ***
// *** support vector machine optimization
// ***
template<int N>
void svmGradientAscent(double svmData[2][N], double svmL[N], double svmA[N], 
                       double &svmB, double svmW[2], double svmDataPlane[2][2],
                       double svm_m1_Plane[2][2], double svm_m2_Plane[2][2])
{
 int nIter      =  400000;    // fixed number of update iterations
 double epsilon =    0.01;    // update rate
 double rr; 
//
 for (int iIter=0; iIter<nIter; iIter++)
   {
   svmW[0] = 0.0;
   svmW[1] = 0.0;
   for (int i=0; i<N; i++)
     for (int k=0; k<2; k++)
        svmW[k] += svmA[i]*svmL[i]*svmData[k][i];
//                                                           // (1) gradient
   for (int i=0; i<N; i++)
     {
     rr = svmW[0]*svmData[0][i] + svmW[1]*svmData[1][i];
     svmA[i] += epsilon*(1.0-svmL[i]*rr);
     }
//                                                           // (2) orthogonalization
   rr = 0.0;
   for (int i=0; i<N; i++)
     rr += svmA[i]*svmL[i];
   for (int i=0; i<N; i++)
     svmA[i] = svmA[i] - rr*svmL[i]/N;
//                                                           // (3) positiveness
   for (int i=0; i<N; i++)
     if (svmA[i]<0.0)
       svmA[i] = 0.0;
   } // end of loop over iterations
//
//                                                           // offset
//
 svmB = 0.0;
 for (int i=0; i<N; i++)
   if (svmA[i]>0.001)
     {
     svmB = svmW[0]*svmData[0][i] + svmW[1]*svmData[1][i] - svmL[i];
//
     cout << "# b  " <<  svmW[0]*svmData[0][i] + svmW[1]*svmData[1][i] - svmL[i] << endl;
     }
//                                                           // data plane
 rr = sqrt(svmW[0]*svmW[0]+svmW[1]*svmW[1]);
//                                         // w*b/|w|^2   on plane   w*x=b
 svmDataPlane[0][0] = svmB*svmW[0]/(rr*rr) + 1.8*svmW[1]/rr;  
 svmDataPlane[1][0] = svmB*svmW[1]/(rr*rr) - 1.8*svmW[0]/rr;
 svmDataPlane[0][1] = svmB*svmW[0]/(rr*rr) - 1.8*svmW[1]/rr; 
 svmDataPlane[1][1] = svmB*svmW[1]/(rr*rr) + 1.8*svmW[0]/rr;
//                                                           // margin planes
 svm_m1_Plane[0][0] = (svmB-1)*svmW[0]/(rr*rr) + 1.8*svmW[1]/rr;  
 svm_m1_Plane[1][0] = (svmB-1)*svmW[1]/(rr*rr) - 1.8*svmW[0]/rr;
 svm_m1_Plane[0][1] = (svmB-1)*svmW[0]/(rr*rr) - 1.8*svmW[1]/rr; 
 svm_m1_Plane[1][1] = (svmB-1)*svmW[1]/(rr*rr) + 1.8*svmW[0]/rr;
//
 svm_m2_Plane[0][0] = (svmB+1)*svmW[0]/(rr*rr) + 1.8*svmW[1]/rr;  
 svm_m2_Plane[1][0] = (svmB+1)*svmW[1]/(rr*rr) - 1.8*svmW[0]/rr;
 svm_m2_Plane[0][1] = (svmB+1)*svmW[0]/(rr*rr) - 1.8*svmW[1]/rr;
 svm_m2_Plane[1][1] = (svmB+1)*svmW[1]/(rr*rr) + 1.8*svmW[0]/rr;
//
//                                                           // test printout
//
 if (1==1)
   {
   printf("#%4s %5s %10s %12s\n","", "class", "a_i", "w*x_i-b");
   for (int i=0; i<N; i++)
     printf("#%4d %5d %10.4f %12.4f\n",i, int(svmL[i]), svmA[i], 
             svmData[0][i]*svmW[0]+svmData[1][i]*svmW[1] - svmB );   // support condition
   }
//
} // end of svmGradientAscent()


// ***
// *** main
// ***
int main() 
{
 const int N1 =   4;                 // number of training data per class
 const int N2 =   2;
 double dataClassOne[2][N1];         // class data points 
 double dataClassTwo[2][N2];
 double dataBasis[2][2];             // connecting the center of masses
//
 dataBasis[0][0] = -1.0;             // (-1,0)
 dataBasis[1][0] =  0.0;    
 dataBasis[0][1] =  1.0;             // (1,0)
 dataBasis[1][1] =  0.0;    
//
 double mOne[2];                     // center of masses
 double mTwo[2];                
 double deltaM[2];                   // mTwo-mOne
 double w[2];                        // feature vector
 double dataPlane[2][2];             // orthogonal to the feature vector
//
 double coVarOne[2][2];              // covariance matrices
 double coVarTwo[2][2];
 double        S[2][2];              // sum of covariance matrices
//
 generateClassificationData(dataClassOne,dataClassTwo); 
//
 calcCoVarMatrices(dataClassOne, dataClassTwo, coVarOne, coVarTwo,
                   mOne, mTwo, S, deltaM);               
//
 gaussElimination(S,deltaM,w);             // solve    "S deltaM = w"
 evaluatePlane(w, dataPlane);              // w defines the plane
// --- ------------------
// --- start code for SVM (support vector machine)
// --- ------------------
 const int N = N1+N2;       // combined 
 double svmData[2][N];      // combined training data
 double svmL[N];            // class labels
 double svmA[N];            // Lagrange parameters
 double svmB;               // hyperplane offset
 double svmW[2];            // weight vector
 double svmDataPlane[2][2]; // orthogonal to weight vector
 double svm_m1_Plane[2][2]; // margin one hyperplane
 double svm_m2_Plane[2][2]; // margin two hyperplane
//
 for (int i=0;i<N1;i++)     
   {
   svmData[0][i] = dataClassOne[0][i];  // merge data
   svmData[1][i] = dataClassOne[1][i]; 
   svmL[i]       = 1.0;
   svmA[i]       = nextRandom();        // (positive) random initialization
   }
 for (int j=0;j<N2;j++)     
   {
   svmData[0][N1+j] = dataClassTwo[0][j]; 
   svmData[1][N1+j] = dataClassTwo[1][j]; 
   svmL[N1+j]       = -1.0;
   svmA[N1+j]       = nextRandom();    
   }
 svmGradientAscent(svmData, svmL, svmA, svmB, svmW, svmDataPlane, svm_m1_Plane, svm_m2_Plane);
// --- ------------------
// ---  end  code for SVM 
// --- ------------------
 string dataNameOne      = "$dataOne";     
 string dataNameTwo      = "$dataTwo";                             
 string dataNamePlane    = "$dataPlane";                             
 string dataNamePlaneSVM = "$dataPlaneSVM";                             
 string dataNameP_m1_SVM = "$dataP_m1_SVM";                             
 string dataNameP_m2_SVM = "$dataP_m2_SVM";                             
 string dataNameBasis    = "$dataBasis";                             
 string dataToPlotOne      = xyValuesGnu(dataClassOne,dataNameOne);  // data to string
 string dataToPlotTwo      = xyValuesGnu(dataClassTwo,dataNameTwo);  
 string dataToPlotPlane    = xyValuesGnu(dataPlane,dataNamePlane);  
 string dataToPlotPlaneSVM = xyValuesGnu(svmDataPlane,dataNamePlaneSVM);  
 string dataToPlotP_m1_SVM = xyValuesGnu(svm_m1_Plane,dataNameP_m1_SVM);  
 string dataToPlotP_m2_SVM = xyValuesGnu(svm_m2_Plane,dataNameP_m2_SVM);  
 string dataToPlotBasis    = xyValuesGnu(dataBasis,dataNameBasis);  
//
//
 plotViaGnuplot(dataNameOne,     dataToPlotOne,
                dataNameTwo,     dataToPlotTwo, 
                dataNamePlane,   dataToPlotPlane,
                dataNamePlaneSVM,dataToPlotPlaneSVM,
                dataNameP_m1_SVM,dataToPlotP_m1_SVM,
                dataNameP_m2_SVM,dataToPlotP_m2_SVM,
                dataNameBasis,   dataToPlotBasis);
//
 return 1;
}  // end of main()

#include <iostream>
#include <sstream>     // for stringstream 
#include <stdio.h>     // for printf
#include <string>      // c_str()
#include <cmath>       // for abs
#include <fstream>     // file streams
#include <stdlib.h>    // srand, rand
using namespace std;

/* contains
 *
 * double nextRandom()
 * string xyValuesGnu()
 * void plotViaGnuplot()
 * void gaussElimination()
 * void generateClassificationData()
 * void calcCoVarMatrices()
 * void evaluatePlane()
 * void svmGradientAscent() 
 */

// ***
// *** random number in [0,1] 
// ***
double nextRandom()
{ static bool firstTime = true;
 if (firstTime)
   {
   srand( (unsigned)time( NULL ) );
   firstTime = false;
   }
 return (rand()/(double)RAND_MAX); 
} // end of nextRandom()

// ***
// *** (x,y) string for inline data input for gnuplot
// ***

template<int N>
string xyValuesGnu(double (&xyValues)[2][N], string dataName)
{
 stringstream ss;
 ss << dataName << " << " << "EOD\n";       // defining data block
 for (int i=0; i<N; i++)
   ss << xyValues[0][i] << " " << xyValues[1][i] << endl;
 ss << "EOD\n";                             // terminating data block
//
 return ss.str();
} // end of xyValuesGnu()

// *** 
// *** plot data via gnuplot
// ***

void plotViaGnuplot(string dataNameOne,     string dataToPlotOne,
                    string dataNameTwo,     string dataToPlotTwo,
                    string dataNamePlane,   string dataToPlotPlane,
                    string dataNamePlaneSVM,string dataToPlotPlaneSVM,
                    string dataNameP_m1_SVM,string dataToPlotP_m1_SVM,
                    string dataNameP_m2_SVM,string dataToPlotP_m2_SVM,
                    string dataNameBasis,   string dataToPlotBasis)
{
 FILE *pipeGnu = popen("gnuplot", "w");            // streaming to gnuplot
 fprintf(pipeGnu, "set term qt persist size 1000,800\n");
 fprintf(pipeGnu, "set nokey\n");                  // no legend
//
 fprintf(pipeGnu, "%s\n",dataToPlotOne.c_str());   
 fprintf(pipeGnu, "%s\n",dataToPlotTwo.c_str());   
 fprintf(pipeGnu, "%s\n",dataToPlotPlane.c_str()); 
 fprintf(pipeGnu, "%s\n",dataToPlotPlaneSVM.c_str());
 fprintf(pipeGnu, "%s\n",dataToPlotP_m1_SVM.c_str());
 fprintf(pipeGnu, "%s\n",dataToPlotP_m2_SVM.c_str());
 fprintf(pipeGnu, "%s\n",dataToPlotBasis.c_str());   
//                                                 // plot data
 fprintf(pipeGnu,   "plot \"%s\" w points pt 5 ps 4 \n", dataNameOne.c_str());
 fprintf(pipeGnu, "replot \"%s\" w points pt 5 ps 4 \n", dataNameTwo.c_str());
 fprintf(pipeGnu, "replot \"%s\" w lines lw 7 lt rgb \"grey\" dt 2 \n", dataNamePlane.c_str());
 fprintf(pipeGnu, "replot \"%s\" w lines lw 7 lt rgb \"gold\" \n", dataNamePlaneSVM.c_str());
 fprintf(pipeGnu, "replot \"%s\" w lines lw 2 lt rgb \"gold\" \n", dataNameP_m1_SVM.c_str());
 fprintf(pipeGnu, "replot \"%s\" w lines lw 2 lt rgb \"gold\" \n", dataNameP_m2_SVM.c_str());
 fprintf(pipeGnu, "replot \"%s\" w linesp pt 7 ps 4 lw 4 \n", dataNameBasis.c_str());
//
 fclose(pipeGnu);    // closing the pipe to gnuplot
} // end of plotViaGnuplot

// *** Gauss elimination of Ax=b with pivoting.
// *** On input quadratic matrix A and vector b.
// *** On ouput solution x.
 
template<int N>
void gaussElimination(double (&A)[N][N], double (&b)[N], double (&x)[N]) 
{
 double temp;                       // temp variable
 const double smallNumber = 1e-10;  // a small number
//
//--- loop over all columns A[][col];  A[][col]
 for (int col=0; col<N; col++) 
   {             
//--- find pivot row; swap pivot with actualy row
   int pivotRow = col; 
   for (int row=col+1;row<N;row++) 
     if ( abs(A[row][col]) > abs(A[pivotRow][col]) ) 
       pivotRow = row;
 
//--- swap row A[col][] with A[pivotRow][]
   for (int i=0;i<N;i++)
     {
     temp = A[col][i];
     A[col][i] = A[pivotRow][i];
     A[pivotRow][i] = temp;
     }
 
//--- swap elements b[col] with b[pivotRow]
   temp = b[col];  
   b[col] = b[pivotRow]; 
   b[pivotRow] = temp;
 
//--- throw a warning if matrix is singular or nearly singular
   if (abs(A[col][col]) <= smallNumber) 
     cerr << "Matrix is singular or nearly singular" << endl;
 
//--- Gauss with pivot within A and b
   for (int row=col+1;row<N;row++) 
     {
     double alpha = A[row][col] / A[col][col];  // divide by pivot
     b[row] -= alpha*b[col];
     for (int j=col;j<N;j++) 
       A[row][j] -= alpha * A[col][j];
     }
      }  // end of loop over all columns
 
//--- back substitution
 for (int i=N-1;i>=0;i--) 
   {
   double sum = 0.0;
   for (int j=i+1;j<N;j++) 
      sum += A[i][j] * x[j];
   x[i] = (b[i] - sum) / A[i][i];   // pivots on diagonal after swapping
   }
}  // end of gaussElimination()
 
 
// ***
// *** generate data to be classified; only for (dim==2)
// ***
template<int nOne, int nTwo>
void generateClassificationData(double dataClassOne[2][nOne], double dataClassTwo[2][nTwo]) 
{
 double angleOne =  0.75*M_PI;          // angles of the large axis of the data with
 double angleTwo =  0.75*M_PI;          // respect to the x-axis
 double L1One[2], L1Two[2];             // large eigen-directions of data distribution
 double L2One[2], L2Two[2];             // minor eigen-directions of data distribution
//
 double L1OneLength = 1.5;              // respective lengths
 double L2OneLength = 0.2;              // respective lengths
 double L1TwoLength = 1.0;         
 double L2TwoLength = 0.2;        
//
 double rr, ss;
//
 L1One[0] =  L1OneLength*cos(angleOne);
 L1One[1] =  L1OneLength*sin(angleOne);
 L2One[0] =  L2OneLength*sin(angleOne);   // L2 orthognal to L1
 L2One[1] = -L2OneLength*cos(angleOne);
//
 L1Two[0] =  L1TwoLength*cos(angleTwo);
 L1Two[1] =  L1TwoLength*sin(angleTwo);
 L2Two[0] =  L2TwoLength*sin(angleTwo);   
 L2Two[1] = -L2TwoLength*cos(angleTwo);
//
 for (int iOne=0; iOne<nOne; iOne++)     // generate data for first class
  {
  rr = 2.0*nextRandom()-1.0;
  ss = 2.0*nextRandom()-1.0;
  dataClassOne[0][iOne] =  1.0 + rr*L1One[0] + ss*L2One[0];  // x component
  dataClassOne[1][iOne] =  0.0 + rr*L1One[1] + ss*L2One[1];  // y component
  }
//
 for (int iTwo=0; iTwo<nTwo; iTwo++)     // generate data for second class
  {
  rr = 2.0*nextRandom()-1.0;
  ss = 2.0*nextRandom()-1.0;
  dataClassTwo[0][iTwo] = -1.0 + rr*L1Two[0] + ss*L2Two[0];
  dataClassTwo[1][iTwo] =  0.0 + rr*L1Two[1] + ss*L2Two[1]; 
  }
// test printing
 if (1==2)
   {
   for (int iOne=0; iOne<nOne; iOne++)   
     cout <<  dataClassOne[0][iOne] << " " << dataClassOne[1][iOne] << endl;
   cout << endl;
   for (int iTwo=0; iTwo<nTwo; iTwo++)   
     cout <<  dataClassTwo[0][iTwo] << " " << dataClassTwo[1][iTwo] << endl;
   }
}    // end of generateClassificationData()

// ***
// *** calculate the covariance matrices
// ***
template<int nOne, int nTwo>
void calcCoVarMatrices(double dataClassOne[2][nOne], double dataClassTwo[2][nTwo],
                       double coVarOne[2][2], double coVarTwo[2][2],
                       double mOne[2], double mTwo[2], 
                       double S[2][2], double deltaM[2])
{
 mOne[0] = 0;
 mOne[1] = 0;
 mTwo[0] = 0;
 mTwo[1] = 0;
//
 for (int iOne=0; iOne<nOne; iOne++)   
   {
   mOne[0] += dataClassOne[0][iOne]/nOne;
   mOne[1] += dataClassOne[1][iOne]/nOne;
   }
 for (int iTwo=0; iTwo<nTwo; iTwo++)   
   {
   mTwo[0] += dataClassTwo[0][iTwo]/nTwo;
   mTwo[1] += dataClassTwo[1][iTwo]/nTwo;
   }
//
   deltaM[0] = mTwo[0] - mOne[0];
   deltaM[1] = mTwo[1] - mOne[1];
//
 for (int iOne=0; iOne<nOne; iOne++)   
   for (int i=0; i<2; i++) 
     for (int j=0; j<2; j++) 
       coVarOne[i][j] += (dataClassOne[i][iOne]-mOne[i])*
                         (dataClassOne[j][iOne]-mOne[j])/nOne;
 for (int iTwo=0; iTwo<nTwo; iTwo++)   
   for (int i=0; i<2; i++) 
     for (int j=0; j<2; j++) 
       coVarTwo[i][j] += (dataClassTwo[i][iTwo]-mTwo[i])*
                         (dataClassTwo[j][iTwo]-mTwo[j])/nTwo;
//
 for (int i=0; i<2; i++) 
   for (int j=0; j<2; j++) 
     S[i][j] = coVarOne[i][j] + coVarTwo[i][j];
//
} // end of calcCoVarMatrices()

// ***
// *** evaluate the plane
// ***
void  evaluatePlane(double w[2], double dataPlane[2][2])
{
 double rr = sqrt(w[0]*w[0]+w[1]*w[1]);
 w[0] = w[0]/rr;
 w[1] = w[1]/rr;                   // normalization
//
 dataPlane[0][0] =  1.8*w[1];      // orthogonal
 dataPlane[1][0] = -1.8*w[0];
 dataPlane[0][1] = -dataPlane[0][0]; 
 dataPlane[1][1] = -dataPlane[1][0];
//
// test printing
 if (1==2)
   cout <<  "# w[0], w[1] : " << w[0] << " " << w[1] << endl;
} // end of evaluatePlane()

// ***
// *** support vector machine optimization
// ***
template<int N>
void svmGradientAscent(double svmData[2][N], double svmL[N], double svmA[N], 
                       double &svmB, double svmW[2], double svmDataPlane[2][2],
                       double svm_m1_Plane[2][2], double svm_m2_Plane[2][2])
{
 int nIter      =  400000;    // fixed number of update iterations
 double epsilon =    0.01;    // update rate
 double rr; 
//
 for (int iIter=0; iIter<nIter; iIter++)
   {
   svmW[0] = 0.0;
   svmW[1] = 0.0;
   for (int i=0; i<N; i++)
     for (int k=0; k<2; k++)
        svmW[k] += svmA[i]*svmL[i]*svmData[k][i];
//                                                           // (1) gradient
   for (int i=0; i<N; i++)
     {
     rr = svmW[0]*svmData[0][i] + svmW[1]*svmData[1][i];
     svmA[i] += epsilon*(1.0-svmL[i]*rr);
     }
//                                                           // (2) orthogonalization
   rr = 0.0;
   for (int i=0; i<N; i++)
     rr += svmA[i]*svmL[i];
   for (int i=0; i<N; i++)
     svmA[i] = svmA[i] - rr*svmL[i]/N;
//                                                           // (3) positiveness
   for (int i=0; i<N; i++)
     if (svmA[i]<0.0)
       svmA[i] = 0.0;
   } // end of loop over iterations
//
//                                                           // offset
//
 svmB = 0.0;
 for (int i=0; i<N; i++)
   if (svmA[i]>0.001)
     {
     svmB = svmW[0]*svmData[0][i] + svmW[1]*svmData[1][i] - svmL[i];
//
     cout << "# b  " <<  svmW[0]*svmData[0][i] + svmW[1]*svmData[1][i] - svmL[i] << endl;
     }
//                                                           // data plane
 rr = sqrt(svmW[0]*svmW[0]+svmW[1]*svmW[1]);
//                                         // w*b/|w|^2   on plane   w*x=b
 svmDataPlane[0][0] = svmB*svmW[0]/(rr*rr) + 1.8*svmW[1]/rr;  
 svmDataPlane[1][0] = svmB*svmW[1]/(rr*rr) - 1.8*svmW[0]/rr;
 svmDataPlane[0][1] = svmB*svmW[0]/(rr*rr) - 1.8*svmW[1]/rr; 
 svmDataPlane[1][1] = svmB*svmW[1]/(rr*rr) + 1.8*svmW[0]/rr;
//                                                           // margin planes
 svm_m1_Plane[0][0] = (svmB-1)*svmW[0]/(rr*rr) + 1.8*svmW[1]/rr;  
 svm_m1_Plane[1][0] = (svmB-1)*svmW[1]/(rr*rr) - 1.8*svmW[0]/rr;
 svm_m1_Plane[0][1] = (svmB-1)*svmW[0]/(rr*rr) - 1.8*svmW[1]/rr; 
 svm_m1_Plane[1][1] = (svmB-1)*svmW[1]/(rr*rr) + 1.8*svmW[0]/rr;
//
 svm_m2_Plane[0][0] = (svmB+1)*svmW[0]/(rr*rr) + 1.8*svmW[1]/rr;  
 svm_m2_Plane[1][0] = (svmB+1)*svmW[1]/(rr*rr) - 1.8*svmW[0]/rr;
 svm_m2_Plane[0][1] = (svmB+1)*svmW[0]/(rr*rr) - 1.8*svmW[1]/rr;
 svm_m2_Plane[1][1] = (svmB+1)*svmW[1]/(rr*rr) + 1.8*svmW[0]/rr;
//
//                                                           // test printout
//
 if (1==1)
   {
   printf("#%4s %5s %10s %12s\n","", "class", "a_i", "w*x_i-b");
   for (int i=0; i<N; i++)
     printf("#%4d %5d %10.4f %12.4f\n",i, int(svmL[i]), svmA[i], 
             svmData[0][i]*svmW[0]+svmData[1][i]*svmW[1] - svmB );   // support condition
   }
//
} // end of svmGradientAscent()

// ***
// *** main
// ***
int main() 
{
 const int N1 =   4;                 // number of training data per class
 const int N2 =   2;
 double dataClassOne[2][N1];         // class data points 
 double dataClassTwo[2][N2];
 double dataBasis[2][2];             // connecting the center of masses
//
 dataBasis[0][0] = -1.0;             // (-1,0)
 dataBasis[1][0] =  0.0;    
 dataBasis[0][1] =  1.0;             // (1,0)
 dataBasis[1][1] =  0.0;    
//
 double mOne[2];                     // center of masses
 double mTwo[2];                
 double deltaM[2];                   // mTwo-mOne
 double w[2];                        // feature vector
 double dataPlane[2][2];             // orthogonal to the feature vector
//
 double coVarOne[2][2];              // covariance matrices
 double coVarTwo[2][2];
 double        S[2][2];              // sum of covariance matrices
//
 generateClassificationData(dataClassOne,dataClassTwo); 
//
 calcCoVarMatrices(dataClassOne, dataClassTwo, coVarOne, coVarTwo,
                   mOne, mTwo, S, deltaM);               
//
 gaussElimination(S,deltaM,w);             // solve    "S deltaM = w"
 evaluatePlane(w, dataPlane);              // w defines the plane
// --- ------------------
// --- start code for SVM (support vector machine)
// --- ------------------
 const int N = N1+N2;       // combined 
 double svmData[2][N];      // combined training data
 double svmL[N];            // class labels
 double svmA[N];            // Lagrange parameters
 double svmB;               // hyperplane offset
 double svmW[2];            // weight vector
 double svmDataPlane[2][2]; // orthogonal to weight vector
 double svm_m1_Plane[2][2]; // margin one hyperplane
 double svm_m2_Plane[2][2]; // margin two hyperplane
//
 for (int i=0;i<N1;i++)     
   {
   svmData[0][i] = dataClassOne[0][i];  // merge data
   svmData[1][i] = dataClassOne[1][i]; 
   svmL[i]       = 1.0;
   svmA[i]       = nextRandom();        // (positive) random initialization
   }
 for (int j=0;j<N2;j++)     
   {
   svmData[0][N1+j] = dataClassTwo[0][j]; 
   svmData[1][N1+j] = dataClassTwo[1][j]; 
   svmL[N1+j]       = -1.0;
   svmA[N1+j]       = nextRandom();    
   }
 svmGradientAscent(svmData, svmL, svmA, svmB, svmW, svmDataPlane, svm_m1_Plane, svm_m2_Plane);
// --- ------------------
// ---  end  code for SVM 
// --- ------------------
 string dataNameOne      = "$dataOne";     
 string dataNameTwo      = "$dataTwo";                             
 string dataNamePlane    = "$dataPlane";                             
 string dataNamePlaneSVM = "$dataPlaneSVM";                             
 string dataNameP_m1_SVM = "$dataP_m1_SVM";                             
 string dataNameP_m2_SVM = "$dataP_m2_SVM";                             
 string dataNameBasis    = "$dataBasis";                             
 string dataToPlotOne      = xyValuesGnu(dataClassOne,dataNameOne);  // data to string
 string dataToPlotTwo      = xyValuesGnu(dataClassTwo,dataNameTwo);  
 string dataToPlotPlane    = xyValuesGnu(dataPlane,dataNamePlane);  
 string dataToPlotPlaneSVM = xyValuesGnu(svmDataPlane,dataNamePlaneSVM);  
 string dataToPlotP_m1_SVM = xyValuesGnu(svm_m1_Plane,dataNameP_m1_SVM);  
 string dataToPlotP_m2_SVM = xyValuesGnu(svm_m2_Plane,dataNameP_m2_SVM);  
 string dataToPlotBasis    = xyValuesGnu(dataBasis,dataNameBasis);  
//
//
 plotViaGnuplot(dataNameOne,     dataToPlotOne,
                dataNameTwo,     dataToPlotTwo, 
                dataNamePlane,   dataToPlotPlane,
                dataNamePlaneSVM,dataToPlotPlaneSVM,
                dataNameP_m1_SVM,dataToPlotP_m1_SVM,
                dataNameP_m2_SVM,dataToPlotP_m2_SVM,
                dataNameBasis,   dataToPlotBasis);
//
 return 1;
}  // end of main()