Machine Learning Primer -- Part III: Advanced Topics




Claudius Gros, WS 2024/25

Institut für theoretische Physik
Goethe-University Frankfurt a.M.

Reservoir Computing

echo state networks

[quantumComputingInc]
the reservoir generates a palette of non-linear
transformations of present and past input activities
→  the linear output layer selects from this palette

echo state code

#!/usr/bin/env python3
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

def target_function(n):
  "to be reproduced"
  series = np.zeros(n)
  for i in range(n):
    x = i*200.0/n
    series[i] = np.sin(x) + np.cos(0.3*(x+np.sin(1.1*x)))
  return series


class EchoStateNetwork:
  "Echo State Network class"
  def __init__(self, input_size, reservoir_size,
               output_size, spectral_radius=0.9,
               sparsity=0.1):
    self.reservoir_size  = reservoir_size
    self.spectral_radius = spectral_radius

# input weights
    self.Win = torch.randn(reservoir_size, input_size)*0.1

# sparse reservoir weights
    W = torch.randn(reservoir_size, reservoir_size)
    W[torch.rand_like(W) > sparsity] = 0.0

# scaling reservoir weights to set spectral radius
    eigenvalues = torch.linalg.eigvals(W).abs()
    W *= spectral_radius / eigenvalues.max()
    self.W = W

# output weights (initialized later during training)
    self.Wout = torch.randn(reservoir_size + 1,
                output_size, requires_grad=True)

  def forward(self, input_series):
    """run the ESN over the entire input series,
       collecting the reservoir states"""
    states = []
    state = torch.zeros(self.reservoir_size)

    for u in input_series:      # @: matrix multiplication
      state = torch.tanh(self.Win@u + self.W@state)
      states.append(state)
    return torch.stack(states)  # list to tensor

  def train(self, input_series, target_series,
            learning_rate=5e-3, epochs=5000):
    """ reservoir weights do not change 
        --> reservoir states do not change
        --> reservoir states can be evoulated
            before training
    """
    states = self.forward(input_series)
# adding bias 
    states_with_bias =\
       torch.cat([states, torch.ones(states.shape[0], 1)],
                 dim=1)

# instantiate optimizer / loss function
    optimizer = torch.optim.SGD([self.Wout],
                lr=learning_rate)
    loss_fn = nn.MSELoss()

# optimizing output weight
    for epoch in range(epochs):
      optimizer.zero_grad()
      predictions = states_with_bias@self.Wout
      loss = loss_fn(predictions, target_series)
      loss.backward()
      optimizer.step()
      if (epoch+1)%100==0:
        print(f'Epoch {epoch+1}/{epochs},', end="")
        print(f' Loss {loss.item():9.5f}')

  def predict(self, input_series):
    states = self.forward(input_series)
    states_with_bias = torch.cat([states,
       torch.ones(states.shape[0], 1)], dim=1)
    return states_with_bias @ self.Wout

# generating time series train/test data
data = target_function(2000)
train_data, test_data = data[:1500], data[1500:]

# preparing input and target series for the ESN
train_input  = torch.tensor(train_data[:-1],
               dtype=torch.float32).view(-1, 1)
train_target = torch.tensor(train_data[1:],
               dtype=torch.float32).view(-1, 1)
test_input   = torch.tensor(test_data[:-1],
               dtype=torch.float32).view(-1, 1)
test_target  = torch.tensor(test_data[1:],
               dtype=torch.float32).view(-1, 1)

# initialize and train the ESN
esn = EchoStateNetwork(input_size=1,
                       reservoir_size=500,
                       output_size=1)
esn.train(train_input, train_target)

# predictions/performance for test data
predictions = esn.predict(test_input)
mse = nn.MSELoss()(predictions, test_target)
print(f"\nmean squared test error: {mse.item()}")

# plotting results
plt.figure(figsize=(12, 6))
plt.plot(test_target.numpy(), label="true")
plt.plot(predictions.detach().numpy(), label="predicted")
plt.legend()
plt.title("ESN inference")
plt.show()
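
Since only Wout is trained and the prediction is linear in Wout, the readout can
alternatively be obtained in closed form by ridge regression on the fixed reservoir
states. A minimal sketch; the helper ridge_readout and the regularization strength
alpha are illustrative additions, not part of the script above.

# closed-form (ridge regression) readout; reuses the EchoStateNetwork
# instance 'esn' and the tensors defined in the script above
import torch

def ridge_readout(esn, input_series, target_series, alpha=1e-4):
  "solve (X^T X + alpha*I) Wout = X^T y for the readout weights"
  states = esn.forward(input_series)      # fixed reservoir states
  X = torch.cat([states, torch.ones(states.shape[0], 1)], dim=1)
  A = X.T@X + alpha*torch.eye(X.shape[1])
  b = X.T@target_series
  return torch.linalg.solve(A, b)

# esn.Wout = ridge_readout(esn, train_input, train_target)
# predictions = esn.predict(test_input)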

random matrix theory

for an $N\times N$ random matrix with independent entries of zero mean and
variance $\sigma^2$, the eigenvalues are, for large $N$, uniformly distributed
in the complex plane on a disk with radius $\ \sigma\sqrt{N}\ $ (circular law)
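
a quick numerical check of this statement; the matrix size $N$ and the
standard deviation $\sigma$ are arbitrary illustrative choices

# numerical check of the circular law (illustrative sketch)
import numpy as np

N, sigma = 1000, 0.5
W = np.random.normal(0.0, sigma, size=(N, N))   # i.i.d. entries, mean 0, variance sigma^2
eigenvalues = np.linalg.eigvals(W)

print("predicted disk radius :", sigma*np.sqrt(N))
print("largest |eigenvalue|  :", np.abs(eigenvalues).max())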



elliptic matrices


$\displaystyle\quad\quad \Gamma = \frac{\sum_{i,j} \big(w_{ij}-\mu_w\big) \big(w_{ji}-\mu_w\big)} {\sum_{i,j} \big(w_{ij}-\mu_w\big)^2} $
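
$\Gamma$ is the correlation between $w_{ij}$ and its transposed partner $w_{ji}$:
$\Gamma=1$ for symmetric, $\Gamma=-1$ for antisymmetric, and $\Gamma\approx 0$
for uncorrelated matrices. A short sketch evaluating the formula
(illustrative code, matrix size chosen arbitrarily):

import numpy as np

def gamma(W):
  "symmetry correlation of a square matrix, as defined above"
  d = W - W.mean()
  return (d*d.T).sum()/(d*d).sum()

A = np.random.randn(500, 500)
print("uncorrelated  :", gamma(A))          # close to  0
print("symmetric     :", gamma(A + A.T))    # exactly  +1
print("antisymmetric :", gamma(A - A.T))    # exactly  -1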


critical recurrent networks

$$ \mathbf{y}_{t+1} = \hat{W}\, \mathbf{y}_t \quad\qquad \fbox{$\phantom{\big|} \sigma_{t+1} \approx R_w\,\sigma_t \phantom{\big|}$} $$
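
a sketch iterating the linear map for a random $\hat{W}$; here $R_w$ is taken
to be the circular-law radius $\sigma_w\sqrt{N}$ of the weight matrix (an
assumption consistent with the previous slide), and the parameter values are
illustrative

import numpy as np

N   = 1000
R_w = 0.9                                  # target spectral radius
W = np.random.normal(0.0, R_w/np.sqrt(N), size=(N, N))

y = np.random.randn(N)                     # random initial activity
for t in range(10):
  sigma_t = y.std()
  y = W@y                                  # y_{t+1} = W y_t
  print(f"t={t:2d}   sigma ratio {y.std()/sigma_t:6.3f}   R_w {R_w}")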

absorbing phase transitions



critical brain hypothesis


variance mean-field theory

mean-field theory for recurrent neural nets


$$ \fbox{$\phantom{\big|} 2R_w^2\sigma_y^2\big(1-\sigma_y^2\big)^2 = 1 - \big(1+2\sigma_{\rm ext}^2\big) \big(1-\sigma_y^2\big)^2 \phantom{\big|}$} $$
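
a sketch solving the boxed self-consistency condition numerically for
$\sigma_y$ as a function of $R_w$; the root in $\sigma_y^2\in(0,1)$ is found
by bisection, and the values of $R_w$ and $\sigma_{\rm ext}$ are arbitrary
illustrative choices (the bracket assumes $\sigma_{\rm ext}>0$)

import numpy as np

def solve_sigma_y(R_w, sigma_ext, tol=1e-12):
  "bisection for s = sigma_y^2 in the boxed self-consistency equation"
  f = lambda s: 2.0*R_w**2*s*(1.0-s)**2 - 1.0 + (1.0+2.0*sigma_ext**2)*(1.0-s)**2
  a, b = 1e-12, 1.0 - 1e-12        # f(a) > 0, f(b) < 0 for sigma_ext > 0
  while b - a > tol:
    m = 0.5*(a + b)
    if f(m) > 0.0: a = m
    else:          b = m
  return np.sqrt(0.5*(a + b))

for R_w in (0.5, 1.0, 1.5):
  print(f"R_w = {R_w:3.1f}   sigma_y = {solve_sigma_y(R_w, 0.2):6.4f}")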