|          |                                                        |
|----------|--------------------------------------------------------|
| removing | weak links; $\lvert w_{ij}\rvert$ well below average   |
| reduces  | network complexity; overfitting                        |
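A minimal sketch of this magnitude-based pruning in Python with NumPy. The notes only say "well below average", so the cut-off `frac * mean(|w|)` with `frac = 0.1`, the helper `prune_weak_links` and the example weight matrix are hypothetical choices for illustration.

```python
import numpy as np

def prune_weak_links(w, frac=0.1):
    """Zero every weight with |w_ij| < frac * mean(|w|).

    frac is a hypothetical threshold factor; the notes only
    say 'well below average'.
    """
    threshold = frac * np.mean(np.abs(w))
    mask = np.abs(w) >= threshold        # True for links that are kept
    return w * mask, mask

# hypothetical 2 x 3 weight matrix
w = np.array([[0.80, -0.01, 0.50],
              [0.002, -0.90, 0.03]])
w_pruned, mask = prune_weak_links(w)
print(w_pruned)
print("links kept:", mask.sum(), "of", mask.size)
```

Returning the boolean `mask` alongside the pruned weights makes it easy to keep the removed links at zero in later training steps.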
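|           |                                                                     |
|-----------|---------------------------------------------------------------------|
| whitening | covariance matrix $\to$ identity matrix                             |
|           | inputs decorrelated with unit variance: all directions equally relevant |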
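A sketch of one common whitening transform (ZCA): with the eigendecomposition $C = E D E^T$ of the covariance matrix, the data are multiplied by $W = E\,D^{-1/2}E^T$, after which the covariance is $W C W = \mathbb{1}$. The notes do not say which variant is meant; the toy data, the helper `whiten` and the small regularizer `eps` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical correlated 2-d toy data, one sample per row
a = rng.normal(size=(1000, 2))
data = a @ np.array([[2.0, 0.0], [1.5, 0.5]])

def whiten(x, eps=1e-8):
    """ZCA whitening: afterwards the covariance matrix is the identity."""
    xc = x - x.mean(axis=0)                   # center the data
    cov = np.cov(xc, rowvar=False)            # d x d covariance matrix
    evals, evecs = np.linalg.eigh(cov)        # cov = E diag(evals) E^T
    w = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T
    return xc @ w                             # w is symmetric

white = whiten(data)
print(np.round(np.cov(data, rowvar=False), 2))   # correlated
print(np.round(np.cov(white, rowvar=False), 2))  # identity matrix
```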
$\to$ hierarchical feature representation by hidden nodes
| autoencoder | restricted Boltzmann machine | recurrent network | convolutional network    |
|-------------|------------------------------|-------------------|--------------------------|
| feedforward | undirected                   | recurrent         | hierarchical feedforward |
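A hypothetical illustration of "hierarchical" in convolutional networks, not taken from the notes: after stacking two width-3 convolutions, a single active input pixel influences 3 nodes of the first layer and 5 nodes of the second, i.e. deeper nodes combine larger input patches. The signal, the all-ones kernel and the use of `np.convolve` are assumptions; nonlinearities and multiple channels are left out.

```python
import numpy as np

# hypothetical 1-d input: a single active pixel in the middle
signal = np.zeros(11)
signal[5] = 1.0
kernel = np.array([1.0, 1.0, 1.0])   # width-3 filter, all weights positive

layer1 = np.convolve(signal, kernel, mode="same")   # first hidden layer
layer2 = np.convolve(layer1, kernel, mode="same")   # second hidden layer

# nodes that "see" the active pixel in each layer
print("layer 1:", np.nonzero(layer1)[0])   # 3 nodes
print("layer 2:", np.nonzero(layer2)[0])   # 5 nodes
```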
 
 
 
 
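| image            | $+\,0.007\,\times$ | image           | $=$ | image            |
|------------------|--------------------|-----------------|-----|------------------|
| "panda"          |                    | "nematode"      |     | "gibbon"         |
| 57.7% confidence |                    | 8.2% confidence |     | 99.3% confidence |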
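Perturbations like the one above are typically of the form $\varepsilon\,\operatorname{sign}(\nabla_x J)$ with $\varepsilon = 0.007$ (fast gradient sign method); that this exact method produced the figure is an assumption here. A minimal sketch for a hypothetical linear (logistic) classifier in NumPy: every pixel moves by only $\varepsilon$, yet the logit changes by $\varepsilon\sum_i |w_i|$, which grows with the input dimension `d`. The weights, input and bias are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 1000                              # hypothetical input dimension ("pixels")
w = rng.normal(size=d)                # weights of a fixed linear classifier
x = rng.uniform(0.0, 1.0, size=d)     # hypothetical input
b = -w @ x + 1.0                      # bias chosen so that w.x + b = 1

def p_class1(x):
    """Logistic confidence for class 1."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

# loss J = -log p(class 1); its gradient w.r.t. the input is -(1 - p) w
grad_x = -(1.0 - p_class1(x)) * w
eps = 0.007
x_adv = x + eps * np.sign(grad_x)     # every pixel moves by exactly eps

print(f"original : p = {p_class1(x):.3f}")
print(f"perturbed: p = {p_class1(x_adv):.3f}")
print("max pixel change:", np.max(np.abs(x_adv - x)))
```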
[Figure: two panels, "original" and "tempered"]
#!/usr/bin/env python3
import torch                     # PyTorch needs to be installed
dim = 2
eps = 0.1
x = torch.ones(dim, requires_grad=True)  # leaf of computational graph
print("x      : ",x)
print("x      : ",x.data)
y = x + 2
out = torch.dot(y,y)             # scalar product
print("y      : ",y)
print("out    : ",out)
print()
out.backward()                   # backward pass: d out/dx stored in x.grad
print("x.grad : ",x.grad)
with torch.no_grad():            # no gradient tracking during the update
    x -= eps * x.grad            # gradient-descent step on the leaf tensor
    x.grad = None                # reset; backward() accumulates into .grad
print("x      : ",x.data)
torch.dot(x + 2, x + 2).backward()   # forward and backward pass at updated x
print("x.grad : ",x.grad)
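To see what `backward()` computes in the script above, the numbers can be checked by hand: for $\mathrm{out} = (x+2)\cdot(x+2)$ the gradient is $2(x+2)$, i.e. $(6,6)$ at $x=(1,1)$; one step with $\varepsilon = 0.1$ gives $x=(0.4,0.4)$ and the new gradient $(4.8,4.8)$. A minimal sketch in plain Python, without PyTorch, comparing the analytic gradient with central finite differences (the helper names and the step `h` are arbitrary choices).

```python
# plain-Python check of the gradients printed by the PyTorch script
dim, eps = 2, 0.1

def out(x):
    """out = (x + 2) . (x + 2), same function as in the script."""
    return sum((xi + 2.0) ** 2 for xi in x)

def grad_analytic(x):
    """d out / d x_i = 2 (x_i + 2)."""
    return [2.0 * (xi + 2.0) for xi in x]

def grad_numeric(x, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        g.append((out(xp) - out(xm)) / (2.0 * h))
    return g

x = [1.0] * dim
g0 = grad_analytic(x)
print("x.grad :", g0, grad_numeric(x))     # both approx. [6, 6]
x = [xi - eps * gi for xi, gi in zip(x, g0)]
print("x      :", x)                       # approx. [0.4, 0.4]
print("x.grad :", grad_analytic(x))        # approx. [4.8, 4.8]
```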
