| removing | : | weak links; $|w_{ij}|$ well below average |
| reduces | : | network complexity; overfitting |
| whitening | : | covariance matrix $\to$ identity matrix |
| | : | all data equally relevant |
| $\to$ | hierarchical feature representation by hidden nodes |
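The whitening step listed above can be made concrete; a minimal sketch, assuming a toy data matrix of $N$ samples in $d$ dimensions (all names and sizes here are illustrative, not from the original):

import torch                                               # PyTorch needs to be installed
N, d = 500, 3                                              # toy data: N samples, d dimensions
data = torch.randn(N, d) @ torch.tensor([[2.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.3]])   # correlated features
data = data - data.mean(dim=0)                             # center the data
cov = data.T @ data / (N - 1)                              # covariance matrix
eigval, eigvec = torch.linalg.eigh(cov)                    # eigendecomposition (symmetric matrix)
W = eigvec @ torch.diag(eigval.rsqrt()) @ eigvec.T         # (ZCA) whitening transformation
white = data @ W                                           # whitened data
print(white.T @ white / (N - 1))                           # covariance is now ~ identity matrix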
| architecture | autoencoder | restricted Boltzmann machine | recurrent network | convolutional network |
| connectivity | feedforward | undirected | recurrent | hierarchical feedforward |
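As an illustration of the first column, a feedforward autoencoder takes only a few lines of PyTorch; a minimal sketch with assumed layer sizes (784 inputs, 32 hidden units), whose reconstruction error would serve as the training objective:

import torch                                               # PyTorch needs to be installed

class AutoEncoder(torch.nn.Module):                        # feedforward: input -> bottleneck -> output
    def __init__(self, n_in=784, n_hidden=32):             # assumed sizes, e.g. flattened 28x28 images
        super().__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Linear(n_in, n_hidden), torch.nn.ReLU())
        self.decoder = torch.nn.Linear(n_hidden, n_in)      # reconstructs the input from the bottleneck
    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.rand(16, 784)                                     # a batch of toy inputs
loss = torch.nn.functional.mse_loss(model(x), x)            # reconstruction error
print("reconstruction loss : ", loss.item())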
Figure: adversarial example. "panda" (57.7% confidence) $+\,0.007\times$ perturbation ("nematode", 8.2% confidence) $=$ "gibbon" (99.3% confidence); original vs. tempered image.
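Perturbations of this kind can be generated from the gradient of the loss with respect to the input pixels (the fast gradient sign method); a minimal sketch, where `model`, `img` (a batch of normalized images) and `label` are assumed placeholders for a pretrained classifier and its data:

import torch                                                # PyTorch needs to be installed

def fgsm(model, img, label, eps=0.007):                     # eps matches the 0.007 factor above
    img = img.clone().detach().requires_grad_(True)         # track gradients w.r.t. the pixels
    loss = torch.nn.functional.cross_entropy(model(img), label)
    loss.backward()                                          # gradient of the loss w.r.t. the image
    with torch.no_grad():
        adv = img + eps*img.grad.sign()                      # small step that increases the loss
    return adv.clamp(0.0, 1.0)                               # keep pixel values in a valid range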
#!/usr/bin/env python3
import torch # PyTorch needs to be installed
dim = 2
eps = 0.1
x = torch.ones(dim, requires_grad=True) # leaf of computational graph
print("x : ",x)
print("x : ",x.data)
y = x + 2
out = torch.dot(y,y) # scalar product
print("y : ",y)
print("out : ",out)
print()
out.backward() # backward pass --> gradients
print("x.grad : ",x.grad)
with torch.no_grad(): # detach from computational graph
    x -= eps*x.grad   # updating parameter tensor (gradient-descent step)
    x.grad = None     # flush the accumulated gradient
print("x : ",x.data)
torch.dot(x+2,x+2).backward()
print("x.grad : ",x.grad)