Convolutional Neural Network
See also: this one assignment on CNN
how can we exploit sparsity and locality?
think of sparse connectivity rather than full connectivity
we also exploit invariance: a filter that is useful in one part of the image is likely useful in other parts as well
convolution
accepts a volume of size $W_1 \times H_1 \times D_1$ and has four hyperparameters:
number of filters $K$
spatial extent $F$
stride $S$
amount of zero padding $P$
produces a volume of size $W_2 \times H_2 \times D_2$ where:

$$W_2 = \frac{W_1 - F + 2P}{S} + 1$$

$$H_2 = \frac{H_1 - F + 2P}{S} + 1$$

$$D_2 = K$$
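The output-size formulas above can be sketched directly (function and variable names are mine, not from the notes):

```python
# Sketch: compute the conv layer's output volume from the four
# hyperparameters K, F, S, P and the input volume W1 x H1 x D1.
def conv_output_shape(W1, H1, D1, K, F, S, P):
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    D2 = K  # one output channel per filter
    return W2, H2, D2

# e.g. a 32x32x3 input, 10 filters of size 5, stride 1, padding 2
print(conv_output_shape(32, 32, 3, K=10, F=5, S=1, P=2))  # (32, 32, 10)
```

With $F = 5$, $P = 2$, $S = 1$ the spatial size is preserved, which is the usual "same" padding choice.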
1D convolution:

$$\begin{aligned}
y &= (x * w) \\
y(i) &= \sum_{t} x(t)\, w(i-t)
\end{aligned}$$
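The 1D formula can be evaluated directly with nested loops (a sketch for intuition; NumPy's `np.convolve` computes the same "full" convolution):

```python
import numpy as np

# Direct evaluation of y(i) = sum_t x(t) w(i - t).
# The "full" output has length len(x) + len(w) - 1.
def conv1d(x, w):
    n = len(x) + len(w) - 1
    y = np.zeros(n)
    for i in range(n):
        for t in range(len(x)):
            if 0 <= i - t < len(w):  # skip indices outside w's support
                y[i] += x[t] * w[i - t]
    return y

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.0, 1.0])
print(conv1d(x, w))        # [0. 1. 2. 3.]
print(np.convolve(x, w))   # same result
```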
2D convolution:

$$\begin{aligned}
y &= (x * w) \\
y(i,j) &= \sum_{t_1} \sum_{t_2} x(t_1, t_2)\, w(i-t_1, j-t_2)
\end{aligned}$$
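The 2D case is the same sum over both indices; a naive loop version (symbols match the formula above, names are mine):

```python
import numpy as np

# Direct evaluation of y(i,j) = sum_{t1,t2} x(t1,t2) w(i-t1, j-t2)
# ("full" 2D convolution, output is (Hx+Hw-1) x (Wx+Ww-1)).
def conv2d(x, w):
    H = x.shape[0] + w.shape[0] - 1
    W = x.shape[1] + w.shape[1] - 1
    y = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            for t1 in range(x.shape[0]):
                for t2 in range(x.shape[1]):
                    # only terms where (i-t1, j-t2) lies inside w contribute
                    if 0 <= i - t1 < w.shape[0] and 0 <= j - t2 < w.shape[1]:
                        y[i, j] += x[t1, t2] * w[i - t1, j - t2]
    return y

print(conv2d(np.ones((2, 2)), np.ones((2, 2))))
# [[1. 2. 1.]
#  [2. 4. 2.]
#  [1. 2. 1.]]
```

Real implementations vectorize this (im2col, FFT), but the loop form mirrors the definition term by term.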
max pooling
idea: reduce the spatial size of the representation, which cuts the number of activations (and hence the computation and parameters in later layers)
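A minimal sketch of 2x2 max pooling with stride 2 on a single-channel feature map (the usual setting; function name is mine):

```python
import numpy as np

# 2x2 max pooling, stride 2: take the max of each non-overlapping
# 2x2 block. Pooling itself has no learnable parameters.
def max_pool2x2(x):
    H, W = x.shape
    x = x[:H - H % 2, :W - W % 2]        # drop odd trailing row/col
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 1., 1., 0.],
              [2., 0., 0., 3.]])
print(max_pool2x2(x))
# [[4. 8.]
#  [2. 3.]]
```

Each 2x2 block collapses to its maximum, halving both spatial dimensions.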
batchnorm
example: $x^{j} = [x_1^j, \ldots, x_d^j]$
batch: $X = [(x^1)^T \ldots (x^b)^T]^T$
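With $X$ of shape $(b, d)$ (one row per example $x^j$), the batch-norm forward pass normalizes each feature over the batch. A sketch with assumed names (`gamma`/`beta` are the learned scale and shift, not defined in the notes above):

```python
import numpy as np

# Batch-norm forward pass on X of shape (b, d): normalize each of the
# d features to zero mean / unit variance over the batch, then apply
# the learned affine transform gamma * x_hat + beta.
def batchnorm_forward(X, gamma, beta, eps=1e-5):
    mu = X.mean(axis=0)                    # per-feature batch mean
    var = X.var(axis=0)                    # per-feature batch variance
    X_hat = (X - mu) / np.sqrt(var + eps)  # eps avoids division by zero
    return gamma * X_hat + beta

X = np.array([[1., 2.], [3., 4.], [5., 6.]])
out = batchnorm_forward(X, gamma=np.ones(2), beta=np.zeros(2))
print(out.mean(axis=0))  # ~[0, 0] per feature
```

With `gamma=1`, `beta=0` the output has (approximately) zero mean and unit variance per feature; at test time the batch statistics are replaced by running averages.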