A neural network is the foundation of many deep learning applications. In most deep learning projects, however, we simply use the NN models provided by well-known packages such as TensorFlow and Keras, and therefore see only the surface of how the model works. In this notebook I would like to highlight the important procedures involved in a neural network.
The dataset used in this notebook is MNIST, a well-known computer vision dataset. The link for the dataset is given as follows: dataset
The data files train.csv and test.csv contain gray-scale images of hand-drawn digits, from zero through nine.
Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.
The training data set (train.csv) has 785 columns. The first column, called "label", is the digit that was drawn by the user. The remaining 784 columns contain the pixel-values of the associated image.
Each pixel column in the training set has a name like pixelx, where x is an integer between 0 and 783, inclusive.
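As a rough sketch of how this column layout maps onto arrays, the snippet below splits a table of rows into a label vector and a pixel matrix. The function name `split_labels_and_pixels` is hypothetical, the rows are synthetic stand-ins rather than real MNIST data, and scaling the pixels to [0, 1] is an assumption that the description above does not require. With the real file, the rows could be read with something like `np.loadtxt("train.csv", delimiter=",", skiprows=1)`.

```python
import numpy as np

def split_labels_and_pixels(rows):
    """Split an (m, 785) table into pixels X of shape (784, m) and labels of shape (m,)."""
    rows = np.asarray(rows)
    labels = rows[:, 0].astype(int)   # first column is "label"
    pixels = rows[:, 1:].T / 255.0    # pixel0..pixel783, one column per image, scaled to [0, 1] (assumption)
    return pixels, labels

# Synthetic stand-in for five rows of train.csv (not real MNIST data).
rng = np.random.default_rng(0)
fake_rows = np.hstack([
    rng.integers(0, 10, size=(5, 1)),     # label column: digits 0..9
    rng.integers(0, 256, size=(5, 784)),  # 784 pixel columns: values 0..255
])

X, Y = split_labels_and_pixels(fake_rows)
print(X.shape, Y.shape)  # (784, 5) (5,)
```

Storing one image per column gives X the 784 x m layout used for the matrix products in the rest of the notebook.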
Our NN will have a simple two-layer architecture. The input layer $a^{[0]}$ will have 784 units corresponding to the 784 pixels in each 28x28 input image. The hidden layer $a^{[1]}$ will have 10 units with ReLU activation, and finally the output layer $a^{[2]}$ will have 10 units, corresponding to the ten digit classes, with softmax activation.
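The architecture above fixes the sizes of the parameters: one weight matrix and one bias vector per layer. A minimal sketch of allocating them follows. The function name `init_params` and the initialization scheme (uniform weights in [-0.5, 0.5), zero biases) are assumptions chosen for illustration; the text does not specify how the parameters are initialized.

```python
import numpy as np

def init_params(n_input=784, n_hidden=10, n_output=10, seed=0):
    """Allocate parameters for a 784 -> 10 -> 10 network."""
    rng = np.random.default_rng(seed)
    # Small random weights break the symmetry between units (assumed scheme).
    W1 = rng.uniform(-0.5, 0.5, size=(n_hidden, n_input))
    b1 = np.zeros((n_hidden, 1))
    W2 = rng.uniform(-0.5, 0.5, size=(n_output, n_hidden))
    b2 = np.zeros((n_output, 1))
    return W1, b1, W2, b2

W1, b1, W2, b2 = init_params()
print(W1.shape, b1.shape, W2.shape, b2.shape)  # (10, 784) (10, 1) (10, 10) (10, 1)
```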
Forward propagation
$$Z^{[1]} = W^{[1]} X + b^{[1]}$$
$$A^{[1]} = g_{\text{ReLU}}(Z^{[1]})$$
$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$$
$$A^{[2]} = g_{\text{softmax}}(Z^{[2]})$$
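The four forward-propagation equations translate almost line for line into NumPy. The sketch below assumes that X stores one image per column (784 x m) and that the softmax is taken over each column. The function names and the random inputs are illustrative, not part of the original notebook text. Subtracting the column maximum before exponentiating is a common numerical-stability trick that leaves the softmax output unchanged.

```python
import numpy as np

def relu(Z):
    # Element-wise max(0, z).
    return np.maximum(Z, 0)

def softmax(Z):
    # Column-wise softmax; shifting by the column max avoids overflow in exp.
    shifted = Z - Z.max(axis=0, keepdims=True)
    exp_Z = np.exp(shifted)
    return exp_Z / exp_Z.sum(axis=0, keepdims=True)

def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1 @ X + b1    # (10, 784) @ (784, m) + (10, 1) -> (10, m)
    A1 = relu(Z1)
    Z2 = W2 @ A1 + b2   # (10, 10) @ (10, m) + (10, 1) -> (10, m)
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2

# Illustrative random inputs and parameters (not trained values).
rng = np.random.default_rng(0)
m = 4
X = rng.random((784, m))
W1, b1 = rng.uniform(-0.5, 0.5, (10, 784)), np.zeros((10, 1))
W2, b2 = rng.uniform(-0.5, 0.5, (10, 10)), np.zeros((10, 1))

Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
print(A2.shape)        # (10, 4)
print(A2.sum(axis=0))  # each column sums to 1, up to floating-point error
```

Each column of A2 can be read as predicted class probabilities for one image.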
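Backward propagation
$$dZ^{[2]} = A^{[2]} - Y$$
$$dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T}$$
$$db^{[2]} = \frac{1}{m} \sum dZ^{[2]}$$
$$dZ^{[1]} = W^{[2]T} dZ^{[2]} \odot g^{[1]\prime}(Z^{[1]})$$
$$dW^{[1]} = \frac{1}{m} dZ^{[1]} A^{[0]T}$$
$$db^{[1]} = \frac{1}{m} \sum dZ^{[1]}$$
Parameter updates
$$W^{[2]} := W^{[2]} - \alpha \, dW^{[2]}$$
$$b^{[2]} := b^{[2]} - \alpha \, db^{[2]}$$
$$W^{[1]} := W^{[1]} - \alpha \, dW^{[1]}$$
$$b^{[1]} := b^{[1]} - \alpha \, db^{[1]}$$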
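The backward pass and the update rule can be sketched in the same style. The code below makes several assumptions that the text does not state: Y is a one-hot matrix of shape 10 x m, the loss is the cross-entropy averaged over the batch (the loss for which $dZ^{[2]} = A^{[2]} - Y$ holds with a softmax output), and the sums in the bias gradients run over the m examples so that each bias gradient stays 10 x 1. All function names are hypothetical. A finite-difference comparison on a single weight is included as a sanity check of the formulas.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Integer labels of shape (m,) -> one-hot matrix of shape (num_classes, m)."""
    Y = np.zeros((num_classes, labels.size))
    Y[labels, np.arange(labels.size)] = 1.0
    return Y

def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1 @ X + b1
    A1 = np.maximum(Z1, 0)                        # ReLU
    Z2 = W2 @ A1 + b2
    exp_Z = np.exp(Z2 - Z2.max(axis=0, keepdims=True))
    A2 = exp_Z / exp_Z.sum(axis=0, keepdims=True) # column-wise softmax
    return Z1, A1, Z2, A2

def backward_prop(Z1, A1, A2, W2, X, Y):
    m = X.shape[1]
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m      # sum over examples -> (10, 1)
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)                 # ReLU derivative is 1 where Z1 > 0, else 0
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2

def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    # One plain gradient-descent step with learning rate alpha.
    return W1 - alpha * dW1, b1 - alpha * db1, W2 - alpha * dW2, b2 - alpha * db2

def mean_cross_entropy(W1, b1, W2, b2, X, Y):
    # Assumed loss, computed from Z2 through a numerically stable log-softmax.
    _, _, Z2, _ = forward_prop(W1, b1, W2, b2, X)
    shifted = Z2 - Z2.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    return -np.mean(np.sum(Y * log_probs, axis=0))

# Illustrative random batch and parameters (not real MNIST data).
rng = np.random.default_rng(0)
m = 5
X = rng.random((784, m))
Y = one_hot(rng.integers(0, 10, size=m))
W1, b1 = rng.uniform(-0.5, 0.5, (10, 784)), np.zeros((10, 1))
W2, b2 = rng.uniform(-0.5, 0.5, (10, 10)), np.zeros((10, 1))

Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
dW1, db1, dW2, db2 = backward_prop(Z1, A1, A2, W2, X, Y)

# Sanity check: central finite difference on the single weight W2[0, 0].
eps = 1e-5
W2_plus, W2_minus = W2.copy(), W2.copy()
W2_plus[0, 0] += eps
W2_minus[0, 0] -= eps
numeric_dW2_00 = (mean_cross_entropy(W1, b1, W2_plus, b2, X, Y)
                  - mean_cross_entropy(W1, b1, W2_minus, b2, X, Y)) / (2 * eps)
print(numeric_dW2_00, dW2[0, 0])  # the two values should agree closely

# One update step; alpha = 0.1 is an arbitrary illustrative learning rate.
W1_new, b1_new, W2_new, b2_new = update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha=0.1)
```

In training, the forward pass, backward pass, and update would be repeated for many iterations; the single step here only shows how the pieces fit together.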
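Vars and shapes
- Forward prop
  - $A^{[0]} = X$: 784 x m
  - $Z^{[1]} \sim A^{[1]}$: 10 x m
  - $W^{[1]}$: 10 x 784 (as $W^{[1]} A^{[0]} \sim Z^{[1]}$)
  - $b^{[1]}$: 10 x 1
  - $Z^{[2]} \sim A^{[2]}$: 10 x m
  - $W^{[2]}$: 10 x 10 (as $W^{[2]} A^{[1]} \sim Z^{[2]}$)
  - $b^{[2]}$: 10 x 1
- Backprop
  - $dZ^{[2]}$: 10 x m ($\sim A^{[2]}$)
  - $dW^{[2]}$: 10 x 10
  - $db^{[2]}$: 10 x 1
  - $dZ^{[1]}$: 10 x m ($\sim A^{[1]}$)
  - $dW^{[1]}$: 10 x 784 (same shape as $W^{[1]}$)
  - $db^{[1]}$: 10 x 1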
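The shape bookkeeping above can be checked mechanically. The sketch below pushes zero-filled placeholder arrays through the same matrix products and prints the resulting shapes. The batch size m = 3 is an arbitrary choice, and the values carry no meaning; only the shapes matter. Each gradient comes out with the same shape as the parameter it updates.

```python
import numpy as np

m = 3  # hypothetical batch size; any positive integer works

# Zero-filled placeholders with the shapes listed for forward prop.
A0 = np.zeros((784, m))
W1, b1 = np.zeros((10, 784)), np.zeros((10, 1))
W2, b2 = np.zeros((10, 10)), np.zeros((10, 1))

Z1 = W1 @ A0 + b1   # b1 broadcasts across the m columns
A1 = np.maximum(Z1, 0)
Z2 = W2 @ A1 + b2
A2 = Z2             # stand-in for softmax(Z2), which has the same shape
Y = np.zeros((10, m))

dZ2 = A2 - Y
dW2 = dZ2 @ A1.T / m
db2 = dZ2.sum(axis=1, keepdims=True) / m
dZ1 = (W2.T @ dZ2) * (Z1 > 0)
dW1 = dZ1 @ A0.T / m
db1 = dZ1.sum(axis=1, keepdims=True) / m

shapes = {
    "Z1": Z1.shape, "A1": A1.shape, "Z2": Z2.shape, "A2": A2.shape,
    "dZ2": dZ2.shape, "dW2": dW2.shape, "db2": db2.shape,
    "dZ1": dZ1.shape, "dW1": dW1.shape, "db1": db1.shape,
}
for name, shape in shapes.items():
    print(f"{name}: {shape[0]} x {shape[1]}")
```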