
Summary Notes: Forward and Back Propagation

I recently completed the first course offered by deeplearning.ai and found it incredibly educational. Going forward, I want to keep a summary of what I learn (for my future reference) in the form of notes like this. This one covers forward- and back-propagation intuitions.

Setup

A neural network with $L$ layers.

[Figure: a neural network with $L$ layers]

Notation:

- $a^{[l]}$: activations of layer $l$ (with $a^{[0]} = x$, the input)
- $z^{[l]}$: linear output of layer $l$
- $w^{[l]}$, $b^{[l]}$: weights and biases of layer $l$
- $g^{[l]}$: activation function of layer $l$
- $\mathcal{L}$: the loss
- $dX$: shorthand for $\frac{\partial \mathcal{L}}{\partial X}$

Forward Propagation Intuition (for batch gradient descent)

Forward prop simply takes in the activations $a^{[l-1]}$ from the previous layer, calculates the current layer's linear output and non-linear activation using its weights and biases, and propagates the result to the next layer.

For layer $l$, the forward-prop function:

Takes in inputs: $a^{[l-1]}$

Calculates: $z^{[l]}$ and $a^{[l]}$ (caching $z^{[l]}$ for use in back-prop)

Steps

$$ \large z^{[l]} = w^{[l]}.a^{[l-1]} + b^{[l]} $$

$$ \large a^{[l]} = g^{[l]}(z^{[l]})$$

(where $g^{[l]}$ is the activation function for that layer, e.g. ReLU, tanh, sigmoid)
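To make the two steps concrete, here is a minimal NumPy sketch of a per-layer forward function. The helper names (`forward_layer`, `relu`, `sigmoid`) are mine, not from the course:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_layer(a_prev, w, b, activation=relu):
    """One forward-prop step: z = w . a_prev + b, then a = g(z).

    a_prev -- activations of layer l-1, shape (n_prev, m)
    w      -- weights of layer l,       shape (n_l, n_prev)
    b      -- biases of layer l,        shape (n_l, 1)
    Returns the activations a and the cached z needed by back-prop.
    """
    z = w @ a_prev + b
    a = activation(z)
    return a, z
```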

Backpropagation Intuition (for batch gradient descent)

Back-prop calculates the gradients of each layer's parameters with respect to the cost function, moving from right to left (output layer back towards the input).

For layer $l$, the back-prop function:

Takes in inputs: $da^{[l]}$ (plus the cached $z^{[l]}$, $a^{[l-1]}$ and $w^{[l]}$)

Calculates: $dz^{[l]}$, $dw^{[l]}$, $db^{[l]}$, and $da^{[l-1]}$ to pass to layer $l-1$

Steps

$$ \large \mathcal{L} = -\left(y\log\left(a^{[L]}\right) + (1-y)\log\left(1- a^{[L]}\right)\right) $$

$$ \large da^{[L]} = -\frac{y}{a^{[L]}} + \frac{1-y}{1- a^{[L]}} $$

$$ \large dz^{[l]} = da^{[l]} * g^{[l]\prime}(z^{[l]})$$

(where $g^{[l]\prime}$ is the derivative of that layer's activation function, and $*$ denotes element-wise multiplication)

$$ \large dw^{[l]} = \frac{\partial \mathcal{L} }{\partial z^{[l]}} . \frac{\partial z^{[l]}}{\partial w^{[l]}} = dz^{[l]} \, a^{[l-1]T} $$

$$ \large db^{[l]} = \frac{\partial \mathcal{L} }{\partial z^{[l]}} . \frac{\partial z^{[l]}}{\partial b^{[l]}} = dz^{[l]}$$

$$ \large da^{[l-1]} = \frac{\partial \mathcal{L} }{\partial z^{[l]}} . \frac{\partial z^{[l]}}{\partial a^{[l-1]}} = w^{[l]T} dz^{[l]} $$
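Putting the four equations together, a per-layer back-prop function might look like the sketch below (hypothetical names again). The only addition over the per-example equations above is the $\frac{1}{m}$ averaging over the batch for $dw$ and $db$:

```python
import numpy as np

def relu_prime(z):
    return (z > 0).astype(float)

def backward_layer(da, z, a_prev, w, g_prime=relu_prime):
    """One back-prop step for layer l.

    da     -- dL/da for this layer, shape (n_l, m)
    z      -- cached linear output of this layer
    a_prev -- activations of layer l-1
    w      -- weights of layer l
    Returns dw, db and da_prev to pass to layer l-1.
    """
    m = a_prev.shape[1]
    dz = da * g_prime(z)                        # dz = da * g'(z), element-wise
    dw = (dz @ a_prev.T) / m                    # dw = dz . a_prev^T, averaged over the batch
    db = np.sum(dz, axis=1, keepdims=True) / m  # db = dz, averaged over the batch
    da_prev = w.T @ dz                          # da_prev = w^T . dz
    return dw, db, da_prev
```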

Summary in simple words

Process

Start from the input layer and compute activations for each layer, moving left to right. At the last layer, the activations are the predictions of the neural network; compute the loss from them. Then, moving right to left, calculate the gradients of the linear outputs $dz^{[l]}$ for each layer, which are in turn used to calculate the gradients of that layer's weights and biases. Update the parameters after each such walkthrough, as in the sketch below.
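Assuming the hypothetical `forward_layer` / `backward_layer` helpers sketched earlier, one such walkthrough for a 2-layer network could look like this:

```python
def train_step(x, y, params, learning_rate=0.01):
    """One pass of batch gradient descent on a 2-layer network (sketch)."""
    w1, b1, w2, b2 = params

    # Forward: input -> hidden (ReLU) -> output (sigmoid) = predictions
    a1, z1 = forward_layer(x, w1, b1, activation=relu)
    a2, z2 = forward_layer(a1, w2, b2, activation=sigmoid)

    # Binary cross-entropy loss, averaged over the m examples
    loss = -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))

    # Backward: da^{[L]} from the loss, then walk right to left
    da2 = -(y / a2) + (1 - y) / (1 - a2)
    sigmoid_prime = lambda z: sigmoid(z) * (1 - sigmoid(z))
    dw2, db2, da1 = backward_layer(da2, z2, a1, w2, g_prime=sigmoid_prime)
    dw1, db1, _ = backward_layer(da1, z1, x, w1, g_prime=relu_prime)

    # Parameter update
    w1 -= learning_rate * dw1
    b1 -= learning_rate * db1
    w2 -= learning_rate * dw2
    b2 -= learning_rate * db2
    return loss, (w1, b1, w2, b2)
```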


#Deep-Learning