Taking the Coursera Deep Learning Specialization, Neural Networks and Deep Learning course. Will post condensed notes every week as part of the review process. All material originates from the free Coursera course, taught by Andrew Ng. See deeplearning.ai for more details.

Assumes you have knowledge of Week 3.

Table of Contents

Deep Neural Networks

Deep Neural Network

Deep L-layer neural network

deep_neural_networks

deep_neural_network_notation

Capital $L$ denotes the number of layers in the network; in the example network pictured above, $L = 4$.

We use $n^{[l]}$ to denote number of units in layer $l$.

$$ n^{[0]} = n_x = 3, \quad n^{[1]} = 5, \quad n^{[2]} = 5, \quad n^{[3]} = 3, \quad n^{[4]} = 1 $$
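
To make the notation concrete, here is a minimal Python sketch of how that example network's sizes could be written down (variable names are illustrative, not from the course assignments):

```python
# Layer sizes of the example network above: n[0] (input) through n[4] (output).
layer_dims = [3, 5, 5, 3, 1]
L = len(layer_dims) - 1   # number of layers, not counting the input layer
print(L)                  # -> 4
```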

Forward Propagation in a Deep Network

deep_neural_network_forward_propagation

$$ z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]} $$ $$ a^{[l]} = g^{[l]}(z^{[l]}) $$

Vectorized:

$$ X = A^{[0]} $$ $$ Z^{[1]} = W^{[1]} X + b^{[1]} $$ $$ A^{[1]} = g^{[1]}(Z^{[1]}) $$
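
A minimal numpy sketch of the vectorized forward pass, assuming a `parameters` dict holding `W1, b1, ..., WL, bL`, ReLU hidden layers, and a sigmoid output (all names are illustrative):

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def relu(Z):
    return np.maximum(0, Z)

def forward_propagation(X, parameters):
    """Vectorized forward pass: A[0] = X, then Z[l] = W[l] A[l-1] + b[l], A[l] = g[l](Z[l])."""
    L = len(parameters) // 2              # parameters holds W1, b1, ..., WL, bL
    A = X                                 # A[0] = X
    caches = []
    for l in range(1, L + 1):
        A_prev = A
        W, b = parameters["W" + str(l)], parameters["b" + str(l)]
        Z = W @ A_prev + b                # b broadcasts across the m columns
        A = sigmoid(Z) if l == L else relu(Z)   # ReLU hidden layers, sigmoid output
        caches.append((A_prev, W, b, Z))  # cache Z for use in backpropagation
    return A, caches
```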

Getting your matrix dimensions right

parameters_wl_and_bl

$$ W^{[1]} : (n^{[1]}, n^{[0]}) $$

$$ W^{[l]} : (n^{[l]}, n^{[l-1]}) $$

The shape of $b$ should be $b^{[l]} : (n^{[l]}, 1) $.

vectorized_matrix_dimensions
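
One way to sanity-check those shapes, assuming the example `layer_dims` from earlier (a sketch, not the course's initialization code):

```python
import numpy as np

layer_dims = [3, 5, 5, 3, 1]   # assumed example: n[0] ... n[4]
m = 10                         # arbitrary number of training examples

parameters = {}
for l in range(1, len(layer_dims)):
    parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    assert parameters["W" + str(l)].shape == (layer_dims[l], layer_dims[l - 1])  # (n[l], n[l-1])
    assert parameters["b" + str(l)].shape == (layer_dims[l], 1)                  # (n[l], 1)

X = np.random.randn(layer_dims[0], m)    # A[0] = X has shape (n[0], m)
# Each Z[l] and A[l] then has shape (n[l], m); b[l] stays (n[l], 1) and is broadcast.
```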

Why deep representations?

intuition_about_deep_representation

Deep networks compose functions of increasing complexity. For example, a face classifier might first detect edges, then combine edges into facial features such as eyes and noses, then combine those features into whole faces.

Circuit theory and deep learning:

Informally: there are functions you can compute with a "small" L-layer deep neural network that a shallower network would require exponentially more hidden units to compute.

Building blocks of deep neural networks

forwards_and_backwards_functions

$Z$ is cached and used in both forward and back propagation.

building_blocks_of_deep_neural_networks

Forward and Backward Propagation

Forward propagation

$$ z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]} $$ $$ a^{[l]} = g^{[l]}(z^{[l]}) $$

Vectorized

$$ Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]} $$ $$ A^{[l]} = g^{[l]} (Z^{[l]}) $$

Back propagation

$$ dz^{[l]} = da^{[l]} \ast g^{[l]\prime}(z^{[l]}) $$ $$ dW^{[l]} = dz^{[l]} \, a^{[l-1]T} $$ $$ db^{[l]} = dz^{[l]} $$ $$ da^{[l-1]} = W^{[l]T} dz^{[l]} $$
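
In the vectorized (whole-batch) form, the $dW$ and $db$ terms are averaged over the $m$ examples, which is where the $1/m$ factors below come from. A minimal numpy sketch of one layer's backward step (names are illustrative):

```python
import numpy as np

def linear_backward(dZ, A_prev, W):
    """Backward step for one layer in the vectorized (whole-batch) case.

    dZ     : gradient of the cost w.r.t. Z[l], shape (n[l], m)
    A_prev : activations A[l-1], shape (n[l-1], m)
    W      : weight matrix W[l], shape (n[l], n[l-1])
    """
    m = A_prev.shape[1]
    dW = (dZ @ A_prev.T) / m                     # dW[l] = (1/m) dZ[l] A[l-1]^T
    db = np.sum(dZ, axis=1, keepdims=True) / m   # db[l] = (1/m) sum of dZ[l] over examples
    dA_prev = W.T @ dZ                           # dA[l-1] = W[l]^T dZ[l]
    return dA_prev, dW, db
```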

backpropagation_summary

For the final layer with a sigmoid output and binary cross-entropy loss, backpropagation is initialized with: $$ da^{[L]} = -\dfrac{y}{a^{[L]}} + \dfrac{1-y}{1-a^{[L]}} $$
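
In code, backpropagation starts by computing this derivative at the output layer. A sketch, assuming `AL` holds the final activations and `Y` the labels, both of shape $(1, m)$:

```python
import numpy as np

def init_backprop(AL, Y):
    """Starting gradient dA[L] for a sigmoid output with binary cross-entropy loss."""
    return -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
```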

Parameters vs Hyperparameters

Parameters: $ W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}, \dots $

Hyperparameters: learning rate $\alpha$, number of iterations, number of hidden layers $L$, number of hidden units $n^{[l]}$, choice of activation function.

Later hyperparameters: momentum, mini-batch size, regularization parameters, and more.
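
As a rough illustration, hyperparameters are the knobs you choose before training rather than values learned by gradient descent. Something like the following (placeholder values, not recommendations):

```python
# Illustrative hyperparameter choices; the values are placeholders.
hyperparameters = {
    "learning_rate": 0.0075,
    "num_iterations": 2500,
    "layer_dims": [3, 5, 5, 3, 1],   # fixes L and every n[l]
    "hidden_activation": "relu",
}
```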

Applied deep learning is a very empirical process.

Idea -> Code -> Experiment -> Idea -> ... (repeat the cycle)

What does this have to do with the brain?

forward_and_backpropagation

Less like the brain, more like a universal function approximator.