Taking the Coursera Deep Learning Specialization, **Neural Networks and Deep Learning** course. Will post condensed notes every week as part of the review process. All material originates from the free Coursera course, taught by Andrew Ng. See deeplearning.ai for more details.

Assumes you have knowledge of Week 2.

# Table of Contents

# Shallow Neural Networks

## Shallow Neural Network

### Neural Networks Overview

Recall that a neural network is very similar to the logistic regression problem defined last week. A neural network is essentially a stack of logistic regression units chained together.

### Neural Network Representation

### Computing a Neural Network’s Output

In logistic regression, the output looks like this:

$$ z = w^Tx + b $$ $$ a = \sigma(z) $$

For a neural network, each layer is broken out into its respective nodes.

$$ z^{[1]}_1 = w^{[1]T}_1x + b^{[1]}_1 $$ $$ a_1^{[1]} = \sigma(z^{[1]}_1) $$

$$ a^{[1] \leftarrow \text{Layer} }_{i \leftarrow \text{Node in layer}} $$ $$ w^{[1]}_1 \leftarrow \text{is a vector} $$ $$ (w^{[1]})^T = w^{[1]T} \leftarrow \text{is a vector transposed} $$
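Stacking each $w^{[1]T}_i$ as a row of a matrix $W^{[1]}$ computes the whole layer at once. A minimal sketch in NumPy (the layer sizes and variable names here are illustrative, not from the course):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))          # input, shape (n_x, 1) with n_x = 3 features
W1 = rng.standard_normal((4, 3)) * 0.01  # layer-1 weights: each row is one w^{[1]T}_i
b1 = np.zeros((4, 1))                    # layer-1 biases, one per hidden unit

z1 = W1 @ x + b1   # z^{[1]} = W^{[1]} x + b^{[1]}, shape (4, 1)
a1 = sigmoid(z1)   # a^{[1]} = sigma(z^{[1]}), one activation per hidden unit
print(a1.shape)    # (4, 1)
```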

### Vectorizing Across Multiple Examples

### Explanation for Vectorized Implementation

### Activation Functions

The tanh function may be a better activation function than sigmoid. In practice, tanh is almost always superior, except for the output layer.

If $ y \in \{0, 1\} $, the sigmoid function might be better for the output layer. For all other units, ReLU (rectified linear unit) is best, tanh is second best, and sigmoid is worst.

Leaky ReLU might be better than ReLU for neural nets.
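The four activation functions mentioned above can be sketched in NumPy as follows (the `alpha=0.01` leaky slope is a common default, not a value fixed by the course):

```python
import numpy as np

def sigmoid(z):
    # Squashes input to (0, 1); useful for binary output layers.
    return 1 / (1 + np.exp(-z))

def tanh(z):
    # Squashes input to (-1, 1); zero-centered, so often better than sigmoid.
    return np.tanh(z)

def relu(z):
    # Zero for negative inputs, identity for positive inputs.
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but keeps a small slope for negative inputs.
    return np.where(z > 0, z, alpha * z)

print(sigmoid(0.0))       # 0.5
print(relu(-3.0))         # 0.0
print(leaky_relu(-3.0))   # -0.03
```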

### Why do you need non-linear activation functions?

If you do not have non-linear activation functions, the calculation of $x \rightarrow \hat{y}$ is linear.

Linear activation functions eliminate the benefit of hidden layers, as the composite of two linear functions is a linear function.
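This collapse is easy to verify numerically. A small sketch (shapes chosen arbitrarily for illustration): two layers with identity activations produce exactly the same output as a single linear layer with $W = W^{[2]}W^{[1]}$ and $b = W^{[2]}b^{[1]} + b^{[2]}$.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((1, 4)), rng.standard_normal((1, 1))
x = rng.standard_normal((3, 1))

# Two "layers" with linear (identity) activations...
two_layer = W2 @ (W1 @ x + b1) + b2
# ...equal one linear layer with W = W2 W1 and b = W2 b1 + b2.
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(two_layer, one_layer))  # True
```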

### Derivatives of Activation Functions
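Writing $a = g(z)$, the standard derivatives for the activations above are:

$$ \text{sigmoid: } g'(z) = a(1 - a) $$ $$ \text{tanh: } g'(z) = 1 - a^2 $$ $$ \text{ReLU: } g'(z) = \begin{cases} 0 & z < 0 \\ 1 & z \geq 0 \end{cases} $$ $$ \text{Leaky ReLU: } g'(z) = \begin{cases} 0.01 & z < 0 \\ 1 & z \geq 0 \end{cases} $$

(ReLU is not differentiable at exactly $z = 0$; by convention the derivative there is set to 0 or 1.)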

### Gradient Descent for Neural Networks

Formula for computing derivatives in Neural Networks
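For a network with one hidden layer, a sigmoid output, and cross-entropy loss over $m$ examples, the standard vectorized backpropagation formulas are (with $*$ denoting element-wise multiplication):

$$ dZ^{[2]} = A^{[2]} - Y $$ $$ dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T} $$ $$ db^{[2]} = \frac{1}{m} \sum dZ^{[2]} $$ $$ dZ^{[1]} = W^{[2]T} dZ^{[2]} * g^{[1]\prime}(Z^{[1]}) $$ $$ dW^{[1]} = \frac{1}{m} dZ^{[1]} X^T $$ $$ db^{[1]} = \frac{1}{m} \sum dZ^{[1]} $$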

### Backpropagation Intuition (Optional)

- didn’t watch

### Random Initialization

If you initialize all your weights to zero, your neural network won’t work: every unit in a hidden layer computes the same function and receives the same gradient update, so the layer effectively collapses into a single hidden node.

`W_layer1 = np.random.randn(2, 2) * 0.01`

`b_layer1 = np.zeros((2, 1))`

Move on to Week 4.