Taking the Coursera Deep Learning Specialization, **Convolutional Neural Networks** course. Will post condensed notes every week as part of the review process. All material originates from the free Coursera course, taught by Andrew Ng. See deeplearning.ai for more details.

# Deep Convolutional Models: Case Studies

## Learning Objectives

- Understand foundational papers of Convolutional Neural Networks (CNN)
- Analyze the dimensionality reduction of a volume in a very deep network
- Understand and implement a residual network
- Build a deep neural network using Keras
- Implement a skip connection in your network
- Clone a repository from GitHub and use transfer learning

## Case Studies

### Why look at case studies

A good way to gain intuition about convolutional neural networks is to study existing architectures that use them.

**Classic Networks:**
- LeNet-5
- AlexNet
- VGG

**Modern Networks:**
- ResNet (152 layers)
- Inception Neural Network

### Classic Networks

**LeNet-5**

The goal was to recognize handwritten digits.

- Inputs were 32x32x1 (greyscale images).
- Convolutional layer, 6 5x5 filters with stride of 1.
- Average Pooling with filter width 2, stride of 2.
- Convolutional Layer, 16 5x5 filters with a stride of 1.
- Average Pooling with filter width 2, stride of 2.
- Fully connected layer (120 nodes)
- Fully connected layer (84 nodes)
- Softmax layer (10 nodes)
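
To see how the volume shrinks through the layers above, here is a minimal sketch that applies the usual output-size formula, floor((n + 2p - f) / s) + 1, to each convolution and pooling layer. It assumes no padding ("valid" convolutions), which the notes do not state explicitly; the helper names are made up for illustration.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * padding - f) // stride + 1

# (name, filter width, stride, output channels or None to keep the channel count)
LENET5_LAYERS = [
    ("conv1", 5, 1, 6),
    ("avgpool1", 2, 2, None),
    ("conv2", 5, 1, 16),
    ("avgpool2", 2, 2, None),
]

def trace_shapes(size, channels, layers):
    """Return (name, height, width, channels) after every layer."""
    shapes = [("input", size, size, channels)]
    for name, f, stride, out_channels in layers:
        size = conv_output_size(size, f, stride)  # assumption: padding = 0
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, size, size, channels))
    return shapes

lenet5_shapes = trace_shapes(32, 1, LENET5_LAYERS)
for name, h, w, c in lenet5_shapes:
    print(f"{name}: {h}x{w}x{c}")
```

Under these assumptions the last pooling layer produces a 5x5x16 volume, i.e. 400 values feeding the first fully connected layer.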

**AlexNet**

- Inputs were 227x227x3
- 96 11x11 filters with stride of 4.
- Max pooling with 3x3 filter, stride of 2
- 5x5 same convolution
- Max pooling with 3x3 filter, stride of 2.
- 3x3 same convolution
- 3x3 same convolution
- 3x3 same convolution
- Max Pooling with 3x3 filter, stride of 2.
- FC layer (9216 nodes, the flattened 6x6x256 volume)
- FC layer (4096 nodes)
- FC layer (4096 nodes)
- Softmax (1000 nodes)

**VGG-16**

Conv = 3x3 filter, s=1, same padding

Max-Pool = 2x2 filter, s=2

- Inputs are 224x224x3
- Conv 64 x 2
- Max-Pool
- Conv 128 x 2
- Max-Pool
- Conv 256 x 3
- Max-Pool
- Conv 512 x 3
- Max-Pool
- Conv 512 x 3
- Max-Pool
- FC layer (4096)
- FC layer (4096)
- Softmax (1000 nodes)
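
As a rough check on how large this network is, the sketch below walks the layer list above and counts parameters. It assumes every conv and FC layer has a bias term and that "same" convolutions leave height and width unchanged; the function and constant names are illustrative only.

```python
# (number of conv layers, filters per layer) for each block, as listed above
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def vgg16_summary(input_size=224, input_channels=3, fc_sizes=(4096, 4096, 1000)):
    """Return (final size, final channels, flattened length, parameter count)."""
    size, channels, params = input_size, input_channels, 0
    for n_layers, filters in VGG16_BLOCKS:
        for _ in range(n_layers):
            # 3x3 filter over all input channels, plus one bias per filter
            params += 3 * 3 * channels * filters + filters
            channels = filters
        size //= 2  # 2x2 max-pool with stride 2 halves height and width
    flat = size * size * channels
    fan_in = flat
    for units in fc_sizes:
        params += fan_in * units + units  # weights plus biases
        fan_in = units
    return size, channels, flat, params

size, channels, flat, params = vgg16_summary()
print(f"final volume: {size}x{size}x{channels}, flattened: {flat}")
print(f"parameters: {params:,}")
```

Under these assumptions the volume entering the first FC layer is 7x7x512, and the total comes to roughly 138 million parameters, most of them in the fully connected layers.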

### Residual Networks (ResNets)

Skip connections allow activations from earlier in the network to skip over one or more layers and feed directly into a deeper layer.

Using residual blocks allows you to train much deeper networks.
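
Written out for a block that spans two layers, the skip connection looks roughly like this. The notation is an assumption borrowed from the course's usual convention: $a^{[l]}$ is the activation of layer $l$, and $g$ is the activation function (typically ReLU).

$$
z^{[l+1]} = W^{[l+1]} a^{[l]} + b^{[l+1]}, \qquad a^{[l+1]} = g\left(z^{[l+1]}\right)
$$

$$
z^{[l+2]} = W^{[l+2]} a^{[l+1]} + b^{[l+2]}, \qquad a^{[l+2]} = g\left(z^{[l+2]} + a^{[l]}\right)
$$

The only difference from a plain network is the extra $a^{[l]}$ term inside the final activation.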

### Why ResNets Work

In a plain neural network, making the network deeper can hurt your ability to train it. Residual blocks were invented to address this.

A residual block can easily learn the identity function, so adding one doesn’t make the result worse (it can usually only get better).

The shortcut usually connects layers with the same dimensions. Otherwise, a $W_s$ matrix needs to be applied to the skipped activation so the dimensions match.
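
The identity argument can be checked with a tiny sketch. This is plain Python on small vectors, not a real framework implementation, and it assumes fully connected layers with ReLU. When the second layer's weights and bias are zero, the block returns its input unchanged, whatever the first layer does.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def dense(W, b, a):
    """Compute W a + b for a weight matrix W (list of rows), bias b and vector a."""
    return [sum(w * x for w, x in zip(row, a)) + bias for row, bias in zip(W, b)]

def residual_block(a, W1, b1, W2, b2):
    """Two-layer residual block: relu(W2 relu(W1 a + b1) + b2 + a)."""
    hidden = relu(dense(W1, b1, a))
    z2 = dense(W2, b2, hidden)
    # The skip connection adds the block input back before the final ReLU.
    return relu([z + skip for z, skip in zip(z2, a)])

W1 = [[1.0, -1.0, 0.5], [0.0, 2.0, 1.0], [-1.0, 0.0, 1.0]]  # arbitrary values
b1 = [0.1, 0.0, -0.2]
W2 = [[0.0] * 3 for _ in range(3)]  # second layer shrunk to zero
b2 = [0.0] * 3

a_in = [0.5, 2.0, 0.0]  # non-negative, as the output of an earlier ReLU would be
a_out = residual_block(a_in, W1, b1, W2, b2)
print(a_out)
```

Because the input is already non-negative, the final ReLU leaves it untouched, so the extra layers cost nothing in the worst case.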

### Networks in Networks and 1x1 Convolutions

1x1 convolutions are useful for adding non-linearity to your network and for shrinking the number of channels, without resorting to an FC layer (which costs more computation).
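
One way to picture a 1x1 convolution is as a small fully connected layer applied to the channel vector at every pixel position. The sketch below uses nested lists and a ReLU; the shapes and weights are made-up values for illustration.

```python
def conv1x1(volume, weights, biases):
    """Apply a 1x1 convolution followed by ReLU.

    volume:  H x W x C_in nested lists
    weights: C_out x C_in (one row of channel weights per filter)
    biases:  C_out values
    """
    return [
        [
            [
                max(0.0, sum(w * x for w, x in zip(filt, pixel)) + b)
                for filt, b in zip(weights, biases)
            ]
            for pixel in row
        ]
        for row in volume
    ]

# A 2x2 volume with 3 channels, reduced to 2 channels by two 1x1 filters.
volume = [
    [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]],
    [[2.0, 0.0, 1.0], [1.0, 1.0, 1.0]],
]
weights = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
biases = [0.0, 0.0]

out = conv1x1(volume, weights, biases)
print(len(out), len(out[0]), len(out[0][0]))
```

Height and width are unchanged; only the channel dimension goes from 3 to 2.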

### Inception Network Motivation

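Rather than picking one filter size, an Inception layer applies several (1x1, 3x3, 5x5) plus pooling in parallel and stacks the results. This is computationally expensive.

Computational cost can be reduced by first shrinking the number of channels with a 1x1 convolution (a "bottleneck" layer).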
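
To get a feel for the savings, the sketch below counts multiplications for a 5x5 same convolution with and without a 1x1 bottleneck. The shapes (a 28x28x192 input, 32 output filters, a 16-channel bottleneck) are assumed for illustration and do not appear in the notes above.

```python
def conv_multiplications(out_h, out_w, n_filters, f, in_channels):
    """One f x f x in_channels dot product per output value."""
    return out_h * out_w * n_filters * f * f * in_channels

# Direct: 28x28x192 -> 5x5 same conv, 32 filters -> 28x28x32
direct = conv_multiplications(28, 28, 32, 5, 192)

# Bottleneck: 28x28x192 -> 1x1 conv, 16 filters -> 5x5 same conv, 32 filters
bottleneck = (
    conv_multiplications(28, 28, 16, 1, 192)
    + conv_multiplications(28, 28, 32, 5, 16)
)

print(f"direct:     {direct:,}")
print(f"bottleneck: {bottleneck:,}")
print(f"ratio:      {direct / bottleneck:.1f}x")
```

With these shapes the direct version needs about 120 million multiplications and the bottleneck version about 12.4 million, roughly a tenfold reduction.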

### Inception Network

An Inception module takes the previous activation, applies several convolution and pooling layers to it in parallel, and concatenates their outputs.

- Side branches let you use intermediate values in the network to make predictions (this seems to have a regularization effect)
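
The step that combines the parallel branches of an Inception module is a concatenation along the channel axis. Here is a minimal sketch of that step with nested lists. The branch outputs are constant stand-ins rather than real convolutions, and all names and channel counts are hypothetical.

```python
def concat_channels(branches):
    """Concatenate H x W x C_i volumes (nested lists) along the channel axis."""
    height, width = len(branches[0]), len(branches[0][0])
    return [
        [
            [value for branch in branches for value in branch[i][j]]
            for j in range(width)
        ]
        for i in range(height)
    ]

def constant_volume(height, width, channels, value):
    """Hypothetical stand-in for the output of one branch."""
    return [[[value] * channels for _ in range(width)] for _ in range(height)]

branches = [
    constant_volume(2, 2, 2, 1.0),  # e.g. the 1x1 convolution branch
    constant_volume(2, 2, 3, 3.0),  # e.g. the 3x3 convolution branch
    constant_volume(2, 2, 1, 5.0),  # e.g. the 5x5 convolution branch
    constant_volume(2, 2, 1, 0.0),  # e.g. the pooling branch
]
module_output = concat_channels(branches)
print(len(module_output), len(module_output[0]), len(module_output[0][0]))
```

This only works if every branch keeps the same height and width, which is why the branches use same padding.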

## Practical Advice for Using ConvNets

### Using Open-Source Implementation

A lot of these neural networks are difficult to implement. Good thing there’s open source software!

Basically clone the git repo and follow the author’s instructions.

### Transfer Learning

Download weights that someone else has already trained, then retrain the network on your own dataset starting from those weights.

You can freeze earlier layers and only train the last few layers depending on your data set size.

- If your dataset is small, only train the final softmax layer
- If your dataset is medium, train the last few conv/fc layers
- If your dataset is large, unfreeze all layers and train the whole network, using the downloaded weights as initialization
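
The three cases above boil down to choosing how many of the final layers stay trainable. Below is a framework-free sketch of that choice. The `Layer` class is a hypothetical stand-in, and the layer names and counts are made up.

```python
class Layer:
    """Hypothetical stand-in for a framework layer with a `trainable` switch."""

    def __init__(self, name):
        self.name = name
        self.trainable = True

def freeze_for_fine_tuning(layers, n_trainable):
    """Freeze every layer except the last `n_trainable` ones (in place)."""
    cutoff = len(layers) - n_trainable
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff
    return layers

names = ["conv1", "conv2", "conv3", "fc1", "fc2", "softmax"]

small = freeze_for_fine_tuning([Layer(n) for n in names], n_trainable=1)
medium = freeze_for_fine_tuning([Layer(n) for n in names], n_trainable=3)
large = freeze_for_fine_tuning([Layer(n) for n in names], n_trainable=len(names))

for label, layers in (("small", small), ("medium", medium), ("large", large)):
    print(label, [layer.name for layer in layers if layer.trainable])
```

In Keras, layers expose a `trainable` attribute that plays the same role, so the same loop shape should carry over to a real pretrained model.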

### Data Augmentation

- **Mirroring** is a common augmentation method. It preserves whatever you’re trying to recognize in the picture.
- **Random cropping**, so long as the crop still contains the thing you’re looking for
- Rotation
- Shearing
- Local warping
- Color shifting
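
Mirroring and random cropping are simple enough to sketch on an image stored as a list of rows. This is an illustrative toy using only the standard library; a real pipeline would use an image library, and the function names here are made up.

```python
import random

def mirror(image):
    """Flip an image (list of rows) left to right."""
    return [row[::-1] for row in image]

def random_crop(image, crop_h, crop_w, rng=random):
    """Cut a random crop_h x crop_w window out of the image."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

# A 4x4 "image" whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(4)] for r in range(4)]

flipped = mirror(image)
crop = random_crop(image, 3, 3, rng=random.Random(0))  # seeded for repeatability

print(flipped[0])
print(crop)
```

Passing a seeded `random.Random` instance keeps the crop reproducible, which helps when debugging an augmentation pipeline.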

### State of Computer Vision

- Ensembling and 10-crop at test time are rarely used in practical systems because they multiply the cost of each prediction. They are mostly used for competitions and benchmarks.
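
For reference, 10-crop means running the classifier on the center crop and the four corner crops of the image and of its mirror image, then averaging the ten predictions. Here is a minimal sketch on list-of-rows images, with hypothetical helper names and no real classifier.

```python
def crop(image, top, left, size):
    return [row[left:left + size] for row in image[top:top + size]]

def mirror(image):
    return [row[::-1] for row in image]

def ten_crop(image, size):
    """Center and four corner crops of the image and of its mirror image."""
    h, w = len(image), len(image[0])
    offsets = [
        ((h - size) // 2, (w - size) // 2),  # center
        (0, 0),                              # top left
        (0, w - size),                       # top right
        (h - size, 0),                       # bottom left
        (h - size, w - size),                # bottom right
    ]
    crops = []
    for source in (image, mirror(image)):
        crops.extend(crop(source, top, left, size) for top, left in offsets)
    return crops

def average_predictions(predictions):
    """Average class-probability vectors, one vector per crop or per model."""
    n = len(predictions)
    return [sum(values) / n for values in zip(*predictions)]

image = [[10 * r + c for c in range(5)] for r in range(5)]
crops = ten_crop(image, 3)
print(len(crops))
print(average_predictions([[0.0, 1.0], [0.5, 0.5]]))
```

The same `average_predictions` step is what ensembling does across several independently trained networks, which is why both techniques slow down inference.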

Use Open Source Code! Contribute to open source as well.