Taking the Coursera Deep Learning Specialization, Convolutional Neural Networks course. Will post condensed notes every week as part of the review process. All material originates from the free Coursera course, taught by Andrew Ng. See deeplearning.ai for more details.

Table of Contents

Deep Convolutional Models: Case Studies

Learning Objectives

Case Studies

Why look at case studies

A good way to gain intuition about convolutional neural networks is to study existing architectures that have used them effectively.

Classic Networks:

  - LeNet-5
  - AlexNet
  - VGG

Modern Networks:

  - ResNet (152 layers)
  - Inception Neural Network

Classic Networks



LeNet-5's goal was to recognize handwritten digits. The architecture (a code sketch follows the list):

  1. Inputs were 32x32x1 (greyscale images).
  2. Convolutional layer, 6 5x5 filters with stride of 1.
  3. Average Pooling with filter width 2, stride of 2.
  4. Convolutional Layer, 16 5x5 filters with a stride of 1.
  5. Average Pooling with filter width 2, stride of 2.
  6. Fully connected layer (120 nodes)
  7. Fully connected layer (84 nodes)
  8. Softmax layer (10 nodes)
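
A minimal tf.keras sketch of the LeNet-5 layout listed above. ReLU activations and a softmax output are used here for simplicity; the original paper used sigmoid/tanh non-linearities and a different output layer.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# LeNet-5-style network for 32x32 greyscale digit images.
lenet5 = models.Sequential([
    layers.Input(shape=(32, 32, 1)),                                 # 32x32x1 input
    layers.Conv2D(6, kernel_size=5, strides=1, activation='relu'),   # -> 28x28x6
    layers.AveragePooling2D(pool_size=2, strides=2),                 # -> 14x14x6
    layers.Conv2D(16, kernel_size=5, strides=1, activation='relu'),  # -> 10x10x16
    layers.AveragePooling2D(pool_size=2, strides=2),                 # -> 5x5x16
    layers.Flatten(),
    layers.Dense(120, activation='relu'),
    layers.Dense(84, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
lenet5.summary()
```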



AlexNet (a code sketch follows the list):

  1. Inputs were 227x227x3
  2. 96 11x11 filters with stride of 4.
  3. Max pooling with 3x3 filter, stride of 2
  4. 5x5 same convolution
  5. Max pooling with 3x3 filter, stride of 2.
  6. 3x3 same convolution
  7. 3x3 same convolution
  8. 3x3 same convolution
  9. Max Pooling with 3x3 filter, stride of 2.
  10. Flatten (9216 nodes, i.e. the 6x6x256 volume unrolled)
  11. FC layer (4096 nodes)
  12. FC layer (4096 nodes)
  13. Softmax (1000 nodes)
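
A rough tf.keras sketch of AlexNet following the list above. The filter counts of the middle convolutions (256, 384, 384, 256) are not in the notes; they are assumed from the original AlexNet paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

alexnet = models.Sequential([
    layers.Input(shape=(227, 227, 3)),
    layers.Conv2D(96, kernel_size=11, strides=4, activation='relu'),  # -> 55x55x96
    layers.MaxPooling2D(pool_size=3, strides=2),                      # -> 27x27x96
    layers.Conv2D(256, 5, padding='same', activation='relu'),         # -> 27x27x256
    layers.MaxPooling2D(pool_size=3, strides=2),                      # -> 13x13x256
    layers.Conv2D(384, 3, padding='same', activation='relu'),
    layers.Conv2D(384, 3, padding='same', activation='relu'),
    layers.Conv2D(256, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=3, strides=2),                      # -> 6x6x256
    layers.Flatten(),                                                 # 9216 nodes
    layers.Dense(4096, activation='relu'),
    layers.Dense(4096, activation='relu'),
    layers.Dense(1000, activation='softmax'),                         # 1000 classes
])
```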


VGG-16. Notation: Conv = 3x3 filter, s = 1, same padding; Max-Pool = 2x2 filter, s = 2. A code sketch follows the list.


  1. Inputs are 224x224x3
  2. Conv 64 x 2
  3. Max-Pool
  4. Conv 128 x 2
  5. Max-Pool
  6. Conv 256 x 3
  7. Max-Pool
  8. Conv 512 x 3
  9. Max-Pool
  10. Conv 512 x 3
  11. Max-Pool
  12. FC layer (4096)
  13. FC layer (4096)
  14. Softmax (1000 nodes)
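
A sketch of VGG-16 built from the Conv / Max-Pool shorthand above (tf.keras; ReLU activations assumed).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def vgg_block(model, filters, n_convs):
    # n_convs same-padded 3x3 convolutions (stride 1), then a 2x2 max-pool (stride 2).
    for _ in range(n_convs):
        model.add(layers.Conv2D(filters, 3, strides=1, padding='same', activation='relu'))
    model.add(layers.MaxPooling2D(pool_size=2, strides=2))

vgg16 = models.Sequential([layers.Input(shape=(224, 224, 3))])
vgg_block(vgg16, 64, 2)    # -> 112x112x64
vgg_block(vgg16, 128, 2)   # -> 56x56x128
vgg_block(vgg16, 256, 3)   # -> 28x28x256
vgg_block(vgg16, 512, 3)   # -> 14x14x512
vgg_block(vgg16, 512, 3)   # -> 7x7x512
vgg16.add(layers.Flatten())
vgg16.add(layers.Dense(4096, activation='relu'))
vgg16.add(layers.Dense(4096, activation='relu'))
vgg16.add(layers.Dense(1000, activation='softmax'))
```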

Residual Networks (ResNets)


Residual blocks allow activations from earlier in the network to skip over one or more layers and feed into a later layer via a shortcut (skip connection).

Using residual blocks allows you to train much deeper networks.


Why ResNets Work

In a plain (non-residual) network, making the network deeper can actually hurt your ability to train it; in practice, training error can start going back up. This is why residual blocks were invented.

A residual block can easily learn the identity function, so adding it usually doesn't make the result worse; at worst the block passes its input through unchanged, and at best it learns something useful on top of that.


Residual blocks usually keep the same dimensions so the shortcut can be added directly. Otherwise, a $W_s$ matrix is applied to the shortcut to match dimensions (see the sketch below).
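
A minimal sketch of one residual block in the tf.keras functional style. The batch norm placement is an assumption, and when the dimensions don't match, a 1x1 convolution plays the role of the $W_s$ matrix.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x
    out = layers.Conv2D(filters, 3, padding='same')(x)
    out = layers.BatchNormalization()(out)
    out = layers.Activation('relu')(out)
    out = layers.Conv2D(filters, 3, padding='same')(out)
    out = layers.BatchNormalization()(out)
    if shortcut.shape[-1] != filters:
        # Dimensions differ, so project the shortcut (the W_s matrix).
        shortcut = layers.Conv2D(filters, 1, padding='same')(shortcut)
    out = layers.Add()([out, shortcut])   # the earlier activation skips ahead and is added
    return layers.Activation('relu')(out)

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = residual_block(inputs, 64)      # identity shortcut: same dimensions
model = tf.keras.Model(inputs, outputs)
```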

Networks in Networks and 1x1 Convolutions



A 1x1 convolution is useful for adding non-linearity to your network, and for shrinking the number of channels, without the computational cost of a fully connected layer.
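
For example, a 1x1 convolution can shrink a 28x28x192 volume down to 28x28x32 while applying a ReLU at every position (tf.keras sketch):

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(28, 28, 192))
# 32 filters of size 1x1x192: shrinks the channel dimension and adds a non-linearity.
outputs = layers.Conv2D(32, kernel_size=1, activation='relu')(inputs)  # -> 28x28x32
model = tf.keras.Model(inputs, outputs)
model.summary()
```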

Inception Network Motivation


Applying, say, 32 filters of size 5x5 directly to a 28x28x192 volume takes on the order of 120 million multiplications; doing this for every branch of an inception module is computationally expensive.


The computational cost can be reduced significantly (roughly 10x in the lecture's example) by first applying a 1x1 convolution as a "bottleneck" layer.


Inception Network

An inception module takes the previous activation and applies several convolution and pooling branches to it in parallel, then concatenates the outputs along the channel dimension (sketched below). The full inception network stacks many of these modules.
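
A hedged sketch of a single inception module (tf.keras). The branch filter counts are illustrative (borrowed from GoogLeNet's first inception module), not from these notes.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x):
    # 1x1 branch
    b1 = layers.Conv2D(64, 1, padding='same', activation='relu')(x)
    # 1x1 bottleneck followed by 3x3
    b2 = layers.Conv2D(96, 1, padding='same', activation='relu')(x)
    b2 = layers.Conv2D(128, 3, padding='same', activation='relu')(b2)
    # 1x1 bottleneck followed by 5x5
    b3 = layers.Conv2D(16, 1, padding='same', activation='relu')(x)
    b3 = layers.Conv2D(32, 5, padding='same', activation='relu')(b3)
    # 3x3 max-pool (stride 1, same padding) followed by a 1x1 to limit channels
    b4 = layers.MaxPooling2D(pool_size=3, strides=1, padding='same')(x)
    b4 = layers.Conv2D(32, 1, padding='same', activation='relu')(b4)
    # Concatenate all branches along the channel dimension.
    return layers.Concatenate(axis=-1)([b1, b2, b3, b4])

inputs = tf.keras.Input(shape=(28, 28, 192))
outputs = inception_module(inputs)   # -> 28x28x(64+128+32+32) = 28x28x256
model = tf.keras.Model(inputs, outputs)
```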



Practical Advice for Using ConvNets

Using Open-Source Implementation

A lot of these neural networks are difficult to implement. Good thing there’s open source software!

Basically clone the git repo and follow the author’s instructions.

Transfer Learning

Download weights that someone else has already trained and retrain the network on your own dataset.


Depending on your dataset size, you can freeze the earlier layers and only train the last few layers (or just a new output layer); with more data you can unfreeze and fine-tune more of the network. A sketch is below.
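
A minimal tf.keras sketch, assuming a small dataset: freeze a pre-trained ImageNet base (VGG16 is used here as an example) and train only a new classification head.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 5   # hypothetical number of classes in your own dataset

base = tf.keras.applications.VGG16(weights='imagenet', include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False            # freeze all pre-trained layers

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),            # new head trained on your data
    layers.Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# With a larger dataset you would set base.trainable = True (or unfreeze only
# the last few layers) and fine-tune with a small learning rate.
```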

Data Augmentation


  1. Mirroring (horizontal flipping) is the most common augmentation method; it usually preserves whatever you're trying to recognize in the picture. (A few of these are sketched in code after this list.)
  2. Random cropping, as long as the crops still contain the thing you're looking for
  3. Rotation
  4. Shearing
  5. Local Warping
  6. Color shifting
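
A few of these expressed as tf.keras preprocessing layers (assumes TF 2.6+; the parameters are illustrative, and shearing/local warping would need a custom layer or another library).

```python
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomFlip('horizontal'),            # mirroring
    layers.RandomCrop(height=224, width=224),   # random cropping
    layers.RandomRotation(0.05),                # small rotations
    layers.RandomContrast(0.2),                 # a simple form of color shifting
])

# Applied on the fly during training, e.g. on a tf.data pipeline:
# train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))
```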

State of Computer Vision


Tips: ensembling and 10-crop at test time are not usually used in a practical production system; they are mainly for competitions and benchmarking.

Use Open Source Code! Contribute to open source as well.