Taking the Coursera Deep Learning Specialization, Convolutional Neural Networks course. Will post condensed notes every week as part of the review process. All material originates from the free Coursera course, taught by Andrew Ng. See deeplearning.ai for more details.

Special Applications: Face Recognition & Neural Style Transfer

Face Recognition

What is face recognition?

Verification

  - Input image, name/ID
  - Output whether the input image is that of the claimed person

Recognition

  - Has a database of $K$ persons
  - Get an input image
  - Output ID if the image is any of the $K$ persons (or "not recognized")

One Shot Learning

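You need to be able to recognize a person given just one example of that person’s face. Training samples are scarce: you may have only one picture of each face you need to recognize.

A classifier with one output per person does not train well on so few examples. Instead, learn a similarity function $d(\text{img1}, \text{img2})$ that outputs the degree of difference between two images. If $d \leq \tau$ for some threshold $\tau$, predict that the two images show the same person.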
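A minimal sketch of how a similarity function could be used for verification and recognition. It assumes the faces have already been turned into embedding vectors, uses squared Euclidean distance as the degree of difference, and picks an arbitrary threshold `tau`; the function names and the toy database are hypothetical, not from the course.

```python
import numpy as np

def degree_of_difference(emb_a, emb_b):
    """Squared Euclidean distance between two face embeddings."""
    diff = np.asarray(emb_a, dtype=float) - np.asarray(emb_b, dtype=float)
    return float(np.dot(diff, diff))

def is_same_person(emb_a, emb_b, tau=0.5):
    """Verification: predict 'same person' when the distance is at most tau."""
    return degree_of_difference(emb_a, emb_b) <= tau

def recognize(query_emb, database, tau=0.5):
    """Recognition: return the closest ID, or None if nobody is within tau."""
    best_id, best_d = None, float("inf")
    for person_id, ref_emb in database.items():
        d = degree_of_difference(query_emb, ref_emb)
        if d < best_d:
            best_id, best_d = person_id, d
    return best_id if best_d <= tau else None

# Hypothetical database with one stored embedding per person (one shot).
database = {"alice": [0.1, 0.9], "bob": [0.8, 0.2]}
print(recognize([0.15, 0.85], database))
print(recognize([5.0, 5.0], database))
```

Adding a new person only means adding one more entry to `database`; nothing has to be retrained.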


Siamese Network



Triplet Loss

Define and apply gradient descent on the triplet loss function.

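Each training example is a triplet of pictures: an Anchor image $A$, a Positive image $P$ of the same person, and a Negative image $N$ of a different person. The network compares the pair $(A, P)$ with the pair $(A, N)$ and should satisfy $||f(A)-f(P)||^2 + \alpha \leq ||f(A)-f(N)||^2$.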


The margin $\alpha$ is added so that the trivial solution, where the network outputs the same embedding (for example all zeros) for every image, is penalized.

$$ \mathscr{L}(A, P, N) = \max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + \alpha, 0) $$

$$ J = \sum\limits^m_{i=1} \mathscr{L}(A^{(i)}, P^{(i)}, N^{(i)}) $$
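The two formulas above can be sketched in NumPy as follows. The embeddings $f(A)$, $f(P)$, $f(N)$ are assumed to be already computed and stacked into arrays of shape `(m, d)`; the margin value `alpha=0.2` and the toy inputs are illustrative assumptions.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Per-example triplet loss for batches of embeddings of shape (m, d)."""
    pos_dist = np.sum((f_a - f_p) ** 2, axis=1)  # ||f(A) - f(P)||^2
    neg_dist = np.sum((f_a - f_n) ** 2, axis=1)  # ||f(A) - f(N)||^2
    return np.maximum(pos_dist - neg_dist + alpha, 0.0)

def triplet_cost(f_a, f_p, f_n, alpha=0.2):
    """Cost J: sum of the per-example losses over the m triplets."""
    return float(np.sum(triplet_loss(f_a, f_p, f_n, alpha)))

# Two toy triplets: the first satisfies the margin, the second violates it.
f_a = np.array([[0.0, 0.0], [1.0, 0.0]])
f_p = np.array([[0.1, 0.0], [1.0, 0.5]])
f_n = np.array([[1.0, 1.0], [1.0, 0.4]])
print(triplet_loss(f_a, f_p, f_n))
print(triplet_cost(f_a, f_p, f_n))

# The trivial all-zeros embedding is still penalized by alpha per triplet.
zeros = np.zeros((2, 2))
print(triplet_loss(zeros, zeros, zeros))
```

The last call shows why the margin matters: without $\alpha$, mapping every image to the same vector would give zero loss.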


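Choose triplets $A$, $P$, $N$ that are hard to train on, i.e. where $||f(A)-f(P)||^2 \approx ||f(A)-f(N)||^2$. Randomly chosen triplets usually satisfy the margin already, so the network learns little from them.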
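One simple way to get difficult triplets is to pick, for a given anchor, the negative whose embedding is closest to it. This is only one possible heuristic, sketched under the assumption that candidate embeddings are already computed; the function name `hardest_negative` is hypothetical.

```python
import numpy as np

def hardest_negative(f_a, candidate_negs):
    """Return (index, squared distance) of the negative closest to the anchor."""
    dists = np.sum((candidate_negs - f_a) ** 2, axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])

f_a = np.array([0.0, 0.0])
# Embeddings of three different people (toy values).
candidates = np.array([[3.0, 4.0], [0.5, 0.5], [1.0, 2.0]])
idx, dist = hardest_negative(f_a, candidates)
print(idx, dist)
```

The far-away candidates would give zero loss almost immediately, while the closest one keeps producing a useful gradient.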


Face Verification and Binary Classification

Instead of using triplet loss, you can use binary classification.

Compare two pictures at a time. The output is 1 if both pictures are of the same person and 0 if they are of different people.


With a Siamese network, the embeddings of the database faces can be precomputed and stored, so at runtime only the new image has to be passed through the network.
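A rough sketch of the binary classification variant: a logistic unit applied to the element-wise absolute difference of two embeddings, with the database embeddings stored ahead of time. The weights `w`, bias `b`, and all embeddings below are hand-picked for illustration, not trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def same_person_probability(f_i, f_j, w, b):
    """Logistic unit on |f_i - f_j|; output near 1 means 'same person'."""
    features = np.abs(f_i - f_j)
    return float(sigmoid(np.dot(w, features) + b))

# Hypothetical precomputed embeddings, stored instead of the raw images.
precomputed = {"alice": np.array([0.1, 0.9]), "bob": np.array([0.8, 0.2])}

# Illustrative parameters: large differences push the output toward 0.
w = np.array([-10.0, -10.0])
b = 3.0

# Only the new image needs a forward pass; here its embedding is given.
query = np.array([0.12, 0.88])
p_alice = same_person_probability(query, precomputed["alice"], w, b)
p_bob = same_person_probability(query, precomputed["bob"], w, b)
print(p_alice > 0.5, p_bob > 0.5)
```

In a real system `w` and `b` would be learned together with the embedding network from labeled pairs.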

Neural Style Transfer

What is neural style transfer?

Neural style transfer takes an image and applies the style of another image to it.


What are deep ConvNets learning?

Look at which image patches most strongly ‘activate’ the units in different layers of your neural network.


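Units in earlier layers see only small image patches and respond to simple features such as edges, while units in deeper layers see larger image patches and respond to more complex patterns.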


Cost Function

Given a content image $C$ and a style image $S$, the goal is to generate a new image $G$.

A cost function $J(G)$ needs to be defined that measures how good the generated image is, in terms of both content and style.

$$ J(G) = \alpha J_{\text{Content}}(C, G) + \beta J_{\text{Style}}(S, G) $$

  1. Initialize the generated image $G$ randomly.
  2. Use gradient descent to minimize $J(G)$.
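The two steps above can be sketched as a plain gradient descent loop over the pixels of $G$. To keep the example self-contained, the content and style costs here are simple pixel-wise squared differences, which are stand-ins only: the real costs are computed from ConvNet activations. The weights `alpha=10`, `beta=40`, the learning rate, and the tiny image size are all illustrative assumptions.

```python
import numpy as np

def total_cost(G, C, S, alpha=10.0, beta=40.0):
    """J(G) = alpha * J_content(C, G) + beta * J_style(S, G)."""
    j_content = np.sum((G - C) ** 2)  # placeholder for the real content cost
    j_style = np.sum((G - S) ** 2)    # placeholder for the real style cost
    return alpha * j_content + beta * j_style

def total_cost_grad(G, C, S, alpha=10.0, beta=40.0):
    """Gradient of the placeholder J(G) with respect to the pixels of G."""
    return 2.0 * alpha * (G - C) + 2.0 * beta * (G - S)

rng = np.random.default_rng(0)
C = rng.random((4, 4, 3))  # toy "content image"
S = rng.random((4, 4, 3))  # toy "style image"

# Step 1: initialize G randomly.
G = rng.random((4, 4, 3))

# Step 2: gradient descent on J(G); the pixels of G are the parameters.
learning_rate = 0.005
costs = []
for _ in range(200):
    costs.append(total_cost(G, C, S))
    G = G - learning_rate * total_cost_grad(G, C, S)

print(costs[0] > costs[-1])
```

Note that no network weights are updated here: the image itself is what gets optimized.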


Content Cost Function


Style Cost Function


The correlation between the activations of different channels in a layer tells us which high-level texture components occur together (or not together) in the image.
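A small sketch of how these channel correlations can be computed as a Gram (style) matrix and compared between the style image and the generated image for one layer. It assumes the layer activations are available as arrays of shape `(n_H, n_W, n_C)`; the random activations are toy data, and the normalization constant follows the common $1 / (2 n_H n_W n_C)^2$ convention, which is an assumption here.

```python
import numpy as np

def gram_matrix(activations):
    """Channel-by-channel correlations for activations of shape (n_H, n_W, n_C)."""
    n_h, n_w, n_c = activations.shape
    unrolled = activations.reshape(n_h * n_w, n_c)  # one row per position
    return unrolled.T @ unrolled                    # shape (n_C, n_C)

def layer_style_cost(a_s, a_g):
    """Squared difference between the Gram matrices of style and generated images."""
    n_h, n_w, n_c = a_s.shape
    diff = gram_matrix(a_s) - gram_matrix(a_g)
    return float(np.sum(diff ** 2) / (2.0 * n_h * n_w * n_c) ** 2)

rng = np.random.default_rng(0)
a_s = rng.random((4, 4, 3))  # toy activations for the style image
a_g = rng.random((4, 4, 3))  # toy activations for the generated image

print(gram_matrix(a_s).shape)
print(layer_style_cost(a_s, a_g) > 0.0)
print(layer_style_cost(a_s, a_s))
```

Entry `(k, k')` of the Gram matrix is large when channels `k` and `k'` tend to be active at the same positions, which is what "occur together" means above.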



1D and 3D Generalizations