Taking the Coursera Deep Learning Specialization, Convolutional Neural Networks course. Will post condensed notes every week as part of the review process. All material originates from the free Coursera course, taught by Andrew Ng. See deeplearning.ai for more details.

# Object Detection

## Learning Objectives

• Understand the challenges of Object Localization, Object Detection, Landmark Finding
• Understand and implement non-max suppression
• Understand and implement intersection over union
• Understand how to label a dataset for an object detection application
• Remember the vocabulary of object detection (landmark, anchor, bounding box, grid)

## Detection Algorithms

### Object Localization

Image classification: One object (is cat or not cat)

Classification with Localization: One object (is cat or not cat), bounding box over the object

Detection: Multiple objects, multiple bounding boxes.

In practice, you don’t have to use squared error for every output. You can use different loss functions for different output values, e.g. squared error for the bounding box coordinates and logistic regression loss for $P_c$.
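A minimal sketch of mixing losses per output. The label layout $[P_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$, the function name, and the cross-entropy term on the class outputs are assumptions for illustration. When no object is present ($P_c = 0$), only the $P_c$ term counts.

```python
import math

def localization_loss(y_true, y_pred, eps=1e-12):
    """Loss for one example, assumed layout [p_c, bx, by, bh, bw, c1, c2, c3].

    y_pred[0] is a probability, y_pred[1:5] are box values, and
    y_pred[5:] are class probabilities (assumed already normalized).
    """
    p_true, p_pred = y_true[0], y_pred[0]
    # Logistic (binary cross-entropy) loss on P_c.
    loss = -(p_true * math.log(p_pred + eps)
             + (1 - p_true) * math.log(1 - p_pred + eps))
    if p_true == 1:
        # Squared error on the bounding box, only when an object is present.
        loss += sum((t - p) ** 2 for t, p in zip(y_true[1:5], y_pred[1:5]))
        # Cross-entropy on the class probabilities.
        loss += -sum(t * math.log(p + eps)
                     for t, p in zip(y_true[5:], y_pred[5:]))
    return loss
```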

### Landmark Detection

Detection of certain points on your image.

Define the landmarks you want to detect in your training set, then set the output parameters in the neural network.
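As a rough sketch of what "set the output parameters" can look like: assume the network outputs one presence unit followed by an $(x, y)$ pair per landmark, always in the same order. The helper names below are hypothetical.

```python
def landmark_label(has_object, landmarks):
    """Flatten [(x1, y1), (x2, y2), ...] into [has_object, x1, y1, x2, y2, ...]."""
    label = [1.0 if has_object else 0.0]
    for x, y in landmarks:
        label.extend([x, y])
    return label

def output_size(num_landmarks):
    # One presence unit plus an (x, y) pair per landmark.
    return 1 + 2 * num_landmarks
```

The landmark order has to be consistent across every training image, otherwise the output units have no fixed meaning.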

### Object Detection

Start with closely cropped images of the object. Given this labeled training set, train a convnet to return $y \in \{0,1\}$.

Then perform sliding window detection: slide a window across the image, run the convnet on each crop, and repeat with windows of increasing size.

This is extremely computationally expensive. Granularity (stride), window size, and computational cost all need to be taken into account. We can implement this better.
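To get a feel for the cost, here is a small sketch that only enumerates the windows. The image size, window sizes, and stride are made-up values; each window would need its own forward pass through the classifier.

```python
def sliding_windows(img_h, img_w, window_sizes, stride):
    """Yield (top, left, size) for square windows of each size, row by row."""
    for size in window_sizes:
        for top in range(0, img_h - size + 1, stride):
            for left in range(0, img_w - size + 1, stride):
                yield top, left, size

# Hypothetical setup: a 28x28 image, two window sizes, stride 2.
windows = list(sliding_windows(28, 28, window_sizes=[14, 20], stride=2))
print(len(windows))  # number of separate classifier runs
```

A smaller stride or more window sizes improves localization but multiplies the number of classifier runs.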

### Convolutional Implementation of Sliding Windows

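You can implement sliding windows convolutionally: turn the fully connected layers into convolutional layers, so that overlapping windows share computation in a single forward pass. This algorithm has a weakness: the bounding box predictions aren’t very accurate.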
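A sketch of why this works, assuming a small network along the lines of the lecture example: a 5x5 conv, a 2x2 max pool, then the former fully connected layer rewritten as a 5x5 conv, followed by 1x1 convs. These layer sizes are assumptions, not stated in these notes. Feeding a larger image gives a grid of outputs, one per window position.

```python
def out_size(n, kernel, stride=1):
    """Spatial size after a 'valid' convolution or pooling layer."""
    return (n - kernel) // stride + 1

def convnet_output_grid(n):
    """Output grid size for the assumed network on an n x n input."""
    n = out_size(n, kernel=5)            # 5x5 conv
    n = out_size(n, kernel=2, stride=2)  # 2x2 max pool
    n = out_size(n, kernel=5)            # former FC layer, now a 5x5 conv
    return n                             # later 1x1 convs keep the size

for size in (14, 16, 28):
    print(size, convnet_output_grid(size))
```

With these assumed layers, each output cell corresponds to one 14x14 window, and the max pool gives an effective window stride of 2.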

### Intersection Over Union

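Intersection over union (IoU) is the area where the predicted and ground-truth boxes overlap, divided by the area of their union. The higher the IoU is, the more ‘correct’ the bounding box. A prediction is usually counted as correct if IoU ≥ 0.5; the 0.5 is a human chosen convention.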
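A minimal IoU sketch for axis-aligned boxes, assuming each box is given as corner coordinates `(x1, y1, x2, y2)`. The box format is an assumption; the course assignment may use a different one.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes get zero intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, so the IoU is 1 / (4 + 4 - 1) = 1/7.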

### Non-max Suppression

One problem of object detection is the algorithm might detect a single object more than once.

Take the highest probability box, then suppress any remaining box that overlaps it heavily (high IoU) and has a lower probability. Repeat with the next highest probability box that is left.
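A greedy sketch of non-max suppression for a single class. The `(x1, y1, x2, y2)` box format and both threshold values are assumptions picked for illustration.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5, score_threshold=0.6):
    """Return indices of the kept boxes, highest score first."""
    # Discard low-confidence boxes up front.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    while candidates:
        best = candidates.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the chosen one too much.
        candidates = [i for i in candidates
                      if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

With several object classes, a common choice is to run this once per class so that a car box never suppresses a pedestrian box.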

### Anchor Boxes

What if a grid cell wants to detect multiple objects?

Example here has two anchor boxes.

• With two anchor boxes, this does not handle three objects in the same grid cell.
• This does not handle two objects with the same anchor box shape in the same grid cell.
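A sketch of how an object could be matched to an anchor box: compare shapes only, as if the object and each anchor were centered on the same point, and pick the anchor with the highest IoU. The anchor shapes and function names below are hypothetical.

```python
def shape_iou(shape_a, shape_b):
    """IoU of two (height, width) shapes centered on the same point."""
    inter = min(shape_a[0], shape_b[0]) * min(shape_a[1], shape_b[1])
    union = shape_a[0] * shape_a[1] + shape_b[0] * shape_b[1] - inter
    return inter / union

def assign_anchor(obj_shape, anchors):
    """Index of the anchor whose shape best matches the object."""
    return max(range(len(anchors)),
               key=lambda i: shape_iou(obj_shape, anchors[i]))

# Hypothetical anchors as (height, width): one tall, one wide.
anchors = [(2.0, 1.0), (1.0, 2.0)]
print(assign_anchor((1.8, 0.6), anchors))  # a tall, narrow object
print(assign_anchor((0.8, 1.9), anchors))  # a short, wide object
```

Both limitations in the list above show up here: each anchor slot in a grid cell holds one object, so a third object, or a second object that maps to the same anchor index, has nowhere to go.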

### (Optional) Region Proposals

R-CNN (Regions with CNN features). Run a segmentation algorithm first to propose regions that could contain objects, then run the classifier only on those regions instead of on every sliding window.