Notes to Self on Using YOLO

YOLO (You Only Look Once) applies a single neural network to an entire image.

The network divides the image into a 13×13 grid of cells and predicts five bounding boxes for each cell. Consequently, there are up to 845 (13×13×5) separate bounding boxes.
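As a quick check on that arithmetic, here is a minimal sketch that enumerates one index per predicted box. The names `GRID_SIZE` and `BOXES_PER_CELL` are my own labels for the numbers above, not identifiers from any YOLO implementation.

```python
# Hypothetical names; the values come from the paragraph above.
GRID_SIZE = 13        # cells per side of the grid
BOXES_PER_CELL = 5    # bounding boxes predicted for each cell

# One (row, col, box) index for every predicted bounding box.
box_indices = [
    (row, col, box)
    for row in range(GRID_SIZE)
    for col in range(GRID_SIZE)
    for box in range(BOXES_PER_CELL)
]

total_boxes = len(box_indices)
print(total_boxes)  # 845
```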

For each predicted bounding box, the network outputs two things: a confidence score, indicating how certain it is that the box actually encloses some object; and a class prediction, such as bicycle, person, or dog.

The confidence score and class prediction are combined (multiplied together) into a final score: the probability that the bounding box contains an object of a particular class.
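To make the combination concrete, here is a small sketch that assumes the final score is the product of the box confidence and each class probability, as in the original YOLO formulation. The function name `class_scores` and the numbers are made up for illustration.

```python
def class_scores(box_confidence, class_probs):
    """Multiply the box confidence by each class probability."""
    return {label: box_confidence * p for label, p in class_probs.items()}

# Made-up values for a single predicted box.
confidence = 0.8
probs = {"bicycle": 0.1, "person": 0.7, "dog": 0.2}

scores = class_scores(confidence, probs)

# The label with the highest combined score is the box's final class.
best_label = max(scores, key=scores.get)
print(best_label, round(scores[best_label], 2))
```

A box with high confidence but spread-out class probabilities can still end up with a low final score, which is why both numbers matter.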

A threshold can be set for the confidence score; the default is 0.25. Boxes that score below the threshold are dropped from the final prediction.
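The thresholding step can be sketched as a simple filter. This is only an illustration: `filter_detections`, the dictionary layout, and the sample scores are assumptions of mine, not part of any YOLO codebase. Only the 0.25 default comes from the notes above.

```python
DEFAULT_THRESHOLD = 0.25  # default value mentioned above

def filter_detections(detections, threshold=DEFAULT_THRESHOLD):
    """Keep only detections whose score is at least the threshold."""
    return [d for d in detections if d["score"] >= threshold]

# Made-up detections for illustration.
detections = [
    {"label": "dog", "score": 0.82},
    {"label": "bicycle", "score": 0.31},
    {"label": "person", "score": 0.12},
]

kept = filter_detections(detections)
print([d["label"] for d in kept])  # ['dog', 'bicycle']
```

Raising the threshold trades recall for precision: `filter_detections(detections, 0.5)` would keep only the dog.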

YOLO is distinctive because predictions are made with a single network evaluation, rather than many incremental region-by-region evaluations. As the name suggests, it looks at the image only once, instead of sliding a small window across the image and running a classifier many times.

The following links provide more thorough explanations of the model:

http://machinethink.net/blog/object-detection-with-yolo/
http://cv-tricks.com/object-detection/faster-r-cnn-yolo-ssd/, which broadly surveys object detection models, from sliding-window approaches and R-CNN to SSD and YOLO
