YOLO (You Only Look Once) applies a single neural network to an entire image.
The network divides the image into a 13×13 grid of cells and predicts five bounding boxes for each cell. Consequently, there are potentially 845 (13×13×5) separate bounding boxes.
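As a quick sanity check on that arithmetic, here is a minimal sketch. The constant and function names are my own, chosen for illustration; they do not come from any YOLO implementation.

```python
# Values taken from the text above; the names are hypothetical.
GRID_SIZE = 13       # cells along each side of the grid
BOXES_PER_CELL = 5   # bounding boxes predicted per cell


def total_boxes(grid_size=GRID_SIZE, boxes_per_cell=BOXES_PER_CELL):
    """Return the number of candidate boxes before any thresholding."""
    return grid_size * grid_size * boxes_per_cell


print(total_boxes())  # 845
```

Other grid sizes or box counts would change the total in the same way, e.g. `total_boxes(7, 2)` for a 7×7 grid with two boxes per cell.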
Each predicted bounding box comes with two things: a confidence score, indicating how certain the network is that the box actually encloses some object; and a class prediction, such as bicycle, person, or dog.
The confidence score and the class prediction are combined (multiplied) into a final score: the probability that the bounding box contains an object of a particular class.
A threshold, 0.25 by default, can be set for this score. Boxes scoring below the threshold are not kept in the final prediction.
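The combining and thresholding steps can be sketched roughly as follows. This is an illustration only, not the actual YOLO code: the `filter_boxes` function and the dictionary layout of each box are hypothetical, and it assumes the threshold is applied to the combined score (confidence multiplied by the best class probability).

```python
def filter_boxes(boxes, threshold=0.25):
    """Keep boxes whose combined score meets the threshold.

    Each box is assumed (for illustration) to be a dict with:
      "confidence":  how sure the network is that the box holds an object
      "class_probs": mapping of class name to predicted class probability
    """
    kept = []
    for box in boxes:
        # Pick the most likely class for this box.
        label, class_prob = max(box["class_probs"].items(), key=lambda kv: kv[1])
        # Combine box confidence with the class probability.
        score = box["confidence"] * class_prob
        if score >= threshold:
            kept.append({"label": label, "score": score})
    return kept


# Made-up example values, not real network output.
example = [
    {"confidence": 0.9, "class_probs": {"dog": 0.8, "person": 0.2}},
    {"confidence": 0.5, "class_probs": {"bicycle": 0.4, "dog": 0.3}},
]
print([b["label"] for b in filter_boxes(example)])  # ['dog']
```

The second box is dropped because its combined score (0.5 × 0.4 = 0.2) falls below the default threshold of 0.25.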
YOLO stands out for two reasons: predictions are made with a single network evaluation, rather than many incremental evaluations of separate regions; and, as the name suggests, it looks at the image only once, rather than sliding a small window across the image and classifying each position.
The following links provide more thorough explanations of the model:
- http://machinethink.net/blog/object-detection-with-yolo/
- http://cv-tricks.com/object-detection/faster-r-cnn-yolo-ssd/, which broadly explores object detection models: sliding window (R-CNN), SSD, YOLO