Notes to Self on Using Yolo

YOLO (You Only Look Once) applies a single neural network to an entire image.
The network divides the image into a 13×13 grid of cells, and each cell predicts five bounding boxes. Consequently, there are potentially 845 (13×13×5) separate bounding boxes.
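The box count is simple arithmetic, and it helps to picture each of the 845 candidates as belonging to one cell and one of that cell's five box slots. The sketch below assumes a hypothetical storage layout (cells in row-major order, each cell's five boxes stored contiguously); real implementations may order their output tensor differently.

```python
from __future__ import annotations

GRID_SIZE = 13       # cells per side, as described above
BOXES_PER_CELL = 5   # bounding boxes predicted by each cell


def total_boxes(grid_size: int = GRID_SIZE, boxes_per_cell: int = BOXES_PER_CELL) -> int:
    """Number of candidate boxes the network can output for one image."""
    return grid_size * grid_size * boxes_per_cell


def box_location(index: int, grid_size: int = GRID_SIZE,
                 boxes_per_cell: int = BOXES_PER_CELL) -> tuple[int, int, int]:
    """Map a flat box index to (row, col, box_slot).

    Assumes a hypothetical layout: cells in row-major order, with the
    boxes of each cell stored next to each other.
    """
    if not 0 <= index < total_boxes(grid_size, boxes_per_cell):
        raise IndexError("box index out of range")
    cell, slot = divmod(index, boxes_per_cell)
    row, col = divmod(cell, grid_size)
    return row, col, slot


print(total_boxes())      # 845
print(box_location(0))    # (0, 0, 0)
print(box_location(844))  # (12, 12, 4)
```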
Each predicted bounding box comes with a confidence score, indicating how certain the model is that the box actually encloses some object, and a class prediction, such as bicycle, person, or dog.
The confidence score and class prediction are combined into a final score: the probability that the bounding box contains an object of a particular class.
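In the YOLO papers this combination is a product: the box's object confidence multiplied by the conditional probability of each class given that an object is present. The sketch below shows that product for a single box; the labels and numbers are made up for illustration, and it assumes the class probabilities are already normalized.

```python
def class_scores(confidence, class_probs):
    """Per-class score for one box: P(object) * P(class | object)."""
    return [confidence * p for p in class_probs]


def best_class(confidence, class_probs, labels):
    """Return the label with the highest combined score, and that score."""
    scores = class_scores(confidence, class_probs)
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


# Hypothetical box: 80% sure it contains an object, most likely a dog.
labels = ["bicycle", "person", "dog"]
label, score = best_class(0.8, [0.1, 0.2, 0.7], labels)
print(label, round(score, 2))  # dog 0.56
```

A confident box with an uncertain class (or the reverse) therefore ends up with a low final score.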
A threshold can be set for the confidence score; the default is 0.25. Boxes scoring below the threshold are dropped from the final prediction.
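Applying the threshold is a plain filter over the predicted boxes. The `Detection` record below is a hypothetical structure, not any library's API, and the sketch compares the threshold against a single final `score` per box; whether a given implementation thresholds the raw confidence or the combined class score is worth checking in its documentation.

```python
from __future__ import annotations

from dataclasses import dataclass

DEFAULT_THRESHOLD = 0.25  # default value mentioned in the text


@dataclass
class Detection:
    label: str
    score: float                              # final score for this box
    box: tuple[float, float, float, float]    # (x, y, w, h); hypothetical layout


def filter_detections(detections: list[Detection],
                      threshold: float = DEFAULT_THRESHOLD) -> list[Detection]:
    """Keep only detections whose score is at least the threshold."""
    return [d for d in detections if d.score >= threshold]


dets = [
    Detection("dog", 0.56, (0.5, 0.5, 0.2, 0.3)),
    Detection("person", 0.12, (0.1, 0.2, 0.1, 0.4)),
    Detection("bicycle", 0.25, (0.7, 0.6, 0.3, 0.2)),
]
print([d.label for d in filter_detections(dets)])       # ['dog', 'bicycle']
print([d.label for d in filter_detections(dets, 0.5)])  # ['dog']
```

Raising the threshold trades missed objects for fewer false positives; lowering it does the opposite.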
What sets YOLO apart is that predictions are made with a single network evaluation, rather than many incremental region-by-region evaluations; and, as the name suggests, it looks at the image only once, rather than sliding a small window across the image and classifying each position separately.
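To get a feel for the difference, one can count how many classifier runs a naive sliding-window detector needs. The image size (416×416), window size (64 pixels), and stride (16 pixels) below are hypothetical values chosen for illustration, at a single scale with no padding; YOLO replaces all of these runs with one forward pass.

```python
def positions_per_axis(image_size: int, window: int, stride: int) -> int:
    """Number of window placements along one axis (no padding)."""
    if window > image_size:
        return 0
    return (image_size - window) // stride + 1


def sliding_window_evaluations(image_size: int, window: int, stride: int) -> int:
    """Classifier runs for a square image, one window size, one scale."""
    per_axis = positions_per_axis(image_size, window, stride)
    return per_axis * per_axis


# Hypothetical settings: 416x416 image, 64-pixel window, stride of 16.
print(sliding_window_evaluations(416, 64, 16))  # 529 (versus 1 pass for YOLO)
```

Scanning several window sizes or image scales multiplies this count further.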
The following links provide more thorough explanations of the model:
