It Keeps Thinking It’s A Truck (053)

It Keeps Thinking It’s A Truck is a stream of images from Times Square taken at one-minute intervals. Using YOLO (You Only Look Once), a machine learning model for object detection, objects within each image are localized, identified and classified, for better or worse.

https://vimeo.com/260324290

As the images scroll by, errors in the predictions become evident. The unchanging ticket booth is periodically identified as a truck, fences are labeled as benches, and seated people are sometimes classified as fire hydrants. Can we see how a statistical model “sees”, that is, how it can see a truck where there is actually a booth? How will the built environment change to accommodate computer vision?

The bounding boxes are color-coded: people are magenta, trucks are green, and cars are yellow.

Technicals

The Times Square dataset was collected over the course of a week from a live streaming camera, as detailed in the Routine Grid post. Objects within each image are identified using YOLO, a machine learning model for object detection. For each image, YOLO produces a corresponding image with the bounding boxes and labels of the identified objects. An explanation of YOLO is given in “Notes to Self on Using Yolo”.

A bash script runs detection on an entire folder of images. However, the script is inefficient, as it reloads the model for every image.
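One way around the repeated loading is to load the model once and reuse it for every image in the folder. The following is a minimal Python sketch of that idea, not the script used for this project. The `load_model` and `detect` callables are hypothetical placeholders for whatever the chosen YOLO implementation provides, and the list of image suffixes is an assumption.

```python
from pathlib import Path
from typing import Any, Callable, Dict

# Assumed set of image file extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def detect_folder(
    folder: str,
    load_model: Callable[[], Any],
    detect: Callable[[Any, Path], Any],
) -> Dict[str, Any]:
    """Run detection on every image in `folder`, loading the model only once.

    `load_model` and `detect` are hypothetical hooks: the first returns a
    model object, the second takes that model and an image path and returns
    whatever the detector produces for that image.
    """
    model = load_model()  # the expensive step happens a single time
    results: Dict[str, Any] = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = detect(model, path)
    return results
```

With this shape, the cost of loading is paid once per folder rather than once per image, which matters when a week of one-minute captures adds up to thousands of files.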

Next Steps

Rather than output an image, save the predictions and bounding box coordinates to a text file for dynamic use elsewhere, such as on the web.
Cut out bounding boxes from the image rather than overlaying them on the original image.
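Both next steps could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Detection` fields, the pixel-coordinate box format (left, top, right, bottom), and the JSON-lines file layout are all hypothetical choices, not the output format of YOLO itself. The image is represented as a plain list of pixel rows to keep the example self-contained.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, List, Sequence


@dataclass
class Detection:
    """One predicted object. Field names and pixel units are assumptions."""
    label: str
    confidence: float
    left: int
    top: int
    right: int
    bottom: int


def save_detections(path: str, image_name: str, detections: Sequence[Detection]) -> None:
    """Append one JSON line per image so the predictions can be reused elsewhere."""
    record = {"image": image_name, "detections": [asdict(d) for d in detections]}
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def crop(pixels: Sequence[Sequence[Any]], det: Detection) -> List[List[Any]]:
    """Cut the bounding box out of a row-major pixel grid, clamped to the image."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    left = max(0, min(det.left, width))
    right = max(left, min(det.right, width))
    top = max(0, min(det.top, height))
    bottom = max(top, min(det.bottom, height))
    return [list(row[left:right]) for row in pixels[top:bottom]]
```

A text file with one JSON record per line is easy to read from a web page or another script. With an image library such as Pillow, the same (left, top, right, bottom) tuple can be passed to `Image.crop` instead of slicing rows by hand.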