It Keeps Thinking It’s A Truck is a stream of images from Times Square taken at 1 minute intervals. Using the YOLO Machine Learning model (You Only Look Once), objects within the image are localized, identified and classified — for better or worse.
https://vimeo.com/260324290
As the images scroll by errors in the predictions become evident. The unchanging ticket booth is periodically identified as a truck, fences are benches, and when seated, people are sometimes classified as fire hydrants. Can we see how a statistical model “sees”, that is, can see a truck where there is actually a booth? How will the built environment change to accommodate computer vision?
People are magenta, trucks are green, cars are yellow.
Technicals
The Times Square dataset was collected over the course of a week from a live streaming camera, as detailed in the Routine Grid post. The identification of objects within each image is achieved by using Yolo, a machine learning model for object detection. For each image, Yolo produces a corresponding image with the bounding boxes and labels of identified objects. A explanation of Yolo is detailed in “Notes to Self on Using Yolo“.
A bash script is executed to perform detection on an entire folder of images. However, the script is inefficient, as it loads the individually model for each image.
Next Steps
- Rather than output an image, save the predictions and bounding box coordinates to a text file for dynamic use elsewhere, such as on the web.
- Cut out bounding boxes from the image rather than overlaying them on the original image.