Live Body-Context (063)

Live Body-Context uses machine learning to identify and isolate people from a live video stream.

This map is a technical development of Body-Context (047), which uses a computer vision algorithm, the Single Shot MultiBox Detector with MobileNet, for object recognition in live video. As in Body-Context (047), the isolation of the figure from its context, and vice versa, illustrates how each gives meaning to the other.

Here, the figures do not respond to the changing representation. Whether they are isolated from the context or removed from it, their performance continues unchanged.

Unlike in Body-Context (047), the figures are not silhouetted but cropped with rectangular bounding boxes. When isolated, the immediate context still surrounds them, offering clues to their actions. However, when they are removed from the context, they aren’t recognizable: the rectangle could be obscuring anything.

Technicals

The map was built with a Python server that runs the TensorFlow model, exposes it as an internal API, and processes the WebRTC webcam stream. It builds on a tutorial by Chad Hart on webrtcH4cKS, “Computer Vision on the Web with WebRTC and TensorFlow”, which walks through Google’s TensorFlow Object Detection API and how to connect it to a webcam and server.
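The detection core can be sketched in a few lines. This is a hedged approximation rather than the project’s actual code: it assumes a frozen SSD-with-MobileNet graph exported from the TensorFlow Object Detection API (the file path and score threshold are placeholders), using the TensorFlow 1.x interface of the tutorial’s era.

    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x, as in the tutorial

    # Load a frozen SSD-with-MobileNet graph exported from the
    # TensorFlow Object Detection API (the path is an assumption).
    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile("ssd_mobilenet/frozen_inference_graph.pb", "rb") as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name="")
    session = tf.Session(graph=graph)

    def detect_people(frame, threshold=0.5):
        """Return normalized [ymin, xmin, ymax, xmax] boxes for detected people."""
        boxes, scores, classes = session.run(
            ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
            feed_dict={"image_tensor:0": frame[np.newaxis, ...]})
        keep = (classes[0] == 1) & (scores[0] > threshold)  # COCO class 1 is "person"
        return boxes[0][keep]

The server wraps a function like this in an HTTP endpoint; the browser grabs frames from the WebRTC stream, posts them for detection, and uses the returned boxes to crop or blank out the figures.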

Next Steps
  • Explore how different levels of awareness change the performance: how does someone act when they are represented only as a white rectangle? How do they act without a context to act within?
  • Explore using the segmentation addition to the API
  • Use a peer server provided by PeerJS to connect many webcams

If They Only Knew (062)

If They Only Knew composites a tracked body from one space into the live feed of another space.

The composited body interacts not with their own environment, but with a distant one. Yet the interaction is not reciprocal: the distant context is unaware that it is being streamed. With only one half of the composite aware of the other, can one be said to “interact” with a context that doesn’t engage? Or is the streamed representation of the context the “context” in which one engages, prompting new actions and practices to emerge? How does the physical context of the tracked body, hidden from view, continue to shape performance?

Technicals

A custom server used WebRTC to send video data to the browser through a peer-to-peer connection. A Kinect located in one space sent a cut-out body, while a webcam streamed a live video feed.
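The compositing step itself reduces to alpha blending. A minimal sketch, assuming the Kinect side sends an RGBA frame that is transparent everywhere except the cut-out body, and that both frames share the same dimensions:

    import numpy as np

    def composite(webcam_frame, cutout_rgba):
        """Overlay an RGBA body cut-out onto an RGB webcam frame."""
        # webcam_frame: (H, W, 3) uint8 array from the live feed.
        # cutout_rgba:  (H, W, 4) uint8 array from the Kinect cut-out.
        alpha = cutout_rgba[..., 3:4].astype(np.float32) / 255.0
        body = cutout_rgba[..., :3].astype(np.float32)
        blended = alpha * body + (1.0 - alpha) * webcam_frame.astype(np.float32)
        return blended.astype(np.uint8)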

Comparing Clusters (061)

Comparing Clusters illustrates how the same dataset can form different clusters when run through various algorithms.

All the blocks in the Bronx are clustered by four different algorithms: k-means, Gaussian mixture modeling (GMM), agglomerative clustering, and affinity propagation. Clicking on a block in one representation isolates the block’s corresponding cluster in each of the other mappings and sets these clusters to the same random color. As there isn’t a one-to-one translation between the clusters of each algorithm, the change in color allows users to incrementally construct a color scheme particular to the blocks they click.

Multiple representations challenge the authority and determination of each algorithm. Furthermore, through comparison, the various interpretations and parameters for similarity are made evident.

Technicals

The map is drawn with d3.js using the geoMercator projection, with each algorithm rendered to its own canvas.

The shape descriptors developed in Looking for the Same (058) are used as the input data for each algorithm. Where an algorithm requires it, the number of clusters was set to ten. The scikit-learn Python library was used for each: k-means, Gaussian mixture modeling, agglomerative hierarchical clustering, and affinity propagation.
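A sketch of that step with scikit-learn (the descriptor file is a stand-in for the shape descriptors from (058)):

    import numpy as np
    from sklearn.cluster import AffinityPropagation, AgglomerativeClustering, KMeans
    from sklearn.mixture import GaussianMixture

    descriptors = np.load("block_descriptors.npy")  # assumed: one row per block

    labels = {
        "kmeans": KMeans(n_clusters=10).fit_predict(descriptors),
        "gmm": GaussianMixture(n_components=10).fit(descriptors).predict(descriptors),
        "agglomerative": AgglomerativeClustering(n_clusters=10).fit_predict(descriptors),
        "affinity": AffinityPropagation().fit_predict(descriptors),  # finds its own count
    }

Each array of labels is then joined back to the block geometries for drawing.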

K-Means uses a centroid model of clustering, in which similarity is derived from the distance of each data point to a set number of centroids. With each iteration, data points are assigned to the nearest centroid; then each centroid takes the average position of the data points assigned to it. This is repeated until the centroids stabilize.
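The loop is short enough to write out directly; a bare numpy sketch:

    import numpy as np

    def kmeans(points, k, iterations=100, seed=0):
        rng = np.random.default_rng(seed)
        centroids = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(iterations):
            # Assign each point to its nearest centroid.
            distances = np.linalg.norm(points[:, None] - centroids[None], axis=2)
            labels = distances.argmin(axis=1)
            # Move each centroid to the average of its assigned points.
            new_centroids = centroids.copy()
            for j in range(k):
                members = points[labels == j]
                if len(members):
                    new_centroids[j] = members.mean(axis=0)
            if np.allclose(new_centroids, centroids):  # centroids have stabilized
                break
            centroids = new_centroids
        return labels, centroids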

Gaussian Mixture Modeling is a probabilistic model: each cluster is treated as a Gaussian distribution, and each data point is assigned a probability of belonging to each of those distributions.
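Where k-means commits each block to a single cluster, the fitted mixture exposes these probabilities directly (a scikit-learn sketch; the descriptor file is assumed, as above):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    descriptors = np.load("block_descriptors.npy")  # assumed, as above
    gmm = GaussianMixture(n_components=10).fit(descriptors)
    hard = gmm.predict(descriptors)         # most likely component per block
    soft = gmm.predict_proba(descriptors)   # (n_blocks, 10) membership probabilities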

Agglomerative Hierarchical Clustering uses the distance between data points to identify similarity. Each data point initially forms its own cluster; the closest clusters are then repeatedly merged until the desired number of clusters remains.
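The merge sequence can be inspected directly with scipy’s hierarchical clustering (a sketch; the descriptor file is assumed, as above):

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    descriptors = np.load("block_descriptors.npy")  # assumed, as above
    # Each row of Z records one merge: the two clusters joined and the
    # distance at which they were joined.
    Z = linkage(descriptors, method="ward")
    labels = fcluster(Z, t=10, criterion="maxclust")  # cut the tree at ten clusters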

Affinity Propagation is a type of “message passing” model which finds exemplars, data points that act as representatives of clusters, within the data set. The number of clusters does not need to be specified, and all data points are potential exemplars. Messages are exchanged between pairs of data points until a set of exemplars and corresponding clusters emerges. Aneesha Bakharia provides a nice write-up on affinity propagation.
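The message passing itself is compact. A bare numpy sketch of Frey and Dueck’s update rules (not scikit-learn’s implementation; the damping factor and median preference are conventional defaults):

    import numpy as np

    def affinity_propagation(X, damping=0.5, iterations=200):
        n = len(X)
        # Similarity: negative squared distance. The diagonal "preference"
        # controls how readily a point becomes an exemplar.
        S = -np.square(X[:, None] - X[None]).sum(axis=2)
        np.fill_diagonal(S, np.median(S[~np.eye(n, dtype=bool)]))
        R = np.zeros((n, n))  # responsibility: evidence that k should be i's exemplar
        A = np.zeros((n, n))  # availability: evidence that i should choose k
        for _ in range(iterations):
            # r(i,k) <- s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
            AS = A + S
            top = AS.argmax(axis=1)
            first = AS[np.arange(n), top]
            AS[np.arange(n), top] = -np.inf
            second = AS.max(axis=1)
            R_new = S - first[:, None]
            R_new[np.arange(n), top] = S[np.arange(n), top] - second
            R = damping * R + (1 - damping) * R_new
            # a(i,k) <- min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
            Rp = np.maximum(R, 0)
            np.fill_diagonal(Rp, R.diagonal())
            A_new = Rp.sum(axis=0)[None, :] - Rp
            diag = A_new.diagonal().copy()
            A_new = np.minimum(A_new, 0)
            np.fill_diagonal(A_new, diag)
            A = damping * A + (1 - damping) * A_new
        exemplars = np.flatnonzero(R.diagonal() + A.diagonal() > 0)
        # Each point joins the exemplar it is most similar to.
        return exemplars, exemplars[S[:, exemplars].argmax(axis=1)]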

Routine Tracking (060)

Routine Tracking uses object detection to track people in a video of Times Square.

The first of two representations isolates tracked bodies from RGB video frames. As time passes and people move across each frame, the tracked images persist and blur together. The resulting trails of movement remain only until another figure crosses the path, effectively overwriting that history. Moving figures consume much of the scene, but people who are sitting still or standing remain in front of the paths. As more people are tracked, the context of Times Square is slowly revealed.

In the second representation, people are shown as blocks of color, without a trail of their past positions. They flicker in and out, changing color as the tracking algorithm identifies them as a ‘new’ person. Having seen the RGB representation first, and understanding the movement as walking, we recognize the discrepancy between our identification of the “same” person and that identified by the algorithm. Yet the abstract representation also reveals information: even while tracking the same person, the rectangle wiggles and changes shape. Even figures that are seemingly stationary are shown as vibrating rectangles. Form stretches to encompass the changing stride and gait of the tracked body.

Technicals

Object tracking is achieved using darkflow, which builds on YOLO’s object detection algorithm. Tracking was executed with Python to produce a CSV file with tracking ids and coordinates for each frame. Using p5.js, the data was rendered frame by frame, either as colored rectangles or as crops extracted from the corresponding video frame. Rather than use the original video in conjunction with p5.js, each frame was saved as an image that could be loaded independently.
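As a sketch of how the first representation could be assembled (the CSV column names and frame filenames are assumptions; the actual map does this in the browser with p5.js):

    import csv
    from PIL import Image

    canvas = None
    with open("tracks.csv") as f:
        for row in csv.DictReader(f):  # assumed columns: frame, id, x, y, w, h
            frame = Image.open(f"frames/{int(row['frame']):05d}.jpg")
            if canvas is None:
                canvas = Image.new("RGB", frame.size)  # start blank: only bodies appear
            x, y, w, h = (int(float(row[k])) for k in ("x", "y", "w", "h"))
            # Paste each tracked body in place. Later crops overwrite earlier
            # ones, so trails persist only until another figure crosses them.
            canvas.paste(frame.crop((x, y, x + w, y + h)), (x, y))
    canvas.save("trails.png")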

Looking for the Same, Part 2 (059)

Looking for the Same, Part 2 compares two distributions of the same city blocks to ask: is identification more meaningful when the blocks are situated geographically or by similarity?

The map expands on the previous one through interaction: the number of clusters, the color coding, and the perplexity have been parameterized. But more importantly, the distribution is also configurable. City blocks are either geolocated or plotted as a t-SNE visualization.

The t-SNE visualization, color-coded by cluster, helps overcome misleading spatial relationships: even though shapes are related, they may not sit side by side on the map. The t-SNE layout color-codes by neighborhood but locates by similarity, while the map-based visualization color-codes by similarity and locates by geography. As such, each layout privileges a particular reading or legibility of the same information.
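The data behind the two layouts can be sketched with scikit-learn (the descriptor file and perplexity value are assumptions):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.manifold import TSNE

    descriptors = np.load("block_descriptors.npy")  # assumed descriptors from (058)
    clusters = KMeans(n_clusters=10).fit_predict(descriptors)  # color-coding
    xy = TSNE(perplexity=30).fit_transform(descriptors)        # similarity layout

Swapping between the geolocated coordinates and the t-SNE coordinates, while holding the cluster colors fixed, produces the two distributions compared above.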

How does the similarity identified by an algorithm compare to our anticipation of similarity? Does it help us see relationships we overlooked or does it reinforce our imagining of similarity?

Next Steps
  • Include parameters to change the type of clustering algorithm (GMM, affinity propagation, agglomerative clustering, etc.)
  • Compare different dimensionality reduction techniques.