Looking for the Same (058)

Looking for the Same is a plot of all the city blocks in the Bronx; on hover, the relative density of a neighborhood’s blocks describes how related their shapes are.

The map uses multi-dimensional image moments to describe each block’s shape, and then t-SNE, a dimensionality reduction algorithm, to project these shape descriptions onto a two-dimensional plane. On hover, the blocks within a neighborhood are highlighted. However, blocks are situated by similarity rather than geolocation, allowing users to compare the difference or sameness within each neighborhood. Like any visualization of an algorithmic process, it also prompts a comparison between how an algorithm finds “related” forms and our own identifications.

Technicals

To transform the block shapes into a data set, each block was saved as a black and white image from which Zernike moments were calculated. Image moments are a way of describing an image through different weightings of pixel intensities. Besides Zernike moments, Hu moments are also a popular set of invariant moments used for shape description within images. Zernike moments are particularly sensitive to the scale and translation of objects within images, but are invariant to rotation.
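
For reference, a minimal sketch of that step, assuming mahotas for the Zernike moments and Pillow for loading the block images; the file layout, radius, and degree are illustrative assumptions rather than the original code.

```python
# Minimal sketch: compute a Zernike shape descriptor for each block image.
# Assumes black-and-white PNGs in a blocks/ folder (hypothetical layout).
import glob

import mahotas
import numpy as np
from PIL import Image

features = []
for path in sorted(glob.glob("blocks/*.png")):
    silhouette = (np.array(Image.open(path).convert("L")) > 0).astype(float)
    radius = min(silhouette.shape) // 2  # region over which the moments are computed
    features.append(mahotas.features.zernike_moments(silhouette, radius, degree=8))

features = np.array(features)  # one fixed-length shape descriptor per block
```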

t-SNE is a dimensionality reduction algorithm that tries to find a faithful representation of high-dimensional points in a lower-dimensional space. However, as discussed by Wattenberg, Viégas, and Johnson in “How to Use t-SNE Effectively”, t-SNE plots can be misleading. Even though shapes may be related, they may not appear side by side. t-SNE is highly sensitive to the perplexity hyperparameter, which balances attention between local and global aspects of the dataset. This map iteration uses a perplexity of 30. Visualizations using other perplexity values are shown below.

Perplexity: 20
Perplexity: 8
Perplexity: 5
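
For reference, a minimal sketch of the embedding step, assuming scikit-learn’s TSNE and the `features` array from the Zernike sketch above; the original implementation may differ.

```python
# Minimal sketch: project the Zernike descriptors onto a 2D plane with t-SNE.
from sklearn.manifold import TSNE

embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
# embedding[i] is the (x, y) position of block i, placed by shape similarity
# rather than geolocation.
```
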
Next Steps
  • Color-code blocks by similarity rather than neighborhood
  • Test alternative dimensionality-reduction algorithms to compare layouts

More Difference From The Same (057)

More Difference From The Same uses visually similar aerial imagery to construct a sequence of transitional spaces, always between one place and the next.

This map is part of a series exploring the role of context in recognizing similarity. As the aerial imagery of one place is slowly replaced by the imagery of another, the two places are compared but also merge together. What do these places have in common? Are tennis courts always surrounded by a tree line? Would the transitional places function in a similar way?

Using Terrapattern — a machine learning model for finding visual similarity in aerial imagery — a dataset of similar places was compiled. The second place is similar to the first, the third place is similar to the second, and so on.

Technicals

To display each place as a random sequence of tiles, three HTML canvas elements were used: a hidden WebGL canvas, which uses Mapbox to render the aerial tiles and jumps to a different location every 5 seconds; a hidden 2D canvas, onto which the WebGL canvas is drawn as an image every 6.75 seconds; and finally, a display 2D canvas, onto which subdivisions of the hidden 2D canvas are drawn every 0.75 seconds. The WebGL aerial tiles are drawn to a separate hidden canvas to account for their incremental loading: if the tiles were copied straight to the display canvas, they wouldn’t be fully rendered. Having the location change on a different interval from the copy rate ensures each tile is fully rendered before it is copied.

Previous blog posts (Back and Forth 036 and A Different Similar Center 030) discuss the Terrapattern model in more detail.

Compare To
  • Back and Forth (036), which replaced both the center tile and the surrounding context.
  • A Different Similar Center (030), which replaced only the center image and maintained the original context.

Together, We’re More Meaningful (056)

Together, We’re More Meaningful is the pair to Hello Hello Hello (050), where different line drawings of “Hello” were shown together. Here, the action underlying those Hellos — tilting, writing, walking — is shown in context.

Previously, the actions and context that shaped the drawing were never made apparent. Now, the line drawing is latent in those same actions and context. Do actions, context, and message need each other to be meaningful? Tilting one’s phone could be a game, moving one’s arm around could be exercise, and the final video could just be documenting the walk to work. Can we discern the message from the action? How do time and the scale of the body impact that legibility?

Each video loops, emphasizing the different time scales of each action: 1 minute 28 seconds, 37 seconds, and 44 minutes 52 seconds.

Technicals

The videos were recorded using an iPhone X and are displayed in the browser using HTML5’s video element. They autoplay and loop when the page loads.

Next Steps
  • Combine this map with Hello Hello Hello (050) to create a diptych, showing both the line drawing and the action that produces it

This Is An Umbrella (055)

This Is An Umbrella (055) identifies objects using Yolo, a neural network, and removes them from their context. The images are then situated in a sequence, acquiring new context and meaning.

Sequencing underscores the accuracy of the model, and the discrepancy between human and algorithmic-statistical identification, prompting the inevitable question: is this “person” really a person?

Sequencing also establishes a comparison of similarly categorized images, where one “person” is evaluated against the next “person”. Does comparison aid or frustrate identifications made by the viewer?

Some of the images were taken from the same frame; some were taken from different frames. As a result, certain elements periodically reappear, but not consistently. The ticket booth might be identified as a truck, but then might not be identified at all. An umbrella is always identified as an umbrella, but it isn’t always identified.

Finally, objects aren’t fully removed from their context. Instead, they share a bounding box with their immediate surroundings. Within that box, the model is confident that there is a person or a truck, but hasn’t identified the silhouette of that object. Interestingly, the identified region is rectangular, echoing the underlying order of pixels from which objects are identified.
Technicals

The objects are extracted from a series of stills taken from a live video feed of Times Square, discussed in previous posts for Routine Grid (034) and Routine Difference (035).

A previous post discusses the Yolo model. The source code was modified to save the coordinates of each bounding box to a JSON file. The original images are then cropped and looped through using p5.js and its get() function.
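
The browser-side cropping uses p5.js, but the same step can be sketched in Python with Pillow; the JSON structure below (a list of objects with a label, a source frame, and box coordinates) is an assumption about how the modified Yolo code saved its bounding boxes, not the actual format.

```python
# Illustrative sketch: crop each detected object out of its source still
# using bounding boxes saved to JSON (filename and field names are assumptions).
import json

from PIL import Image

with open("boxes.json") as f:  # hypothetical filename
    boxes = json.load(f)

crops = []
for box in boxes:
    frame = Image.open(box["frame"])  # the Times Square still this box came from
    x, y, w, h = box["x"], box["y"], box["w"], box["h"]
    crops.append((box["label"], frame.crop((x, y, x + w, y + h))))
```
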
Next Steps
  • Should all images be the same height, despite needing to upsample the pixels?
  • Explore showing all images of the same class in rows

Color Depth (054)

Color Depth uses a depth image and corresponding RGB data to reconstruct a view from above.

As the pixels are transposed from a frontal view to a view from above, only the front edges of objects are seen, but their relative positions to each other are maintained. The objects themselves are difficult to discern, seemingly caught between elevational and planimetric perspectives.

Technicals

The depth image was captured using a Kinect’s infrared array to measure distance from the camera. Depth is represented as greyscale values — closest is black, farthest is white. The depth image pixels were read row by row, and each pixel’s depth value was transposed to the Y dimension of the new image. The greyscale pixels were then colored based on the corresponding pixel in the RGB image.
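
A minimal sketch of that transposition, assuming the depth and RGB images are already aligned and loaded as NumPy arrays; the filenames and the 8-bit depth range are assumptions, not the original code.

```python
# Minimal sketch: rebuild a view from above by using each pixel's depth value
# as its new Y coordinate, keeping the color of the corresponding RGB pixel.
import numpy as np
from PIL import Image

depth = np.array(Image.open("depth.png").convert("L"))  # 0 = closest, 255 = farthest
rgb = np.array(Image.open("rgb.png").convert("RGB"))

height, width = depth.shape
plan = np.full((256, width, 3), 255, dtype=np.uint8)  # white canvas, one row per depth value

for y in range(height):          # read the frontal image row by row
    for x in range(width):
        plan[depth[y, x], x] = rgb[y, x]  # depth becomes the vertical axis of the plan view

Image.fromarray(plan).save("plan_view.png")
```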

Next Steps
  • Explore extending the color from the front edge of an object until it is interrupted by another object’s front edge, filling in the white space.