Looking for the Same, Part 2 compares two distributions of the same city blocks to ask: is identification more meaningful when the blocks are situated geographically or by similarity?
The map expands on the previous one through interaction: the number of clusters, the color coding, and the t-SNE perplexity are all parameters. More importantly, the distribution itself is configurable: city blocks are either geolocated or plotted as a t-SNE visualization.
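The text does not say how each block's shape becomes numbers, or which algorithm produces the clusters. Here is a minimal sketch under stated assumptions. Every block is assumed to be reduced already to a small vector of shape descriptors; the area and compactness values below are hypothetical. Clusters are assumed to come from k-means, where `k` stands in for the number-of-clusters parameter.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means; returns one cluster label (0..k-1) per point."""
    rng = random.Random(seed)
    # Initialize centroids with k points drawn at random without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels

# Hypothetical (area, compactness) descriptors for six blocks.
blocks = [(1.0, 0.90), (1.1, 0.85), (0.95, 0.88),   # small, compact blocks
          (4.0, 0.30), (4.2, 0.35), (3.9, 0.32)]    # large, elongated blocks
labels = kmeans(blocks, k=2)
```

With real descriptors, the features would usually be standardized first. K-means uses Euclidean distance, so a large-valued feature such as area would otherwise dominate the result.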
The t-SNE visualization helps overcome misleading spatial relationships: similar shapes are not necessarily side by side on the map, but t-SNE places them near one another. The t-SNE layout color-codes by neighborhood but positions by similarity, while the map-based layout color-codes by similarity and positions by geography. Each layout therefore privileges a particular reading, or legibility, of the same information.
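The swap of visual channels between the two layouts can be sketched as a single function that takes a layout mode. This is a minimal illustration, not the project's code. The `Block` record and its fields (`lon`, `lat`, `tsne_x`, `tsne_y`, `cluster`, `neighborhood`) are hypothetical, and the t-SNE coordinates are assumed to have been computed beforehand.

```python
from dataclasses import dataclass

@dataclass
class Block:
    # Hypothetical per-block record; field names are illustrative only.
    lon: float
    lat: float
    tsne_x: float      # precomputed 2-D t-SNE coordinates
    tsne_y: float
    cluster: int       # label from the clustering step (similarity)
    neighborhood: str  # geographic grouping

def layout(blocks, mode):
    """Return (x, y, color_key) triples for the chosen layout.

    "map":  position by geography, color by similarity (cluster).
    "tsne": position by similarity, color by geography (neighborhood).
    """
    if mode == "map":
        return [(b.lon, b.lat, b.cluster) for b in blocks]
    if mode == "tsne":
        return [(b.tsne_x, b.tsne_y, b.neighborhood) for b in blocks]
    raise ValueError(f"unknown layout mode: {mode!r}")
```

The two modes read the same records. Switching layouts only changes which attribute drives position and which drives color, never the data itself.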
How does the similarity identified by an algorithm compare with the similarity we anticipate? Does it help us see relationships we overlooked, or does it reinforce what we already imagined similarity to be?
Next Steps
- Include a parameter to change the type of clustering algorithm (Gaussian mixture model, affinity propagation, agglomerative clustering, etc.).
- Compare different dimensionality reduction techniques.
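One of the candidate algorithms, agglomerative clustering, is simple enough to sketch without dependencies. The version below is a naive illustration whose cost grows roughly cubically with the number of blocks; it is not a production implementation. The function name, the `linkage` options, and the sample `features` are assumptions for illustration. Giving each algorithm the same `(points, n_clusters) -> labels` shape is one way to make the algorithm itself a switchable parameter. Affinity propagation, however, chooses its own number of clusters and would not fit that signature directly.

```python
import math
from itertools import combinations

# Cluster-to-cluster distance for each supported linkage criterion.
LINKAGES = {
    "single": min,                         # closest pair of members
    "complete": max,                       # farthest pair of members
    "average": lambda d: sum(d) / len(d),  # mean over all member pairs
}

def agglomerative(points, n_clusters, linkage="average"):
    """Naive bottom-up (agglomerative) clustering; returns one label per point."""
    if linkage not in LINKAGES:
        raise ValueError(f"unknown linkage: {linkage!r}")
    combine = LINKAGES[linkage]

    def cluster_distance(a, b):
        return combine([math.dist(points[i], points[j]) for i in a for j in b])

    # Every point starts as its own cluster (a list of point indices).
    clusters = [[i] for i in range(len(points))]
    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]  # combinations yields i < j, so index i stays valid

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels

# Hypothetical shape descriptors for six blocks.
features = [(1.0, 0.90), (1.1, 0.85), (0.95, 0.88),
            (4.0, 0.30), (4.2, 0.35), (3.9, 0.32)]
labels = agglomerative(features, n_clusters=2, linkage="single")
```

For real data, scikit-learn's `AgglomerativeClustering`, with its `n_clusters` and `linkage` parameters, would be the more efficient choice.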