Making ‘Making Legible’ Legible: Part 3

Since processing the text documents, I’ve been refining the goal of “finding latent (content and contextual) relationships within a large corpus of texts”. Because the text remains a work in progress, I want to focus on how it has evolved and continues to evolve. A genealogical approach to text relationships can identify which pieces have been disregarded or ignored (and thus require further inspection), or surface the dominant tendencies and trains of thought.

An interesting writing tool for collaboration and version control: http://docs.withdraft.com

Beyond looking at the past, I think this project can provide a foundation for developing a writing tool that moves beyond version control or collaborative commenting. Version control tends to provide a fine-grained, binary approach: it compares two things and extracts the insertions or deletions. While this is helpful in an isolated scenario, I’m interested in broader developments across multiple objects over many time periods. Alternatively, version control also provides a high-level view indicating change-points over a long time, but those points of change are overly simplified – often represented by just a single dot. Without context, or without knowing what specific time a change was made, this larger overview provides little information beyond the quantity and frequency of changes. Through a genealogical and contextual approach to analyzing an existing body of text, I’m hoping to identify what sorts of relationships could inform the writing and editing process.

With all the data now added to the database, I’ve been exploring sentence similarity. The diagram below shows the process I’ve gone through up to this point.

Once I’ve computed a two-dimensional array mapping the similarity of all sentences to each other, I plan to use that information to create a visual interface for exploring those relationships. The wireframes below are a rough sketch of what form this might take.
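As a sketch of what that two-dimensional similarity array might look like: the post doesn’t specify the similarity measure the project actually uses, so the bag-of-words vectors and cosine similarity below are assumptions for illustration.

```python
from collections import Counter
import math

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

sentences = [
    "the data is added to the database",
    "all the data is now in the database",
    "a cardboard plane controls the paddle",
]
vectors = [Counter(s.split()) for s in sentences]

# The two-dimensional array: matrix[i][j] is the similarity of
# sentence i to sentence j; the diagonal is each sentence to itself.
matrix = [[cosine_similarity(a, b) for b in vectors] for a in vectors]
```

In this sketch the first two sentences score much higher against each other than either does against the third, which is the kind of relationship the visual interface would surface.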

More Controllers for Pong

The latest controller for Pong uses a cardboard plane attached to a potentiometer to control both the speed and direction of the virtual paddle. The rotation of the potentiometer is divided in half to control direction, and then within each half, speed is modulated.
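The mapping can be sketched as follows. This is an illustrative reconstruction rather than the project’s actual firmware: it assumes a 10-bit ADC reading (0–1023) and that speed scales with distance from the midpoint, with the center at rest.

```python
ADC_MAX = 1023  # assuming a 10-bit analog-to-digital converter
MIDPOINT = ADC_MAX // 2

def paddle_command(raw):
    """Map a raw potentiometer reading to (direction, speed).

    The rotation is split in half: the lower half moves the paddle one
    way, the upper half the other. Within each half, distance from the
    midpoint modulates the speed (0.0 at center, 1.0 at either end).
    """
    direction = -1 if raw < MIDPOINT else 1
    speed = abs(raw - MIDPOINT) / MIDPOINT
    return direction, min(speed, 1.0)
```

Turning the knob fully one way yields (-1, 1.0), fully the other way (1, 1.0), and resting at the midpoint stops the paddle.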

Process

Because the ESP chip is relatively expensive, I invested in prototyping my circuit on breadboards and milled boards onto which the chip could plug in temporarily.

Breadboard Prototype
Milled prototype with header pins for the ESP chip

In order to prototype directly with the chip rather than a breakout board, I needed a programming jig for connecting via USB and closing certain routes on particular pins. When programming…

  • GPIO0 needs to be connected to GND. A button was held while uploading code.
  • Reset needs to be pulsed. A button was pressed initially before programming.
  • Tx and Rx connections between the ESP chip and the FTDI cable were accommodated with header pins.
Completed board
Schematic

The Board

The ESP chip draws a significant amount of power; however, conflicting advice online made it difficult to size capacitors. I found these tips to be the most helpful. While they recommend a very large capacitor (470 uF) across Vcc to Gnd, the Adafruit breakout only uses 10 uF. I included two 470 uF capacitors, but my next iteration would explore smaller sizes. A 0.1 uF decoupling capacitor across the ESP8266’s Vcc and Gnd inputs, placed very close to the pins, was a critical addition.

Making ‘Making Legible’ Legible: Part 2

The structure of data has profound consequences for the design of algorithms.
– “Beyond the Mirror World: Privacy and the Representational Practices of Computing”, Philip E. Agre

To atomize the entire corpus of text, the server processes each document upload to create derivative objects: an upload object, a document object, sentence objects, and word objects. By disassociating the constituent parts of the document, they can then be analyzed and form relationships outside that original container. I’ll discuss those methods of analysis in a later blog post. The focus of this post is how the text is atomized and stored because, as Agre points out, the organization of data fundamentally underpins the possibility of subsequent analysis.

The individual objects are constructed through a series of callback functions which assign properties. These functions alternate between creating an object with its individual or inherited properties (i.e. initializing a document object with a unique ID, shared timestamp, and content string) and updating said object with the relational properties (i.e. an array of the IDs of all words contained within a document). By necessity, some of these properties can only be added once other objects are processed. The spreadsheet below shows the list of properties and how they are derived.
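That two-phase process can be sketched roughly as below. The property names (`id`, `timestamp`, `sentence_ids`, and so on) are illustrative assumptions, not the project’s actual schema, and the naive period-split stands in for real sentence segmentation.

```python
import time
import uuid

def atomize(content):
    """Split an uploaded document into document, sentence, and word objects."""
    timestamp = time.time()  # shared across all derivative objects

    # Phase 1: initialize each object with its individual or
    # inherited properties (unique ID, shared timestamp, content).
    document = {"id": str(uuid.uuid4()), "timestamp": timestamp,
                "content": content}
    sentences = [{"id": str(uuid.uuid4()), "timestamp": timestamp,
                  "content": s.strip(), "document_id": document["id"]}
                 for s in content.split(".") if s.strip()]
    words = [{"id": str(uuid.uuid4()), "timestamp": timestamp,
              "content": w, "sentence_id": sent["id"]}
             for sent in sentences for w in sent["content"].split()]

    # Phase 2: update objects with relational properties, which can
    # only be derived once the other objects exist.
    document["sentence_ids"] = [s["id"] for s in sentences]
    document["word_ids"] = [w["id"] for w in words]
    return document, sentences, words
```

The key point the sketch captures is the ordering constraint: the document’s arrays of sentence and word IDs can only be filled in after those objects have been initialized.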

Properties for each object type

Additionally, as discussed in the previous post, adjacency (or context) is a significant relationship. After the words or sentences are initialized with their unique IDs, the callback function then iterates over them again to add a property for the ID of the adjacent object.
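That second pass over the initialized objects might look like this; the property name `next_id` is a hypothetical stand-in for whatever the project actually stores.

```python
def add_adjacency(objects):
    """Second pass: give each object the ID of the adjacent (next)
    object, preserving the original document order."""
    for current, nxt in zip(objects, objects[1:]):
        current["next_id"] = nxt["id"]
    if objects:
        objects[-1]["next_id"] = None  # last object has no neighbor
    return objects
```

Because every object already has a unique ID from the first pass, this pass only needs to look one position ahead.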

At the sentence level, because the original documents were written in markdown, special characters had to be identified, stored as properties, and then stripped from the string. While the “meaning” and usage of these characters is not consistent over time or across documents, they can later be used to identify and extract chunks from a document.
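A minimal sketch of that identify-store-strip step, assuming a regular expression over a small set of markdown characters (the project’s actual character list isn’t specified in the post):

```python
import re

# An illustrative set of markdown characters to record before stripping.
MARKDOWN_CHARS = re.compile(r"[#*_`>\[\]()-]")

def strip_markdown(sentence):
    """Record a sentence's markdown characters as a property, then
    return the cleaned string alongside them."""
    found = MARKDOWN_CHARS.findall(sentence)
    cleaned = MARKDOWN_CHARS.sub("", sentence).strip()
    return {"markdown_chars": found, "content": cleaned}
```

Storing the stripped characters rather than discarding them is what later makes it possible to locate and extract chunks (headings, list items, quotes) from a document.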

Below is an example excerpt of a processed output, from which the individual objects are added to the database. The full code for processing the document upload can be found here.

On Clay Shirky’s ‘Here Comes Everybody’

Some thoughts:

Through the lens of social media, Shirky illustrates McLuhan’s initial proposition that the message of a given technology is the resulting change in human relationships over space and time (the psychic and social consequences). However, he points to ‘professional narcissism’ as the reason for newspapers’ obliviousness to the effect of social media. I’d argue that it’s not a question of professional bias as to why traditional publishing misjudged the role of social media and amateur publishing, but rather an inability for anyone to foresee something that does not yet exist. Hindsight is valuable precisely because we can look at a time and space we are no longer in. Sidenote: he omits architects as professionals, but I’d agree they are most representative of this quotation: “[a professional] pays as much or more attention to the judgment of her peers as to the judgment of her cus­tomers when figuring out how to do her job.” (Bolding mine; recovering from architecture forever.)

I also found the briefly touched-on question of physicality interesting. Shirky writes, “Digital means of distributing words and images have robbed newspapers of the coherence they formerly had, revealing the physical object of the newspaper as a merely provisional solution; now every article is its own section.” Thinking back again to McLuhan, who argued that the typographic cultural bias equated linearity with rationality, perhaps this is also reflected in the newspaper’s inability to forecast the impact of social media.