Leafnode | watching the web garden grow


The internal representation of most computer data consists of a series of individual records. Sometimes the records are distinct, but often they will relate to one another (e.g., a page entry containing three paragraph entries). HTML, despite typically being written by humans, shows its machine heritage through its own patterns of nested tags-within-tags.

In these animations I examined the HTML markup of the Google search page, using archived copies from March 2000 to January 2006.


A Page is a Tree

An HTML file is a text file like any other. But in addition to the textual ‘content’ that’s bound for the screen, it is peppered with tags that label and define the attributes of the snippets of text that are actually meant to be read.

These tags are nested one inside the other such that any given tag may have a number of ‘children’ living below it. Much like the family tree structure from which this parent & child relationship draws its name, tags too can be represented as trees of descent from an original parent.
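To make the parent and child relationship concrete, here is a minimal sketch that turns markup into a tree using Python's standard `html.parser` module. The `Node` and `TreeBuilder` classes, the `parse_tree` and `outline` helpers, and the short `VOID_TAGS` list are illustrative names and assumptions of this sketch, not the code behind the animations.

```python
from html.parser import HTMLParser

# Assumption: a short list of tags that never take children.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One tag in the tree, with a link to its parent and its children."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects as the parser meets each tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # later tags become children of this one

    def handle_endtag(self, tag):
        # Climb to the nearest open ancestor with this name, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def parse_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, parents above their children."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


sample = "<html><body><p>Hello <b>world</b></p><br><p>Bye</p></body></html>"
print("\n".join(outline(parse_tree(sample))))
```

Real pages from the period are rarely this tidy, so a sketch like this only approximates what a browser would build from unclosed or misnested tags.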


Web Botany

Thanks to archive.org, there are numerous snapshots of the Google front page since it first appeared in 1998. Visually, the page is known for its sparseness and asceticism. Viewing the HTML shows that this apparent simplicity masked an extensive structural edifice that supported those few visual elements and put them in precisely the right place.

A shift has occurred in the last decade away from HTML layouts that (mis)used the <table> tag to divide up the page and position elements within its grid. In an effort to separate form and content, the new philosophy places layout and typography choices in a separate .css file.

This animation steps through a sampling of the days in the archive, displaying each day's HTML as a tree. Each document grows from its root node – the <body> tag, always drawn in red – and adds its children in a depth-first walk of the tree. Tags used as part of a <table>-based layout are colored blue. The balance of these invisible structural tags in blue against the greens of the more content-oriented tags suggests the degree to which Google has separated presentation from content.
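The walk and the coloring rule can be sketched in a few lines. This is only an illustration under stated assumptions: trees are written as nested `(tag, children)` tuples, the `TABLE_TAGS` set is my guess at which tags count as table layout, and `layout_share` is a hypothetical helper for putting a number on the blue-to-green balance.

```python
# Assumption: the tags treated as part of a <table>-based layout.
TABLE_TAGS = {"table", "thead", "tbody", "tfoot", "tr", "td", "th"}


def color_for(tag):
    """Red for the root, blue for table layout tags, green for the rest."""
    if tag == "body":
        return "red"
    if tag in TABLE_TAGS:
        return "blue"
    return "green"


def depth_first(node):
    """Yield (tag, color) for a node, then for each subtree in turn."""
    tag, children = node
    yield tag, color_for(tag)
    for child in children:
        yield from depth_first(child)


def layout_share(root):
    """Fraction of non-root nodes that are table layout tags."""
    colors = [color for _, color in depth_first(root) if color != "red"]
    return colors.count("blue") / len(colors) if colors else 0.0


# A made-up page: one layout table wrapping a link, plus a paragraph.
page = ("body", [
    ("table", [("tr", [("td", [("a", [])])])]),
    ("p", []),
])

for tag, color in depth_first(page):
    print(tag, color)
print(layout_share(page))
```

A share near 1.0 would mean a page that is almost all scaffolding; a share near 0.0 would mean layout has moved out of the markup.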


Calling Over Time

There is quite a bit of variety in the tree shapes generated from these few dozen snapshots. But given that these are all snapshots of the same page, these trees are less a collection of individuals than they are members of a lineage. Much like the other main class of human-prepared/computer-digested text – programming source code – HTML documents are rarely rewritten from scratch. Instead, additions are nestled into the document next to preexisting elements. When rewrites occur they are usually limited to a subregion of the document, leaving other branches in the tree undisturbed.

In this animation the days are played sequentially. Each day the new tree is compared to the previous day's tree to find new nodes that did not exist the day before and old nodes that a rewrite has removed. Days are marked by changes in the background color. Again, the <body> node is drawn in red, as it is the one node guaranteed to survive any revision. The size and color of the other nodes are driven by their age. Nodes start small and green but gradually grow larger and bloom into a squash color. Thus the ages of different branches can be compared by their relative fluffiness.
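One way to picture the day-to-day comparison is to give every node a key built from its position in the tree and then compare the sets of keys. How nodes are actually matched between days is not described here, so the path-based keys below are an assumption, as are the names `node_paths` and `step_day` and the nested `(tag, children)` tuple format.

```python
def node_paths(node, path=None):
    """Return a set of path keys, one per node, such as '/body/p[1]'."""
    tag, children = node
    path = path or "/" + tag
    paths = {path}
    seen = {}  # how many children with each tag name so far
    for child in children:
        child_tag = child[0]
        seen[child_tag] = seen.get(child_tag, 0) + 1
        paths |= node_paths(child, f"{path}/{child_tag}[{seen[child_tag]}]")
    return paths


def step_day(ages, today):
    """Advance one day: survivors age by one, new nodes start at zero."""
    today_paths = node_paths(today)
    known = set(ages)
    added = today_paths - known
    removed = known - today_paths
    new_ages = {p: ages[p] + 1 if p in ages else 0 for p in today_paths}
    return new_ages, added, removed


# Two made-up snapshots: the table is replaced by a div on the second day.
day1 = ("body", [("p", []), ("table", [("tr", [])])])
day2 = ("body", [("p", []), ("div", [])])

ages, added, removed = step_day({}, day1)
ages, added, removed = step_day(ages, day2)
print(sorted(added))    # nodes that are new today
print(sorted(removed))  # nodes that did not survive the rewrite
print(ages)             # age in days, which could drive size and color
```

Position-based keys are crude: inserting one sibling early in a branch shifts the keys of everything after it, so a more forgiving matcher would be needed to follow nodes through that kind of edit.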


What We Call Progress

The series of revisions to a page also means a series of discards as old versions silently vanish in favor of the new. Each new page vaults off of the scaffolding of the preexisting HTML, is viewable for a brief moment, and is finally replaced when a new service is unveiled or a copyright notice is reworded.

This process is played out here by again running through the snapshots of the Google HTML, drawing the days sequentially but individually. The root of the ‘current’ tree is drawn in red. All the other nodes will grow as they age, but once they get too old they will disappear and the tree will fade away. At most times, three generations are visible, displaying their different structures and distributions of hyperlinks (drawn in blue).
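The turnover of generations can be sketched as a sliding window over the snapshots. The window of three comes from the description above; the linear fade and the names `play` and `MAX_GENERATIONS` are assumptions made for illustration, and each snapshot is reduced to a plain label.

```python
from collections import deque

MAX_GENERATIONS = 3  # roughly three generations are visible at once


def play(snapshots, max_generations=MAX_GENERATIONS):
    """Return one frame per day: (label, age, opacity) for each visible tree."""
    visible = deque(maxlen=max_generations)
    frames = []
    for label in snapshots:
        # Newest tree goes in front; the oldest drops off the far end.
        visible.appendleft(label)
        frames.append([
            # Assumption: opacity falls off linearly with age.
            (name, age, round(1 - age / max_generations, 2))
            for age, name in enumerate(visible)
        ])
    return frames


for frame in play(["day 1", "day 2", "day 3", "day 4"]):
    print(frame)
```

By the fourth frame the first snapshot is gone for good.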

Many thanks to Tenniscoats, from whose song Marline the audio loop for this animation was pulled.