Composing with Metadata in Mind

Volume. Networks. Distance.

Synthesis by Jentery Sayers

So how to practice standards in the making (SITM)?  And to what effects on a field such as the digital humanities?  From my reading of Curtis’s, Jamie’s, and Matt’s analyses of the Library of Congress (LoC)’s Flickr project, we all agree that metadata is not merely descriptive.  To compose it is to make a claim.  To tag an object with a date, a place, a number, or a subject is to argue for how the object should be archived and subsequently located within that archive.  Tagging is more than a matter of labeling an object.  It is the shaping of future interpretations.

Unfortunately, this argument is not really provocative, novel, or surprising.  Consequently, it is difficult to say that metadata, as a value-laden practice, is a part of what makes the new work of composing “new.”  However, what is new about the new work of composing is the use of digital technologies to strategically search a large number of documents and mobilize the search results toward new knowledge about literature, culture, history, and society.  Gregory Crane asks, “What do you do with a million books?”[1] Dan Cohen speaks of “the scholarly graph, the networked relations of scholars, publications, and resources.”[2] And Franco Moretti advocates “distant reading,” where “distance . . . is not an obstacle, but a specific form of knowledge.”[3]

Volume.  Networks.  Distance.  Three things that are familiar territory for the digital humanities.  They are also central to a SITM approach to composing with metadata in mind.

Volume

A scholarly project rarely faces the issue of having too few materials to explore.  Rather, the challenge is how to productively refine the project’s focus in order to proceed with a fruitful line of inquiry.  When scholars, either individually or collaboratively, are working with a large volume of documents, then the narrowing process can be incredibly time-consuming and no doubt difficult.  After all, assembling an archive or a canon is a critical practice, and the construction of either is where humanities scholars are expected to excel.  Of course, it is common for others to have already done the constructing for you.  In other words, scholars inherit the primary texts, the criticism, and the concomitant research practices that come before them, and metadata links all of them together.  For this reason among others, SITM in the digital humanities de-privileges the individual scholar as the locus of knowledge production and examines networked forms of labor that are collaborative in character.  These collaborative networks almost always necessitate working with machines and software, not to mention other people, often in situations where not everyone involved is cognizant of who is doing what and how.  Behaviors are automated, technologies are rendered invisible, and people become intricately linked with machines.  Resonating with N. Katherine Hayles’s explanation of the posthuman, consciousness is no longer the end-all, be-all of human existence in these situations.[4] Also, software and machines process and aggregate a lot of information in order for people to communicate through them.  Consider Lev Manovich’s claim that new media are at least partly automated, which means that people do not have complete control over the machines and software they use.[5] Whatever the origin story here, SITM frames metadata as a mechanism for automation and a filter for collaboration.  It influences situational knowledge production without necessarily determining it.

Put this way, metadata helps scholars deal with volume.  It becomes a guide through documents and information.  As such, scholars create explicit metadata standards (e.g., TEI, Dublin Core) to universalize metadata and formalize scholarly practices.  Consequently, archives become interoperable, and the harvesting of information across domains is made much, much easier.  Matt puts it quite succinctly: “Traditionally, metadata is an ‘agreed-upon’ standard of documentation.  I understand it as a distributed practice with a centralized protocol.”   He then pressures how folksonomies, which lack a centralized protocol, might correlate with SITM, and the LoC’s Flickr project is an excellent case study.  For one, it deals with volume.  In approximately two years, the LoC has uploaded over 8,000 images to its Flickr account.[6] Also, as I state elsewhere in this chapter, neither Flickr nor the LoC has an explicit standard for how to tag the LoC’s photographs.

With these two aspects of the project in mind, a Flickr page such as the one Matt analyzes, “Children asleep on bed during square dance, McIntosh County, Okla. (LOC),” can be located on Flickr through rather obvious tags, such as “Oklahoma” and “square dancing,” which yield twenty and two images, respectively, in the LoC’s photostream.  These are two tags that correlate with the title of the Flickr page itself and the LoC records listed just beneath the image.  Curiously, a less predictable tag, “barefoot,” actually yields twenty-nine images in the LoC’s photostream.  Of course, “barefoot” is not predictable because it is contained in neither the title of the page nor the LoC’s records.  Even if it is a tag that simply describes what can be seen in the image, it also functions to aggregate a variety of images around an array of cultural formations, including race and ethnicity (e.g., images tagged “Mexican,” “Arab,” and “African American”), geography (e.g., images tagged “Texas,” “Natchitoches,” and “Pie Town”), labor (e.g., images tagged “child labor,” “plantation,” and “dancer”), and gender (e.g., images tagged “woman,” “actress,” and “girl”).  What begins as an apparent matter of fact, namely that a child in an image is barefoot, becomes an organizing principle largely made possible by implicit metadata.  Yet importantly, a principle is not the same as a logic, and this difference should be noted when considering SITM in a digital humanities context.

For one way to practice SITM in the digital humanities is to value the often “illogical” principle of serendipity when dealing with a large volume of texts and information. Tags such as “barefoot” arguably foster a kind of “stumbling upon” new materials and thus spark new research into associations or relationships often overlooked when archives are assembled around more familiar schemas (e.g., alphabetization, chronology, location, and discipline).  These more familiar schemas are logics, or ways of rationalizing how archives are organized, generally through explicit metadata (e.g., Dewey Decimal Classification).  Obviously, they are necessary logics.  For instance, a library section with books filed under “barefoot,” contiguous with, say, a section with books filed under “hairy knuckles,” would likely defamiliarize a library user to such an extent that quickly locating books would be (near) impossible.  Nevertheless, the new work of composing affords searching across documents and mining their content.  The challenge, then, is composing new standards for lines of inquiry that rely upon, and experiment with, principles in implicit metadata.  In the tradition of OuLiPo, such lines of inquiry are more like generative constraints for composition—Perec’s writing without the letter “e,” for example.  These principles are not “rational.”  They don’t make sense or add up.  Still, there is a science to them, and it is a science of navigating through those one million books in non-literal and imaginative ways.
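The way a tag such as “barefoot” gathers otherwise unrelated images can be sketched in a few lines of Python.  What follows is only an illustration under stated assumptions: the photo identifiers, the tag sets, and the function names are hypothetical, and nothing here draws on the LoC’s actual records or on Flickr’s own software.  The sketch builds an index from tags to images and then counts which other tags travel with “barefoot.”

```python
from collections import Counter, defaultdict

# Hypothetical sample records; ids and tag sets are illustrative, not LoC data.
PHOTOS = {
    "photo_a": {"Oklahoma", "square dancing", "barefoot", "girl"},
    "photo_b": {"Texas", "barefoot", "child labor"},
    "photo_c": {"Oklahoma", "plantation"},
    "photo_d": {"Pie Town", "barefoot", "woman"},
}


def build_tag_index(photos):
    """Map each tag to the set of photo ids that carry it."""
    index = defaultdict(set)
    for photo_id, tags in photos.items():
        for tag in tags:
            index[tag].add(photo_id)
    return index


def co_occurring_tags(photos, index, tag):
    """Count the other tags found on photos that share the given tag."""
    counts = Counter()
    for photo_id in index.get(tag, ()):
        counts.update(photos[photo_id] - {tag})
    return counts


index = build_tag_index(PHOTOS)
barefoot_photos = sorted(index["barefoot"])
neighbors = co_occurring_tags(PHOTOS, index, "barefoot")

print(barefoot_photos)    # photos gathered by the "barefoot" tag
print(sorted(neighbors))  # tags that appear alongside "barefoot"
```

In this toy data, “barefoot” pulls together images that no single place or subject heading would group, which is the kind of stumbling upon described above.  The index is a principle for gathering, not a logic for filing.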

Networks

Related to volume, implicit metadata often enables the formation of networks across geographies, archives, and disciplines, to name only a few.  However, these networks rarely emerge formulaically.  For instance, they may not form around the stated investments of an academic department, the common interests of a journal, or the express theme of an annual convention.  In fact, as the methodology of this very book chapter attests, they are arguably more interesting when they do not.

How, then, are these networks identified?  Around what issues and artifacts do they aggregate?  To borrow from David Weinberger, scholars might look for the “shifting social network hidden in the mess.”[7] Or, as Jamie astutely observes in his analysis of metadata annotations on Flickr: “As these annotations are layered on top of one another, competing for attention with each other and the original image, they respond less and less to the photo and instead focus on the emerging conversation that is framed within the same digital space as the image.”  In other words, the mess of Flickr annotations is neither inherent to a particular image nor situated within a specific social group or discipline (e.g., geography or American Studies).  According to some, that mess might lead to problems: the absence of a method or an agreed-upon model.  Nevertheless, over time the mess might very well tell scholars something, and that something is what people are invested in, how they express that interest, around what artifacts, and toward what consensuses and dissensions.  For Andrew Pickering, this approach could be nearly synonymous with the “mangle of practice,” which “deflects our attention from any special concern with the particular variables that the disciplines traditionally invite us to focus on and conceptualize in a peculiar way, and directs us instead toward the unitary terrain of practice in a space of indefinite cultural multiplicity.”[8]

Imagined as such, SITM simultaneously pressures existing, formal networks—organized as they are around canonized texts and standardized practices—and unpacks the mess that gathers within and around them, all toward studying and even animating social networks that usually function without appellation.  It is not that they are simply ahistorical networks or uncritical crowds.  It is that they cluster around a project or an idea, which is almost always more temporary than an academic department, discipline, or journal.  Rather than marking their more or less ephemeral character as a meaningless trend or an amateurish whim, the more productive trajectory is to inquire into why this conversation, around this artifact, now?

Distance

Curtis concludes his analysis of the Flickr page for “Negro Boy near Cincinnati, Ohio” with this challenge: “The question that remains (for the new work of composing) is how we might appropriate [digital artifacts] ourselves to construct different narratives . . .”  Here I want to propose that distance is one way of constructing those narratives.  By “distance,” I imply not only “critical distance,” or the knowledge of how to translate personal experience and opinions into experiential learning and complex arguments.  I am also thinking spatially: the distance afforded by a map, a strategic view, or what de Certeau calls the “pleasure of ‘seeing the whole.’”[9] Spatial approaches, much like those demonstrated by some e-journals (e.g., Vectors), data visualization projects (e.g., the UCSD Software Studies Initiative), and neogeography platforms (e.g., “The Virtual University Geoblogging Project”), provide people with a means to construct different narratives with artifacts already at their disposal.  It is no coincidence, then, that Flickr is now invested in the place of digital imagery.  (See the Places project.)  As but one field or category through which to organize a large volume of exhibits, geotagging Flickr images with a “place” becomes an act of encoding possible networks and visualizing them, at a glance, in a single digital space.

For SITM, distance is therefore key to the intersections between networks and volume in three ways.  First, and to return to Moretti for a moment, it allows for the reduction of many texts to named elements.[10] Think metadata for “place,” “time,” “genre,” or “race.”  Named elements are typically identified in the legend of a map or through the requirements of a metadata schema.  Second, distance enables abstraction.[11] The arrangement, synthesis, or juxtaposition of named elements on a map or in a table leads to the formation of networks and correspondences (serendipitous or not) in a fashion similar to the “barefoot” example I mentioned earlier.  Third, the study of those networks ideally reveals the very forces through which networks emerge and remain sustainable: the all-too-often unstudied relationships between multiple events, concepts, groups, cultures, or the like.[12] Much in the vein of humanities computing research by scholars like Willard McCarty, distance as a specific form of knowledge fosters a model for reformulating future interpretations of the past.  This is not distance as detachment.  Or distance as disinterest.  This is distance as a vehicle for re-articulation.
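The first two moves described above, reduction to named elements and their arrangement into a table from which networks form, might be sketched as follows.  This is a minimal illustration and not a method drawn from Moretti or from any existing tool: the records, the element names chosen (“place,” “time,” “genre”), and both function names are assumptions made for the example.  The third move, the study of the forces behind a network, remains interpretive work that no such sketch performs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records already reduced to named elements; values are illustrative.
RECORDS = [
    {"id": "img1", "place": "Oklahoma", "time": "1940", "genre": "photograph"},
    {"id": "img2", "place": "Texas", "time": "1939", "genre": "photograph"},
    {"id": "img3", "place": "Oklahoma", "time": "1939", "genre": "photograph"},
]


def group_by_element(records, element):
    """Abstraction: arrange record ids in a table keyed by one named element."""
    table = defaultdict(list)
    for record in records:
        table[record[element]].append(record["id"])
    return dict(table)


def network_edges(records, elements):
    """Network: link two records for each named element value they share."""
    edges = defaultdict(set)
    for element in elements:
        for ids in group_by_element(records, element).values():
            for a, b in combinations(sorted(ids), 2):
                edges[(a, b)].add(element)
    return dict(edges)


by_place = group_by_element(RECORDS, "place")
edges = network_edges(RECORDS, ["place", "time"])

print(by_place)  # table of records arranged by "place"
print(edges)     # pairs of records and the elements that connect them
```

Each edge records which named element produced the link, so a reader can ask why two artifacts are connected rather than only that they are.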

Like both Hamlet and Philip K. Dick, Curtis might call that re-articulation “time out of joint,” since he says that “the actual work of historical knowledge-making . . . involves not simply ‘digging up’ artifacts and placing them accordingly in the container of linear time, but making self-conscious decisions about how that artifact is to be organized alongside others to produce a narrative and an argument about the present.”  Constructing different narratives through digital archives and curations is thus a genealogical affair. Composing with metadata in mind suggests the simultaneous production and study of distance as a form of scholarship.  The trick is remembering that—despite the strategies, the maps, the tables, and all other pleasurable things “meta”—material conditions, physical objects, and embodied subjects are always involved, irreducible to data.


[1] Crane, G. (2006).

[2] Cohen, D. (2008, March).

[3] Moretti, F. (2005). p. 1.

[4] Hayles, N. K. (1999).

[5] Manovich, L. (2002).

[6] Gavin, J. (2010).

[7] Weinberger, D. (2007). p. 176.

[8] Pickering, A. (1995). p. 216.

[9] Certeau, M. d. (1984). p. 92.

[10] Moretti, F. (2005). p. 53.

[11] Ibid.

[12] Ibid., pp. 63-64.