Sunday, January 8, 2012

"To-day we have naming of parts"

I like to explore the associations people share with me when I describe something new to them. So, I was pleased the other day when a colleague in the UK shared the phrase that stuck in her head when I described the idea of URI:s (Uniform Resource Identifiers) and Linked Data: "To-day we have naming of parts".

So, after some googling I found the poem by Henry Reed (1914-1986), "Naming of Parts." New Statesman and Nation 24, no. 598 (8 August 1942).

To-day we have naming of parts. Yesterday,
We had daily cleaning. And to-morrow morning,
We shall have what to do after firing. But to-day,
To-day we have naming of parts. Japonica
Glistens like coral in all of the neighboring gardens,
     And to-day we have naming of parts.

Hear Henry Reed and Frank Duncan read "Naming of parts" (mp3)


Through a very nice website, The Poetry of Henry Reed, I learned more about this World War II British poet, critic, translator, and radio dramatist. It helped me to better understand this wonderful, and sad, poem about the contrast between the world of weapons and the world of nature.

Naming parts and other things
I also learned about an article (DOI:10.1038/nbt0102-27) in Nature Biotechnology (2002) that uses the first line of Henry Reed's famous poem as its title. In the article, a professor of genomics at the University of Manchester describes the identification of previously non-annotated genes in yeast.

I also found a blog post from 2009 that borrowed from Henry Reed's poem for its title:
Naming of parts and other things. That is David Bawden's (@David_Bawden) post on his nice blog, "The Occasional Informationist, irregular thoughts on the information sciences". In this post he describes a meeting with John Wilbanks (@wilbanks) at the British Library:
In his presentation of the need for annotation of digital reporting of scientific findings, Wilbanks commented simply that we need to call the same thing by the same name; this makes possible the semantic linking of information and data, the creation of ontologies, and so on, without which it will not be possible to share information across disciplinary and sub-disciplinary silos. 
He exemplified this by examples both simple – the various names for coffee in different languages – and complex – the variant terminology used in hundreds of datasets relating to polar climate change, and in over a thousand related to genomics.
There was another aspect to this point. What we call an information object in the digital world – DOIs and all the rest – is also fundamental; if we do not call these digital objects the same thing, we will have great difficulty in finding them.

Names of today
So, let me conclude this post with a couple of examples of naming parts and other things using the names of today, that is, http-based URI:s. The three example URI:s are also three examples of large efforts to publish linked data "about the named things":

  1. British Library's URI for the poet Henry Reed
    http://bnb.data.bl.uk/id/person/ReedHenry1914-1986
  2. Wikipedia's, i.e. DBpedia's, URI for the poet Henry Reed
    http://dbpedia.org/resource/Henry_Reed_%28poet%29
  3. The DOI for the article about identifying genes in yeast, turned into a URI by CrossRef
    http://dx.doi.org/10.1038/nbt0102-27

1. The British Library publishes metadata about bibliographic resources ("things") using Linked Data techniques and technologies. Part of that is assigning http-based URI:s to the creators. For a great introduction to the underlying model, see the blog post British Library Data Model: Overview by Tim Hodson (@timhodson).

So, for example, the data model specifies that persons who are the identified creators of bibliographic resources, such as the poet Henry Reed (http://bnb.data.bl.uk/id/person/ReedHenry1914-1986), should be of the types Agent and Person according to Friend of a Friend (FOAF), a basic and very widely used vocabulary for linked data.
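As a minimal sketch of what that typing looks like as data, the snippet below writes the two type statements as N-Triples lines in plain Python. The RDF and FOAF namespace URIs are the published ones. The helper name `type_triple` is made up for this illustration, and the British Library's actual output may well contain more statements than these two.

```python
# Published namespace URIs for rdf:type and the FOAF vocabulary.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# The British Library URI for Henry Reed, as given in this post.
HENRY_REED = "http://bnb.data.bl.uk/id/person/ReedHenry1914-1986"


def type_triple(subject: str, class_uri: str) -> str:
    """Hypothetical helper: format one rdf:type statement as an N-Triples line."""
    return f"<{subject}> <{RDF_TYPE}> <{class_uri}> ."


# One statement per FOAF class named in the data model: Agent and Person.
triples = [type_triple(HENRY_REED, FOAF + cls) for cls in ("Agent", "Person")]

for line in triples:
    print(line)
```

Each printed line reads as "this thing is a that", with all three parts named by a URI.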


2. A large part of the structured content published on Wikipedia pages is also made available as linked data, called DBpedia. See this great article: How DBpedia Treats Wikipedia as a Database. The so-called resources ("things") that the Wikipedia pages describe are given http-based URI:s in DBpedia, and each resource is typed using a thin model called the DBpedia ontology.

So, here we can see that the poet Henry Reed is also identified in DBpedia (http://dbpedia.org/resource/Henry_Reed_%28poet%29) and described with the structured data from the Wikipedia page about him, such as his birth date and death date, and also the fact that he is categorized using the concept 'English poets'. This concept also has a URI: http://dbpedia.org/resource/Category:English_poets. So, we may have more than one URI for the same Henry Reed. These can be related to each other using the sameAs statement (owl:sameAs).


The British Library does not do this yet, but I assume it will later, just as, for example, the Swedish library catalogue relates its URI:s to DBpedia's.
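To make the sameAs idea concrete, here is a small sketch of what such a link between the two Henry Reed URI:s could look like as an N-Triples line. The statement is hypothetical: as noted above, it is not something the British Library publishes, only an illustration of the form it would take. The second part simply decodes the percent-encoded parentheses in the DBpedia URI so it is easier to read.

```python
from urllib.parse import unquote

# Published URI of the OWL sameAs property.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# The two URI:s for the poet Henry Reed used in this post.
BL_URI = "http://bnb.data.bl.uk/id/person/ReedHenry1914-1986"
DBPEDIA_URI = "http://dbpedia.org/resource/Henry_Reed_%28poet%29"

# Hypothetical statement: nobody is claimed to publish this triple.
same_as = f"<{BL_URI}> <{OWL_SAME_AS}> <{DBPEDIA_URI}> ."
print(same_as)

# %28 and %29 are percent-encoded "(" and ")"; decode for human reading only.
readable = unquote(DBPEDIA_URI)
print(readable)
```

The decoded form is only for display. When linking data, the URI should be kept exactly as the publisher wrote it.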

Here is another URI, http://dbpedia.org/resource/Category:Firearm_components, for a categorization concept. In the DBpedia interface you can see a list of such resources ("things") and links to them using URI:s such as http://dbpedia.org/resource/Sling_%28firearms%29.


3. CrossRef has made metadata for 46 million Digital Object Identifiers (DOIs) available as Linked Data. DOIs are used to uniquely identify electronic documents (largely scholarly journal articles). CrossRef is a consortium of roughly 3,000 publishers, and is a big player in the academic publishing marketplace.

So, here is the identifier of the article about identifying genes in yeast http://dx.doi.org/10.1038/nbt0102-27.
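The step from DOI to URI is just a matter of putting the resolver address in front of the DOI. The sketch below shows that, and also prepares (without sending) a request that asks for linked data instead of the publisher's web page. The function name `doi_to_uri` is made up for this example, and I am assuming here that the resolver answers an `Accept: text/turtle` header with RDF, which is worth checking against CrossRef's own documentation.

```python
from urllib.request import Request

# The DOI of the yeast article and the resolver prefix used in this post.
DOI = "10.1038/nbt0102-27"
RESOLVER = "http://dx.doi.org/"


def doi_to_uri(doi: str) -> str:
    """Hypothetical helper: turn a bare DOI into an http-based URI."""
    return RESOLVER + doi


uri = doi_to_uri(DOI)

# Assumption: the resolver supports content negotiation for Turtle.
# The request is only built here, not sent over the network.
request = Request(uri, headers={"Accept": "text/turtle"})

print(request.full_url)
print(request.get_header("Accept"))
```

Passing `request` to `urllib.request.urlopen` would actually fetch the data. That step is left out so the example runs without a network connection.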

Kudos to my colleague for the opportunity to learn more about this wonderful poem and for a great discussion.
To ReedingLessons, the signature behind the great website about Henry Reed.
To @David_Bawden for his nice blog The Occasional Informationist.
And, finally, to @wilbanks, a great source of inspiration.