Sunday, December 12, 2010

Corporate Transparency and Linked Data

In my previous blog post I described the Open Government Movement and how the Linked Data principles make publicly available data released by the UK and US governments open for citizen utility and economic opportunities.

A recent blog post made me aware that I, and many others, tend to use the term open data to mean publicly available data:
"Simply put, all open data is publicly available. But not all publicly available data is open. Open data does not mean that a government or other entity releases all of its data to the public. ... Rather, open data means that whatever data is released is done so in a specific way to allow the public to access it without having to pay fees or be unfairly restricted in its use."

Source: What “open data” means – and what it doesn't, by Melanie Chernoff, published on the Open Knowledge Foundation Blog
In this blog post I will adopt this distinction as I turn to publicly available data released by enterprises, and start to look into how Linked Data principles can be applied to the data that enterprises make publicly available as part of their efforts for Corporate Transparency and Social Responsibility.

What will the movement in Governments for Linked Open Data mean for Enterprises?

How can Corporate Transparency be supported by applying Linked Data principles?

Let me first introduce the Linked Data principles and the 5-star deployment scheme for Linked Open Data. With this in mind I will highlight examples of data made publicly available by two enterprises, Volvo Group and AstraZeneca, and then outline steps for Linking Open Enterprise Data -- from a 1-star to a 5-star rating.

Linked Data principles
The four principles, or rules, of Linked Data have been outlined by Tim Berners-Lee, often referred to as the "inventor of the web", in his Design Issues: Linked Data note:
  1. Use URIs (global identifiers) to identify things.
  2. Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
  3. Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.
  4. Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
Source: Linked Data page on Wikipedia
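
As a rough illustration of the four principles, here is a minimal Python sketch that writes a description of one thing as N-Triples, a simple line-based RDF format. The example.org and example.com URIs and the company name are made up for illustration; only the RDFS properties `label` and `seeAlso` are existing terms. It is a sketch under those assumptions, not a recommended publishing setup.

```python
# Hypothetical URI space for the publisher (principles 1 and 2: HTTP URIs as names).
BASE = "http://example.org/id/"
# RDF Schema namespace, which defines rdfs:label and rdfs:seeAlso.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def uri_triple(subject, predicate, target):
    """Format one N-Triples statement whose object is another URI."""
    return "<%s> <%s> <%s> ." % (subject, predicate, target)


def literal_triple(subject, predicate, text):
    """Format one N-Triples statement whose object is a plain string."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '<%s> <%s> "%s" .' % (subject, predicate, escaped)


def describe(local_id, name, related_uri):
    """Principle 3: the description to serve when the URI is dereferenced."""
    subject = BASE + local_id
    return [
        literal_triple(subject, RDFS + "label", name),
        # Principle 4: a link to a related URI elsewhere on the Web.
        uri_triple(subject, RDFS + "seeAlso", related_uri),
    ]


# "acme" and both URIs below are invented placeholders.
lines = describe("acme", "Acme Ltd", "http://example.com/resource/Acme")
print("\n".join(lines))
```

In a real deployment a web server would return these statements (or an RDF/XML equivalent) when someone requests the thing's URI.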
5-star scheme for Linked Open Data

In my previous blog post I wrote about the 5-star deployment scheme for Linked Open Data presented by Tim Berners-Lee at the International Open Government Data Conference (#iogdc) in Washington, D.C., and Open Government Data Camp (#ogdcamp) in London.

What is required for 1-5 star ratings? What are the costs and benefits? I will elaborate on this for publicly available data released by enterprises, based on Linked Open Data star scheme by example, by Michael Hausenblas. To spread this nice idea you can buy your own 5-star mug and T-shirt.

Publicly available enterprise data
So, what does this mean for enterprises? Below are two examples of data made publicly available by two enterprises: Volvo Group and AstraZeneca. Both are large international enterprises, in different industries, subject to regulations that cover different aspects and regions, such as the Corporate Integrity Agreement (CIA) for health services in the US.

Volvo Group, Corporate Social Responsibility, publishes a yearly Sustainability Report with a Scorecard that includes key sustainability performance indicators such as energy consumption (example from the 2009 scorecard). The data is formatted as an HTML table and the whole report as a PDF.
AstraZeneca, Corporate transparency, publishes, for example, data on Physician Engagement: a summary of payments made to U.S. physicians who have spoken on behalf of AstraZeneca and/or its products. The data is published as a table of 2000+ rows in a PDF (Speaker compensation report, January - June 2010).


Linking Open Enterprise Data
These examples of data are made publicly available in a way that makes it possible for consumers to look at the data, print it, store it locally, and enter it manually into another system. If this had been done with an open license (such as PDDL, ODC-By or CC0), they would have earned a nice 1-star rating.

For a 2-star rating, data should be made available as structured data (e.g., Excel instead of PDF) so that it can also be reused. Consumers can now process it directly with proprietary software to aggregate it, perform calculations, visualize it, etc. For a 3-star rating, data should be in non-proprietary, open formats (e.g., CSV instead of Excel). Consumers can now manipulate the data in any way they like, without being confined by the capabilities of any particular software.
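
To make the 3-star step concrete, here is a small Python sketch that writes tabular data as CSV and reads it back with nothing but the standard library. The column names and the two rows are invented placeholders, not values taken from any published report.

```python
import csv
import io

# Placeholder rows standing in for a table that was previously locked in a PDF.
# Names and amounts are invented for illustration only.
rows = [
    {"recipient": "Example Physician A", "state": "NY", "amount_usd": "1500.00"},
    {"recipient": "Example Physician B", "state": "CA", "amount_usd": "750.00"},
]


def to_csv(records, fieldnames):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows, ["recipient", "state", "amount_usd"])

# Any consumer can now parse the data without proprietary software,
# for instance to aggregate the amounts.
total = sum(float(r["amount_usd"]) for r in csv.DictReader(io.StringIO(csv_text)))

print(csv_text)
print("Total:", total)
```

The point is that the consumer side needs no particular product: any language with a CSV parser can do the aggregation.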

A key enabler for 4-star and 5-star ratings is to choose or design a vocabulary of terms, using URIs, both for the things the information is about and for the descriptions of these things, so that the data can be linked. Consumers can now reuse parts of the data with explicit semantics and discover more (related) data while consuming it.
Source: Linked Open Data star scheme by example and Star badges
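
As a hedged sketch of what the move from CSV to 4-star data could look like, the Python snippet below mints a URI for each row and each recipient and emits N-Triples. The `http://example.org/` URI space, the `payee` and `amount` terms, and the sample rows are all hypothetical; a real publication would reuse terms from an established vocabulary rather than these made-up ones.

```python
import csv
import io
import re

BASE = "http://example.org/id/"      # hypothetical URI space of the publisher
VOCAB = "http://example.org/def/"    # hypothetical vocabulary, not a published ontology
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Invented sample rows, for illustration only.
CSV_TEXT = (
    "recipient,amount_usd\n"
    "Example Physician A,1500.00\n"
    "Example Physician B,750.00\n"
)


def slug(text):
    """Turn a label into a URI-safe path segment."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def row_to_triples(index, row):
    """Describe one payment: who received it and how much it was."""
    payment = "<%spayment/%d>" % (BASE, index)
    payee = "<%sperson/%s>" % (BASE, slug(row["recipient"]))
    return [
        "%s <%spayee> %s ." % (payment, VOCAB, payee),
        '%s <%samount> "%s"^^<%s> .' % (payment, VOCAB, row["amount_usd"], XSD_DECIMAL),
    ]


triples = []
for i, row in enumerate(csv.DictReader(io.StringIO(CSV_TEXT)), start=1):
    triples.extend(row_to_triples(i, row))

print("\n".join(triples))
```

Because each payment and each recipient now has its own URI, other datasets can point at them, which is what the 5-star step builds on.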
Available vocabularies 
An example of such a vocabulary of interest for the AstraZeneca physician engagement example to make it to the 4-star rating is the payments ontology being used for publishing UK government spending data as linked data (see COINS as Linked Data). The ontology (see Guide to the Payments Ontology) has been developed as a general purpose vocabulary for representing organizational spending information and is not specific to government or local government applications.

Of relevance for getting the Volvo Group example to a 4-star rating is the work in the eGovernment Interest Group on Linked Environment Data, which Environment Agencies from Europe and the US are setting up. The Statistical Core Vocabulary (SCOVO) for representing statistical data on the Web has been used by the German Federal Environment Agency (UBA) to publish linked environment data.

Thoughts for future posts
In future blog posts I will continue to explore the opportunities, and challenges, of Linking Open Enterprise Data. I am also interested in experiences of applying Linked Data principles to data sources available within enterprise networks, to make it easier for employees and partners to consume them and to combine them with other linked data sources -- internal, shared, licensed and publicly available sources.

While writing this post I was thinking of provenance, i.e. the open history of data, in relation to the 5-star deployment scheme. Maybe a 6-star rating for embedding provenance data using emerging provenance vocabularies? I wonder what Tim Berners-Lee thinks about that :-)


Kudos to Michael Hausenblas (@mhausenblas) for the great 5-star scheme examples with costs and benefits, and the nice star badges; to Bill Roberts (@billroberts) for excellent input on the payment data example; and to Melanie Chernoff (@melaniechernoff) for the interesting blog post on publicly available and open data.