Linked Data for Enterprises: August 2011

Wednesday, August 31, 2011

Ideas on Linked Open Transportation Data for TravelHack

Earlier this summer I saw some tweets about a nice event here in Gothenburg: West Coast TravelHack 2011, 8-9 October. I am a daily commuter (on Västtrafik's trams, buses and trains), an information architect addicted to the linked data idea, and a former researcher in mobile informatics, so I got two ideas and wrote them up as tweets (tweet 1 and tweet 2).

Today, I saw some tweets linking to two articles about the interesting FixMyTransport:

mySociety launches FixMyTransport.com, Open Knowledge Foundation Blog
How to create sustainable open data projects with purpose, O'Reilly Radar

Looking for hackers

I was reminded of my two ideas and also of my time as a part-time industrial PhD researcher. My research in the Mobile Informatics group, at the Victoria Institute and IT University, concerned the mechanisms needed to provide highly mobile professionals, such as news journalists, with contextualized information using mobile applications: "Mobile Newsmaking" (thesis, presentation).

So, I posted a tweet about FixMyTransport in Swedish, and Karl-Petter Åkesson (@kallep), an old friend from my time as a part-time researcher, kindly replied (tweet 1 and tweet 2): why not get together with a couple of hackers and show how your ideas for a linked data infrastructure could enable nice apps and services for commuters? Great, I tweeted back -- but I don't know that many great hackers, as it's ten years since I did my research on mobile applications.

So, now I am looking for some great hackers to potentially explore my ideas on Linked Open Transportation Data at the TravelHack event 8-9 October.

Give every bus stop, tram route and train station etc. a URI

Identify "things" globally by using HTTP-based URIs (Uniform Resource Identifiers) - today all public schools, roads, ministers, and many bus stops in the UK have URIs.

For example, the URI http://transport.data.gov.uk/id/stop-point/1800SJH1081 identifies a bus stop in Manchester. Assigning an HTTP-based URI is what the first two principles of Linked Data call for.

The third principle says that you should provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML. So, if you put the URI http://transport.data.gov.uk/id/stop-point/1800SJH1081 in a web browser, it will give you a nice HTML page documenting the metadata that describes the bus stop. An app or service could instead ask for, for example, an RDF/XML file or a JSON file. See my Linked Data page for some nice videos, books, blogs etc.
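As a rough illustration of how an app could choose between representations, here is a minimal Python sketch of HTTP content negotiation. It assumes the server selects the format from the Accept header, which is the usual Linked Data setup but is not confirmed here for this particular service; the helper names build_request and fetch are made up for this example, and the URI may no longer resolve.

```python
from urllib.request import Request, urlopen

# Media types a Linked Data server commonly offers for the same URI.
ACCEPT_TYPES = {
    "html": "text/html",
    "rdfxml": "application/rdf+xml",
    "json": "application/json",
}

# The bus stop URI used as the example in this post.
STOP_URI = "http://transport.data.gov.uk/id/stop-point/1800SJH1081"


def build_request(uri: str, fmt: str) -> Request:
    """Build a GET request that asks for one representation of `uri`."""
    return Request(uri, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(uri: str, fmt: str, timeout: float = 10.0) -> bytes:
    """Dereference `uri` (urlopen follows redirects) and return the raw body."""
    with urlopen(build_request(uri, fmt), timeout=timeout) as response:
        return response.read()
```

A browser would effectively send the "html" variant, while an app could call fetch(STOP_URI, "rdfxml") or fetch(STOP_URI, "json") against the very same URI.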

Use a common vocabulary for transportation

And all these "things" can also be typed, described and linked using classes, properties and relationships from a range of vocabularies for different domains.

For the transportation domain I have seen some nice tweets pointing me to TRANSIT: A vocabulary for describing transit systems and routes.

There are also many general vocabularies and ontologies that are commonly used to publish linked data. You can 'cherry-pick' from some of the most common: for example, Friend-of-a-Friend (FOAF) provides terms for describing people and their social networks, SIOC (Semantically-Interlinked Online Communities) provides terms for describing online communities, and Dublin Core defines general metadata attributes.
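To make the 'cherry-picking' a bit more concrete, here is a small Python sketch that describes a bus stop with terms from several vocabularies and writes the result as N-Triples. The RDF, RDFS and Dublin Core terms are standard; the TRANSIT namespace and the class name Stop are my assumptions about that vocabulary, and the function name, label and publisher URI in the usage note are invented for the example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCTERMS = "http://purl.org/dc/terms/"
# Assumption: namespace of the TRANSIT vocabulary; check its documentation.
TRANSIT = "http://vocab.org/transit/terms/"


def iri(value: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_stop(stop_uri: str, label: str, publisher_uri: str) -> str:
    """Return three N-Triples lines typing and describing one stop."""
    triples = [
        # Typed with a domain vocabulary (class name is an assumption).
        (iri(stop_uri), iri(RDF_TYPE), iri(TRANSIT + "Stop")),
        # Labelled with a general-purpose RDFS term.
        (iri(stop_uri), iri(RDFS_LABEL), literal(label)),
        # Linked to its publisher with a Dublin Core term.
        (iri(stop_uri), iri(DCTERMS + "publisher"), iri(publisher_uri)),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

Calling describe_stop("http://transport.data.gov.uk/id/stop-point/1800SJH1081", "Example stop", "http://example.org/agency") gives one line per statement, each mixing the stop's URI with a term from a different vocabulary.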

Kudos to @peterkz_swe, @egonwillighagen, @wieselgren, @kallep for nice interactions on Twitter inspiring me to write this blog post.

Thursday, August 18, 2011

A prediction 3-5 years from now

Making predictions can be tricky. However, a former colleague, and for a short time also my manager, Jean-Peter Fendrich (@carokanns), recently published a few predictions for 3-5 years from now in the LinkedIn group Volvo IT Innovation Centre:

Inspired by the iPad and its competitors, there will come a new device that replaces the laptop as we know it now.
HTML5 will make all these apps and app technologies obsolete.
We will finally have standards and infrastructure that support "mobile wallet" - replacing cash, credit cards and other payment systems.

JP asked for feedback and more predictions, so I posted the following:

Globally identified "things" using HTTP-based URIs (Uniform Resource Identifiers) - today all public schools, roads, ministers, and many bus stops in the UK have URIs.
And all these "things" will also be typed, described and linked using classes, properties and relationships from a range of vocabularies/ontologies for different domains, see for example TRANSIT: A vocabulary for describing transit systems and routes.

(I also referred to a recent report from Booz & Company with the title: Designing the Transcendent Web: The Power of Web 3.0.)

JP and I, together with Martin Börjesson (@futuramb), Annika Eriksson, Christian Forsäng and Else-Marie (Emma) Malmek, were some of the folks who introduced the first generation of web technology (Web 1.0) in the Volvo organisation back in the mid-1990s, so it was nice to highlight the third generation (Web 3.0) in this Volvo IT group.

The focus of my blog posts and tweets over the last year or so has been on two of the foundations of Web 3.0, i.e. the Linked Data principles and in particular the use of HTTP-based URIs. For more details, see one of my first blog posts: Corporate Transparency and Linked Data. See also my list on URI Design that I try to keep updated.

"Data is the new electricity. URIs are the conduction mechanism."
Quote by Kingsley Uyi Idehen (@kidehen)

Tuesday, August 9, 2011

ICBO2011 Reports

In the last week of July, three colleagues and I attended the International Conference on Biomedical Ontology (ICBO) 2011, in Buffalo, NY. As I have been a "remote hang-around" on Twitter following other conferences from a distance (see for example my blog post following the SemTech conference earlier this summer), it was great fun this time to be active on Twitter IRL in Buffalo: My #ICBO2011 tweets

And yes, I did see Niagara Falls again -- this time I got really close to them on a boat tour with the "Maid of the Mist".

Now, after a long journey home, and a couple of relaxing days on the Swedish west coast and in central London, it's time to use my tweets, the conference presentations and proceedings (pdf) to pull together some of my insights and learnings. Here's my first report, with some notes and reflections from the conference and a follow-up to my previous blog posts written in preparation for the conference (part 1 and part 2). See also my fourth blog post from ICBO published 1 September.

High quality, "true", ontologies
It was nice to see presentations and read papers on ontologies from a broad spectrum of domains, such as:

Genes
See a recent paper: How the Gene Ontology Evolves, describing the ways in which curators of the Gene Ontology (GO) have incorporated new knowledge.
Protein complex and supra-complex
See the presentation on this topic in the panel the first day: From proteins to diseases, by Bill Crosby (Department of Biological Sciences, University of Windsor)
Emotions and Chronic pain
See the presentation and paper on how to represent emotions based on research in affective disorders such as bipolar, depression and schizoaffective disorder, by Janna Hastings, (European Bioinformatics Institute, UK, and, Swiss Centre for Affective Sciences, University of Geneva, Switzerland). See also the announcement of the development of an ontology for Chronic pain and a nice video: Toward a New Vocabulary of Pain.
Demographics
See the presentation describing how "demographic data in current information systems is ad hoc, and current standards are insufficient to support accurate capture and exchange of demographic data", and the proposed use of the Demographics Application Ontology as a solution.
Adverse Events
In the workshop on representing adverse events we learned about interesting work on adverse event ontologies. (See a video of the workshop organizer Mélanie Courtot: Towards an Adverse Event Reporting Ontology.) We also learned about the development of ontologies to represent temporal relationships (e.g. the Clinical Narrative Temporal Relation Ontology), which are a key aspect of handling safety issues and regular ongoing pharmacovigilance in pharmaceutical research and development.

All of these are examples of high quality "true"1) and modular ontologies developed under the Basic Formal Ontology (BFO), which provides formal definitions for types of entities in reality and for the relationships between such entities (so-called ontological realism). Such ontologies are designed to allow annotations of experimental and clinical data "to be unified through disambiguation of the terms employed in a way that allows complex statistical and other analyses to be performed which lead to the computational discovery of novel insights"2).

My own reflections:
So far we have seen little or no uptake of such high quality "true" ontologies for clinical data, something I also highlighted in my earlier blog post on clinical data standards. In a coming blog post I will present a demo using the Demographics Application Ontology, showing how a high quality "true" ontology can be used to support accurate capture and exchange of demographic data. I will also outline some ideas on how this could be used for clinical study data (CRFs and databases).

"Mapping mania" for the legacy of terminologies

A common theme in several of the presentations, papers and panels was the mappings (matching, alignment) needed between terms and concepts organized as terminologies and coding nomenclatures, such as SNOMED CT, LOINC, ICD, the CDISC SDTM controlled terminologies (derived from NCI Thesaurus), and MedDRA. Here are some examples:

Extraction of the anatomy value set from SNOMED CT to be reused for the 11th revision of the International Classification of Diseases (ICD-11). See a presentation on the problems and proposed patterns by some well-known people (Harold Solbrig and Christopher Chute at Mayo Clinic, Kent Spackman working for IHTSDO, and Alan L. Rector at University of Manchester).
The Ontology Alignment Evaluation Initiative (OAEI) was mentioned by several presenters as a forum to discuss the problems of direct matching between different terminological resources.
The use of an ontology matching tool called AgreementMaker was presented.
In a panel on National Center for Biomedical Ontology (NCBO) Technology in Support of Clinical and Translational Science, the basic lexical term mappings were mentioned as an example of a service available both via BioPortal's graphical interface and as REST services.

These are all examples of a legacy already in use, or in the process of being adopted, for the annotation of EHR, clinical trial and patient safety data. One example is the huge US initiative on meaningful use of EHR, as highlighted by Roberto Roch in his keynote on Practical Applications of Ontologies in Clinical Systems.
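To give a feel for what a basic lexical term mapping does, here is a minimal Python sketch that matches terms from two terminologies purely on their normalized labels. This is only my illustration of the general idea, not how BioPortal or AgreementMaker actually implement it; the function names and the codes S1, S2, T1 and T2 are made up.

```python
import re
from collections import defaultdict


def normalize(label: str) -> str:
    """Lower-case a label and replace punctuation runs with single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", label.lower()).split())


def lexical_mappings(source: dict, target: dict) -> list:
    """Pair codes whose labels are identical after normalization."""
    # Index the target terminology by normalized label.
    index = defaultdict(list)
    for code, label in target.items():
        index[normalize(label)].append(code)
    # Look up every source label in that index.
    pairs = []
    for code, label in source.items():
        for match in index.get(normalize(label), []):
            pairs.append((code, match))
    return sorted(pairs)


# Two tiny, invented terminologies (codes are placeholders, not real ones).
source_terms = {"S1": "Myocardial infarction", "S2": "Headache"}
target_terms = {"T1": "myocardial-infarction", "T2": "Heart attack"}
```

Here lexical_mappings(source_terms, target_terms) pairs S1 with T1 only. The synonym "Heart attack" is missed, because a purely lexical match knows nothing about meaning.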

My own reflections:
In my previous blog post preparing for the conference I referred to the mapping problem as comparing "Apples and Oranges", and sometimes I think of it as a "mapping mania". At the conference I heard the comment "Mappings are hard" several times, and also the question "Who will create, validate and maintain all the mappings?"

After some more days of vacation I will get back later in August with more notes and reflections from the conference:

I will report from the debate on how to accurately connect data from measurements and questionnaires (information entities) to ontologies (real world entities). I think this is a key aspect of getting machine-processable clinical data ready for automatic transformation and direct querying, and ready for inferencing and reasoning.
Another theme I would like to cover is referent tracking, i.e. assigning a globally unique identifier to each entity in reality about which information is stored, for example diagnoses, procedures, demographics, encounters, hypersensitivity, and observations as they are reported in EHRs. I think this is a key enabler for accurate secondary use of EHRs.
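The core idea of referent tracking can be sketched in a few lines of Python: every real-world entity gets one globally unique identifier, and every later record about that entity reuses it. This is a simplified illustration under my own assumptions, not the method presented at the conference; the class ReferentRegistry, its method name and the example keys are hypothetical.

```python
import uuid


class ReferentRegistry:
    """Hands out one stable, globally unique identifier per entity."""

    def __init__(self) -> None:
        # Maps (kind of entity, local key) to the identifier already assigned.
        self._ids = {}

    def identifier_for(self, kind: str, local_key: str) -> str:
        """Return the identifier for an entity, minting one on first sight."""
        key = (kind, local_key)
        if key not in self._ids:
            # A random UUID in URN form, e.g. "urn:uuid:...".
            self._ids[key] = uuid.uuid4().urn
        return self._ids[key]


registry = ReferentRegistry()
# Two records mentioning the same diagnosis resolve to one identifier.
first = registry.identifier_for("diagnosis", "patient-17/diagnosis-3")
again = registry.identifier_for("diagnosis", "patient-17/diagnosis-3")
other = registry.identifier_for("encounter", "patient-17/visit-1")
```

Because first and again are the same identifier while other differs, data reused later can tell whether two EHR entries are about the same diagnosis or about two different things.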

1) See More than Words: Biomedical Ontologies

2) See Dispositions and Processes in the Emotion Ontology
