Sunday, May 22, 2016

Awesome graphic as Graphs

The classic continuum from Data via Information to Knowledge is nicely visualized in a three-part graphic. I've seen it shared many times over the last couple of years on Twitter and LinkedIn. Today I saw it extended with Insight and Wisdom, which made it even more awesome.

Original graphic by Hugh MacLeod @hughcards
extended by David Sommerville @smrvl  

It was my friend and former colleague Martin Börjesson @futuramb who retweeted a tweet from John Hagel @jhagel, management consultant and author. It took me to the creator of the original graphic, Hugh MacLeod @hughcards, cartoonist and co-founder of @gapingvoid. The extension was made by David Sommerville @smrvl, Digital Design Director for @TheAtlantic.

So, I started to think about representing the five pieces as executable and queryable graphs:

  • 1 DataPoint class
  • 21 DataPoints
  • 2 InfoClasses (represented by the green and lilac labels) 
  • 21 Classifications 
  • 1 type of Relationship
  • 18 relationships
  • 1 new InfoClass (yellow) 
  • 2 new Classifications
  • 1 Relationship Query

RDF triples, RDF Schema and SPARQL would be one option.

Neo4j Property Graph and Cypher, another option.

Well, we'll see if I can find the time to do it, or convince some graph and linked data friends to have a go at it :-)
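
Until then, here is a minimal sketch in plain Python of how the counts in the list above could fit together. It is not RDF or Cypher, and several things are assumptions made up for illustration: the ids `dp1` to `dp21`, which point gets which label, the chain of relationships, and the reading of the Wisdom piece as a path between the two yellow points.

```python
from collections import deque

# Data: 21 data points of one DataPoint class (the ids are placeholders).
points = [f"dp{i}" for i in range(1, 22)]

# Information: every point is classified into one of two InfoClasses.
# The alternating assignment is invented for this sketch.
classification = {
    p: "green" if i % 2 == 0 else "lilac" for i, p in enumerate(points)
}

# Knowledge: 18 relationships of a single type. Chaining dp1..dp19 is an
# assumption; the actual links in the graphic are not reproduced here.
relationships = [(points[i], points[i + 1]) for i in range(18)]

# Insight: a new InfoClass (yellow) with two new classifications.
insight = {"dp3": "yellow", "dp15": "yellow"}

# Undirected adjacency list built from the relationships.
adjacency = {p: set() for p in points}
for a, b in relationships:
    adjacency[a].add(b)
    adjacency[b].add(a)


def find_path(start, goal):
    """Breadth-first search; returns a shortest path as a list, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in adjacency[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


# Wisdom: the relationship query, a path between the two yellow points.
start, goal = list(insight)
wisdom_path = find_path(start, goal)
```

In SPARQL or Cypher the same query would be written as a path pattern instead of a hand-written search, which is a large part of the appeal of doing this in a real graph store.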



Thursday, May 19, 2016

Global, persistent and resolvable identifiers for clinical data

Yesterday two thought leaders in clinical data standards, Dave Iberson-Hurst (@Assero_UK) and Armando Oliva (@nomini), published great blog posts. Dave's post has the title Wear Sunscreen but it's really about "CDISC 2.0". Armando's post has the title Improving the Study Data Tabulation Model.

Discussion threads on Twitter and LinkedIn today made me write this post about one of the many great proposals in the two blog posts: 1. SDTM should incorporate unique identifiers for each record in each domain.

In today's clinical data standards for 2-dimensional/tabular data exchange, e.g. CDISC SDTM, keys are either natural keys, e.g. STUDYID, USUBJID, LBTESTCD in a dataset of lab data according to SDTM, or surrogate keys, e.g. LBSEQ. A define.xml file should be the source for study-specific Key Variables for each dataset. For more details about SDTM keys and the challenges they bring, see Duplicate records - it may be a good time to contact your data management team, PharmaSUG 2016, Sergiy Sirichenko and Max Kanevsky (@pinnacle_21).
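
To make the difference between the two kinds of keys concrete, here is a small sketch in Python. The records and their values are made up, and the natural key is hard-coded from the variables named above; in a real study the Key Variables would come from the define.xml file.

```python
from collections import defaultdict

# Made-up lab records; the values are illustrative only.
records = [
    {"STUDYID": "S1", "USUBJID": "S1-001", "LBTESTCD": "GLUC", "LBSEQ": 1, "LBORRES": "5.1"},
    {"STUDYID": "S1", "USUBJID": "S1-001", "LBTESTCD": "GLUC", "LBSEQ": 2, "LBORRES": "5.4"},
    {"STUDYID": "S1", "USUBJID": "S1-002", "LBTESTCD": "GLUC", "LBSEQ": 1, "LBORRES": "4.9"},
]

# Assumed natural key; in a real study it would be read from define.xml.
NATURAL_KEY = ("STUDYID", "USUBJID", "LBTESTCD")


def find_duplicates(rows, key_vars):
    """Group rows by key and return the groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[v] for v in key_vars)].append(row)
    return {k: v for k, v in groups.items() if len(v) > 1}


# Two records share the natural key, so it does not identify a record.
duplicates = find_duplicates(records, NATURAL_KEY)

# Adding the surrogate key LBSEQ makes every row unique again.
with_surrogate = find_duplicates(records, NATURAL_KEY + ("LBSEQ",))
```

The surrogate key resolves the clash here, but a sequence number is only unique within one subject and one dataset, so it does not identify the record anywhere else.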

Armando details the proposal in his blog post, saying that the identifiers should be "globally unique".
This is a discussion I have looked forward to since I urged CDISC to consider semantic web standards and linked data principles in my presentation at the CDISC EU conference in 2011.

Linking Clinical Data Standards
My presentation at CDISC EU Interchange 2011
I now see how smart programmers and informaticians use checksums as record identifiers as a practical way to get around this problem and simplify the integration and reviewing of clinical data.
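
A minimal sketch of the checksum idea, assuming a record is a simple set of variable/value pairs and that SHA-256 over a canonical JSON form is an acceptable checksum. Both are my assumptions for illustration, not a description of what any particular team does.

```python
import hashlib
import json


def record_checksum(record):
    """SHA-256 over a canonical JSON form of the record (sorted keys)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"USUBJID": "S1-001", "LBTESTCD": "GLUC", "LBORRES": "5.1"}
# Same content, different column order.
b = {"LBORRES": "5.1", "LBTESTCD": "GLUC", "USUBJID": "S1-001"}
# One value changed.
c = dict(a, LBORRES="5.4")

same_id = record_checksum(a) == record_checksum(b)
changed_id = record_checksum(a) != record_checksum(c)
```

Sorting the keys makes the identifier independent of column order. The flip side is that any change to a value gives the record a new identifier, so a checksum identifies the content of a record rather than the record over time.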

A phrase we often use when talking about linking data and semantic web standards is: "global, persistent and resolvable identifiers".

  • An HTTP URI scheme makes identifiers possible to resolve. An example of a URI that has a resolver service is http://data.ordnancesurvey.co.uk/id/postcodeunit/SO160AS which identifies the UK postcode SO160AS 1).
  • The URIs assigned to CDISC standard items, such as http://rdf.cdisc.org/std/sdtmig-3-1-3#Column.LB.LBSTRES for the standard lab result variable in CDISC SDTM, do not (yet) resolve 2).

So what would a URI look like for a single data point in a clinical study? HL7 FHIR uses so-called UUIDs. Trusty URIs use hash values: "URIs that contain a certain kind of hash value that can be used to verify the respective resource" http://trustyuri.net/ 
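
As a rough illustration of the two styles, here is a sketch in Python. The `example.org` namespace, the record content and the function names are hypothetical, and the hash suffix only borrows the general idea behind Trusty URIs; it does not follow the encoding in their specification.

```python
import base64
import hashlib
import uuid

BASE = "http://example.org/study/S1/lb/"  # hypothetical namespace


def uuid_uri(name):
    """Deterministic UUID (version 5) for a name, written as a URN."""
    return uuid.uuid5(uuid.NAMESPACE_URL, name).urn


def hash_uri(content):
    """Append a URL-safe SHA-256 digest of the content to the base URI."""
    digest = hashlib.sha256(content.encode("utf-8")).digest()
    suffix = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return BASE + suffix


def verify(uri, content):
    """True if the hash suffix in the URI matches the content."""
    return uri == hash_uri(content)


content = "S1-001|GLUC|5.1"  # made-up serialization of one data point
u1 = uuid_uri(BASE + "S1-001/GLUC/1")
u2 = hash_uri(content)
```

The UUID form is globally unique but says nothing about the data it names. The hash form lets anyone holding the data check that it is what the URI claims, at the price of a new URI whenever the data changes.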

I am eager to learn more about the potential of using URIs in combination with blockchains. This presentation on using blockchain technology and semantic standards for provenance across the supply chain made me think ...



... about Semantic blockchains in the Clinical Data Supply Chain, with identifiers assigned to each data point throughout the supply chain: clinical data captured in EHRs and on smartphones, fed into clinical trial records, aggregated into summary-level TLFs and later included in secondary-use analyses.

Thoughts?

1) https://www.ordnancesurvey.co.uk/education-research/research/linked-data-web.html 
2) CDISC2RDF see https://github.com/phuse-org/rdf.cdisc.org

Friday, May 6, 2016

Twitter Feeds and Blog posts from Conferences

Conferences are a great way to meet interesting people and learn new things. It is always nicest to attend IRL, but it is also interesting to follow remotely via Twitter feeds, live blogging, and blog posts with reports and presentations.

Conference Live Blogging

When I can attend conferences IRL I like to take notes using Twitter, and I try to gather links and tweets using Storify as a kind of live blogging. Check out Storify/kerfors from events such as the recent Linked Data in Sweden, 2016 (ldsv2016) and HL7 FHIR workshops at Vitalis, eHealth conference (Vitalis2016).

Me in action live blogging

When I cannot attend, I like to follow conferences from a distance and read people's blog reports.

This week I've been following the great #csvconf feed from "a data conference that's not literally about CSV file format but rather what CSV represents to our community: data interoperability, hackability, simplicity, etc". It is the most interesting Twitter feed from a conference I've seen so far.
Many thanks to some of the people tweeting from the event: @_inunddata, @EmilyGarfield (Emily also posted some very nice drawings from the event.)


Conference Reports as blog posts

The recent CDISC Europe conference in Vienna #CDISCEurope had a pretty thin feed, but with some great tweets from Magnus Wallberg (@CMWallberg), Technology Evangelist at WHO Uppsala Monitoring Center.
Magnus also wrote an excellent report as a blog post: A great mix of standards and great visions when CDISC met in Vienna

Update: Just after I published this blog post I saw a blog post by Wayne Kubick (@WayneKubick), CTO for HL7 and former CTO for CDISC, HL7’s FHIR and BioPharma, and his article in Applied Clinical Trials, Building on FHIR for Pharmaceutical Research, from an HL7 event I recently followed: the Partners in Interoperability workshop in Washington DC.

Conference Presentations accompanying blog posts 

I also very much like when presenters quickly post their conference presentations on e.g. Slideshare. And it's also very nice to see accompanying blog posts with the speaker's notes and additional material. I very much liked Dave Iberson-Hurst's (@assero_UK) blog post with his CDISC Europe presentation this year. It is a post in his Semantic Web & Metadata series: CDISC Standards: Assessing the Impact of Change

I tried something similar when I wrote a blog post to prepare for my presentation "Linked Data efforts for data standards in biopharma and healthcare" at the Linked Data in Sweden, 2016 meeting a week ago: Linked Data in Sweden 2016