In mid-April we gave a presentation at the 2012 CDISC (Clinical Data Interchange Standards Consortium) Interchange Europe with the title: Semantic models for CDISC based standard and metadata management (see our slides and short paper). This time it was held in a sunny, but chilly, Stockholm at a very nice hotel (Elite Marina Tower). Last year Frederik Malfait, consulting at Roche, and I, working for AstraZeneca, gave two separate presentations at the 2011 conference in Brussels. See my blog post: Linking Clinical Data Standards.
Since then we have seen more interest in semantic web standards in the CDISC community. See for example the article in Applied Clinical Trials Online (@Clin_Trials): Digital Data, the Semantic Web, and Research, by Wayne Kubick, the new CTO of CDISC. This year Frederik and I gave a joint presentation with a key message to the CDISC organisation: "Put semantics into the semantics". That is, start using semantic web standards and linked data principles for the whole suite of CDISC standards. See our list of proposals below.
In my introduction I described the current situation, where the question now is "Not when, but how" to best adopt CDISC standards. At the same time, the different CDISC standards are not linked to each other and are published in different formats, and so-called metadata registries (MDRs) are being requested for robust life cycle management of standards.
Real world use
In my brief introduction (see slides 5-11) to the core semantic web standard, the so-called RDF triple, I showed an example of how Google uses RDF-based standards to improve search (see my previous blog post on schema.org). I also showed how NCI uses RDF to publish the NCI Thesaurus, see RDF/OWL download of NCIt via LexEVS, and how RDF is used for an early version of the domain model for biomedical research (BRIDG), see RDF/OWL representation of BRIDG/ISO21090. In both these cases the RDF is published as XML, but RDF triples can also be published in other serialisation formats (e.g. XML, JSON, Turtle, and N-Triples). I also showed the latest version of the Linked Open Data cloud, which has even more linked datasets than the one Frederik and I had in our presentations last year. I then turned to the main part of our presentation, describing two real-world examples of how two sponsors are now starting to use semantic web standards and linked data principles.
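To make the RDF triple concrete, here is a minimal sketch in plain Python of one subject-predicate-object statement written out in the N-Triples format. The `http://example.org/sdtm#AEOUT` identifier and the label text are illustrative assumptions and not identifiers published by CDISC; only the `rdfs:label` property is a real W3C term. A real project would more likely use an RDF library than hand-rolled classes like these.

```python
from typing import Iterable, NamedTuple, Union


class IRI(NamedTuple):
    """A resource identified by a URI/IRI."""
    value: str


class Literal(NamedTuple):
    """A plain string value."""
    value: str


Term = Union[IRI, Literal]


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: IRI
    predicate: IRI
    obj: Term


def format_term(term: Term) -> str:
    """Render one term using N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    # Escape the characters that N-Triples requires escaping in literals.
    escaped = (
        term.value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialise triples as N-Triples, one statement per line."""
    return "".join(
        f"{format_term(t.subject)} {format_term(t.predicate)} {format_term(t.obj)} .\n"
        for t in triples
    )


# rdfs:label is a real W3C property; the subject URI below is hypothetical.
RDFS_LABEL = IRI("http://www.w3.org/2000/01/rdf-schema#label")
element = IRI("http://example.org/sdtm#AEOUT")
triples = [Triple(element, RDFS_LABEL, Literal("Outcome of Adverse Event"))]

print(to_ntriples(triples), end="")
```

The same statement could equally be written as RDF/XML, JSON, or Turtle; the triple itself stays the same and only the serialisation changes.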
Linked Data cloud to grow across AstraZeneca R&D
Photo from CDISC Facebook
A semantic web standard based MDR in Roche
Photo from CDISC Facebook
Proposals to CDISC
In the slides you can see that Frederik had to transform CDISC standards into RDF using a schema he developed for Roche, and give them URIs in a Roche namespace (e.g. http://gdsr.roche.com/cdisc/sdtmig-3-1-2#Column.AE.AEOUT for one of the data elements). This is not an ideal approach; instead we would like CDISC to provide these. Hence the drive from our leadership in Roche and AstraZeneca for Frederik and myself to bring this back to CDISC.
Below is a draft list of proposals to CDISC:
- Decide on a URI design for CDISC standards (e.g. http://id.cdisc.org/sdtm).
- Review the schema Frederik has proposed for the core MDR in CDISC SHARE.
- Publish the new SDTM v1.3 and SDTM IG v3.1.3 as RDF in XML, JSON, Turtle, and N-Triples formats using the reviewed schema and URI design. (As options alongside the current publication formats, i.e. PDF, HTML, CSV, XML/ODM.)
- Work together with NCI on enhancing the RDF/OWL version of NCI Thesaurus. Also review the option to use the RDF/SKOS standard and apply linked data principles. Publish coming versions of CDISC CTs as RDF in XML, JSON, Turtle, and N-Triples.
- Work together with NCI on enhancing the RDF/OWL representation of the BRIDG/ISO21090 model and apply linked data principles to make all BRIDG classes, properties and ISO21090 data types linkable.
- Extend the MDR schema for CDISC SHARE for linkage to relevant BRIDG classes and properties and to ISO21090 data types.
- Start exploring semantic web standards and linked data principles also for clinical data, including making individual clinical data points linkable using URIs and annotating them using existing and emerging clinical standard terminologies and ontologies.
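As a rough illustration of the first proposal, the sketch below rewrites a URI from the Roche namespace into a CDISC-owned one. The `http://id.cdisc.org/` base is taken from the example in the proposal, but the target layout (standard name as the path, element name as the fragment) and the function name `to_cdisc_uri` are purely assumptions for illustration; the actual URI design is exactly what we are asking CDISC to decide.

```python
from urllib.parse import urlsplit

# Hypothetical base namespace, following the example given in the proposal.
CDISC_BASE = "http://id.cdisc.org/"


def to_cdisc_uri(sponsor_uri: str) -> str:
    """Map a sponsor-namespace URI to an assumed CDISC-namespace URI.

    Keeps the last path segment (the standard and version) and the
    fragment (the data element), and swaps in the CDISC base.
    """
    parts = urlsplit(sponsor_uri)
    if not parts.fragment:
        raise ValueError(f"no element fragment in URI: {sponsor_uri}")
    # Last path segment, e.g. "sdtmig-3-1-2" from "/cdisc/sdtmig-3-1-2".
    standard = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if not standard:
        raise ValueError(f"no standard name in URI path: {sponsor_uri}")
    return f"{CDISC_BASE}{standard}#{parts.fragment}"


roche_uri = "http://gdsr.roche.com/cdisc/sdtmig-3-1-2#Column.AE.AEOUT"
print(to_cdisc_uri(roche_uri))
```

With URIs provided centrally in this way, each sponsor would link to the same identifier for a data element instead of minting its own copy.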