Friday, June 13, 2014

openFDA a Game Changer?

I’ve been fascinated by innovative people in the FDA organization since I had the pleasure of meeting Dr Norman Stockbridge, the father of FDA’s Janus data warehouse model, face to face back in 2005 in Washington, DC.

So when I saw some early notes about an openFDA initiative in June 2013 and early 2014, I posted a couple of tweets.

In April I wrote a short blog post about openFDA. And when I saw the new Chief Health Informatics Officer at FDA, Taha Kass-Hout (@DrTaha_FDA), start counting down on Twitter a couple of weeks ago, I got really excited. It was nice to follow the #hdpalooz feed on Twitter from the health care data event in early June when openFDA was launched.

It was also nice to see services directly picking up the first openFDA API and launching apps to search the 3.4 million adverse events, such as Research AE.
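
To give a feel for what such a service does, here is a minimal sketch of how a search URL for the adverse event API could be built. It assumes the public endpoint `https://api.fda.gov/drug/event.json` with `search` and `limit` query parameters; the helper name `build_ae_query` and the chosen search field are illustrative assumptions, so check the openFDA documentation before relying on them.

```python
from urllib.parse import urlencode

# Public openFDA endpoint for drug adverse event reports.
BASE_URL = "https://api.fda.gov/drug/event.json"


def build_ae_query(drug_name: str, limit: int = 5) -> str:
    """Return a search URL for adverse event reports that mention a drug.

    Hypothetical helper; the search field below is an assumption chosen
    for illustration.
    """
    search = f'patient.drug.medicinalproduct:"{drug_name}"'
    # urlencode percent-encodes the colon and quotes in the search expression.
    return f"{BASE_URL}?{urlencode({'search': search, 'limit': limit})}"


print(build_ae_query("aspirin"))
```

The sketch only builds the URL and makes no network call, so it can be tried without access to the API.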

For a brilliant intro to what sits behind the first openFDA API I recommend Alex Howard's (@digiphile) excellent article: openFDA launches open data platform for consumer protection.
"Instead of contracting with a huge systems integrator.. FDA worked with a tiny data science startup.. to harmonize the data, create a cutting-edge website, and write and release open source code for a data publishing platform for it [on GitHub]"

I think this will be a game changer for how we think about open data, open source and open communities in industry. And yes, I do think we will soon see much more Open, and Linked, Data from FDA, and hopefully also from EMA and across industry.

Kudos to the developers behind all of this great work,
e.g. Sean Herron (@seanherron) and Brian Norris (@Geek_Nurse)

Wednesday, April 2, 2014


It's exciting to see how the FDA (Food and Drug Administration) is now starting to make some nice buzz about its new project called openFDA: a research project to provide open APIs, raw data downloads, documentation and examples, and a developer community for an important collection of FDA public datasets.

Excellent blog post from Dr. Taha Kass-Hout (@DrTaha_FDA), Chief Health Informatics Officer of FDA. He writes: "Our initial pilot project will cover a number of datasets from various areas within FDA, defined into three broad focus areas: Adverse Events, Product Recalls, and Product Labeling."

Introducing openFDA
I do hope that the idea of not only open, but also linked data, will be part of this effort. For a quick intro to Why Linked Data? check out this nice video explaining the utility of linked data and how it's being used by the UK's Ordnance Survey.

I don't have the full context for all of this, but I think there are some excellent opportunities for Dr Kass-Hout and his team to leverage linked data initiatives such as these:

Thursday, February 27, 2014

Why I am so obsessed with this Semantic Web thing

In an earlier blog post I reflected on the fact that it is now 25 years since the web was born. I had the opportunity to bring web technology into a large organisation. Many colleagues asked Why are you so obsessed by this "Web thing"? (remember that this was the time when a Swedish minister said that "Internet är bara en fluga", roughly "the Internet is just a fad").

So, now in 2014, many ask me Why are you so obsessed with this "Semantic Web thing"?

I had a good chance to reflect on this question when I was asked to be one of the keynote speakers at a very nice conference: SWAT4LS, Semantic Web Applications and Tools for Life Science, in Edinburgh.

I was also interviewed together with other speakers by the eCancer organisation in relation to the EURECA (Enabling information re-Use by linking clinical REsearch and Care) project. Always scary to see, and hear, yourself, but I think I managed to convey some of my thoughts. And it is really nice to watch the interviews with Frank van Harmelen, Eric Prud'hommeaux, Robert Stevens and David Kerr.

However, I think the one who best expressed the answer to the question was Charlie Mead. Charlie has been around for a long time in the standards world, working with HL7 for health care data and CDISC for clinical research data. Charlie is now a co-chair of the W3C interest group for semantic web for health care and life sciences (HCLS). I recommend this 7-minute interview with Charlie. Below I have transcribed the last part of it, as I think Charlie expresses well the reasons for Why I'm so obsessed by this "Semantic Web thing".

Charlie Mead
W3C HCLS semantic web interest group
"The thing that is really astonishing about the semantic web, the tools and technologies, really solve all of the core problems that we struggled with for a very long time. 
And they solve them in a very elegant way, which almost by magic, that live on top of the Internet that we know works and have brought tremendous value.
And I think the real barrier to adopt these technologies is that is if more people understood what they can do I think the change curve will come faster and the resistance would melt more quickly."

Kudos to Scott Marshall, W3C and EURECA project, (@mscottmarshall)
for arranging the interviews and to the eCancer TV team.

Thursday, February 6, 2014

Why are you so obsessed with this Semantic Web thing

A lot of nice buzz today in social media as Tim Berners-Lee discusses the future of the web in the March issue of Wired UK. The web turns 25 in March.

It reminded me of what colleagues asked me almost 20 years ago: Why are you so obsessed with this "Web thing"?

Thanks to some great people in the Volvo business and data organisations I was exposed to "this web thing" and it made me change direction in my professional career. From a fancy job as Account Manager to leading a small network of people to get the Volvo Web Wave moving.

Today, in 2014, my colleagues ask me: Why are you so obsessed with this "Semantic Web thing"?

Recently I, together with other speakers at the SWAT4LS (Semantic Web Applications and Tools for Life Sciences) conference, had the opportunity to reflect on the main difference the semantic web can make for patients, health care and clinical research professionals in video interviews for the EURECA project. Stay tuned for these via my Twitter (@kerfors) feed and in a coming blog post.

Sunday, November 17, 2013

De-identification and Informed Consent in Clinical Trials

Thursday evening I was following the great #PACCR feed on Twitter from a "Patients at Center of Clinical Research" discussion hosted by the Eli Lilly Clinical Open Innovation team. (Thank you Rahlyn Gossen, @RebarInter, for the pointer.)

A couple of interesting comments came up in some tweets on the topic of de-identification. As de-identification (sometimes called anonymization) is a key topic for clinical trial data transparency, I found these quotes really interesting.
It was said in the meeting by Regina Holliday (@ReginaHolliday), a great tweeter promoting patients' rights within medicine.
Daniel Barth-Jones (@dbarthjones), Columbia University and expert in Data Privacy and De-identification Policy, asked in another tweet and referenced a very interesting blog post from Harvard Law School on Ethical Concerns, Conduct and Public Policy for Re-Identification and De-identification Practice
"When re-identification risks are exaggerated, we need to recognize that the resulting fears cause needless harms. Such fears can push us toward diminishing our use of properly de-identified data, or distorting the accuracy of our statistical methods because we’ve engaged in ill-motivated de-identification and have altered data even in cases where there was not anything more than de minimis re-identification risks."
From the same blog post from the Online Symposium on the Law, Ethics & Science of Re-identification Demonstrations, at the Bill of Health at Harvard Law School, in the fields of health law policy, biotechnology, and bioethics.
“We must achieve an ethical equipoise between potential privacy harms and the very real benefits that result from the advancement of science and healthcare improvements which are accomplished with de-identified data."
There were also a couple of interesting #PACCR tweets on the topic of Informed Consent quoting Sharon Terry (@sharonfterry), CEO of Genetic Alliance:

I would like to learn more about this thinking and how it could potentially be realized by:
Structuring and formalizing the Informed Consent content to become a semantically rich, and machine-executable, contract/policy for transparency and accountability in using clinical trial data.
For more information see:

I do find all of this very interesting. And I hope such a "dynamic, granular, matrixed and contextual" approach can be part of new clinical trial data transparency policies:  
"To find solutions that are 'good enough' and provide both dramatic privacy protections and useful analytic data" (from the same blog post).
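
As a thought experiment, the idea of a machine-executable consent policy mentioned above could be sketched roughly like this. Everything here is a hypothetical illustration: the class `ConsentPolicy`, the data categories and the purposes are invented names, not taken from any existing standard or from the #PACCR discussion.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentPolicy:
    """Hypothetical, granular consent record for one trial participant."""
    participant_id: str
    # Maps a data category to the set of purposes the participant allows.
    permissions: dict = field(default_factory=dict)

    def allows(self, category: str, purpose: str) -> bool:
        """Return True only if this purpose is permitted for this category."""
        return purpose in self.permissions.get(category, set())


policy = ConsentPolicy(
    participant_id="P-001",  # invented identifier
    permissions={
        "lab_results": {"primary_analysis", "secondary_research"},
        "genomic_data": {"primary_analysis"},
    },
)

print(policy.allows("lab_results", "secondary_research"))   # True
print(policy.allows("genomic_data", "secondary_research"))  # False
```

The point of the sketch is only that consent becomes a queryable structure: each proposed use of the data can be checked against it, and the check itself can be logged for accountability.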

Monday, October 7, 2013

The future of CDISC CT:s

A poll posted by Lex Jansen (@lexjansen) in the LinkedIn group for CDISC (Clinical Data Interchange Standards Consortium) triggered me to write down some thoughts on the future of CDISC's so-called Controlled Terminologies (CT:s):

When you import CDISC Controlled Terminology from NCI EVS at or, which format do you use?
  (Excel, Text, ODM XML, or OWL/RDF)

My vote goes to the formats with the best potential for the future, that is, the formats serializing RDF modeled data, e.g. Turtle, N-Triples, JSON and XML (see the blog post: Understanding RDF serialisation formats).
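
To illustrate what "serializing RDF modeled data" means, here is a minimal sketch that writes a couple of triples in the N-Triples format using only the Python standard library. The term URI under `example.org` and the helper names are assumptions for illustration; a real project would more likely use an RDF library, and the escaping here only covers the most common cases.

```python
class Literal(str):
    """Marks an object value as a plain string literal rather than an IRI."""


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            # Escape backslashes, quotes and newlines inside the literal.
            escaped = (o.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n"))
            obj = f'"{escaped}"'
        else:
            # Anything that is not a Literal is treated as an IRI.
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


SKOS = "http://www.w3.org/2004/02/skos/core#"
TERM = "http://example.org/ct/term1"  # hypothetical term URI

triples = [
    (TERM, SKOS + "prefLabel", Literal("Female")),
    (TERM, SKOS + "inScheme", "http://example.org/ct/sex"),
]
print(to_ntriples(triples))
```

The same two triples could equally be written as Turtle, JSON or XML; the model stays the same and only the serialization changes.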

Today's RDF version

The recently published OWL/RDF version of the CT:s (serialized in XML) uses the first version of the CDISC2RDF schema 1), implementing the model behind the existing export of a limited part of the content in NCI Thesaurus (NCIt).

It is modeled to support today's use of the CT:s only as text strings to populate variables in CDISC defined data sets (e.g. SDTM domains) with submission values. That is, it provides study specific clarity, making it easy for humans to read the clinical data and metadata.

Next RDF version

Based on very useful discussions with the terminology expert Julie James (LinkedIn profile) working for HL7, IMI EHR4CR and FDA/PhuSE Metadata definition project, these are my thoughts for the next RDF version:

To provide cross study semantic interoperability making it easy for machines to directly integrate and query clinical data and metadata across health care and clinical research we need an enhanced model.

That is, a model that fully leverages the content in NCIt and addresses the issues people have experienced when attempting to implement the CT:s in BRIDG / ISO21090, using the insights from the IMI EHR4CR project and from the development of the IHE DEX profile (Data Element Exchange).

I think there is also an opportunity to leverage the work on binding value sets to data elements that is part of the HL7 FHIR (Fast Healthcare Interoperability Resources) development 2). Julie also pointed me to a new ISO standard: ISO/CD 17583 3). The next version should also apply both the OID (Object identifier) standard and the URI (Uniform Resource Identifier) standard to identify each value set and each value.
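
One simple way to carry both identifiers together is the `urn:oid:` URN form, which turns an OID into a URI. The sketch below assumes that form; the OIDs under the `2.999` example arc and the `example.org` URI are placeholders, not real CDISC or NCI identifiers, and the helper name is hypothetical.

```python
import re

# Dotted-decimal OID: first arc 0-2, further arcs without leading zeros.
OID_PATTERN = re.compile(r"^[0-2](\.(0|[1-9]\d*))+$")


def oid_to_urn(oid: str) -> str:
    """Return the urn:oid: URI for a dotted-decimal OID (hypothetical helper)."""
    if not OID_PATTERN.match(oid):
        raise ValueError(f"not a valid dotted-decimal OID: {oid!r}")
    return f"urn:oid:{oid}"


# A value set identified both ways; all identifiers are placeholders.
value_set = {
    "oid": "2.999.1.2",
    "oid_uri": oid_to_urn("2.999.1.2"),
    "uri": "http://example.org/ct/sex",
}
print(value_set["oid_uri"])  # urn:oid:2.999.1.2
```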

1)  CDISC2RDF poster (presented at DILS 2013, Data Integration in Life Science conference) and FDA/PhUSE Semantic Technology project 
3) ISO/CD 17583: Health informatics -- Terminology constraints for coded data elements expressed in (ISO 21090) Harmonized Data Types used in healthcare information interchange.

Friday, September 13, 2013

Justifications of Mappings

A common theme in the Semantic Trilogy events in Montreal this summer (see Semantic Trilogy preparations and Semantic Trilogy report part 1) was mappings such as the mappings provided via the NCBO BioPortal.

For example, the mappings in the BioPortal expressed as skos:closeMatch are the result of using the LOOM lexical algorithm. Examples of not so good mappings, such as this one, were highlighted:

<NCI Thesaurus: Chairperson (subclass to Person)> 
<Int. Classification for Patient Safety: Chair (subclass to Piece of Furniture)>

One view was: ‘Don’t use them!’ (tweet). Another view was “Give us the justification of the mappings so we can decide when it makes sense to use them.”
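
To see how a purely lexical approach can produce a mapping like the one above, here is a deliberately naive matcher. It is not the LOOM algorithm itself, only a sketch of label-based matching in general, and the assumption that "Chair" is listed as a synonym of Chairperson is mine, made for illustration.

```python
import re


def normalize(label: str) -> str:
    """Lowercase a label and strip everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def lexical_matches(terms_a: dict, terms_b: dict) -> list:
    """Pair terms whose normalized labels or synonyms coincide."""
    index = {}
    for term_id, labels in terms_b.items():
        for label in labels:
            index.setdefault(normalize(label), []).append(term_id)
    matches = []
    for term_id, labels in terms_a.items():
        for label in labels:
            for other in index.get(normalize(label), []):
                # Record which label triggered the match.
                matches.append((term_id, other, label))
    return matches


# Illustrative data; the synonym "Chair" is an assumption.
nci = {"nci:Chairperson": ["Chairperson", "Chair"]}
icps = {"icps:Chair": ["Chair"]}

print(lexical_matches(nci, icps))
```

The matcher happily pairs a person with a piece of furniture because it never looks at the class hierarchy. Recording the label that triggered the match is a first small step towards a justification.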

Mappings in chemical informatics

When I came back from the Semantic Trilogy and read about mappings, or linksets as they are called, in the new version of the Open PHACTS specification "Dataset Descriptions for the Open Pharmacological Space" I saw some opportunities to make mappings more explicit and hence more useful.

I think the editor, Alasdair Gray (@gray_alasdair), and the whole team of authors, have done a great job on this specification.
"The Dataset Descriptions for the Open Pharmacological Space is a specification for the metadata to described datasets, and the linksets that relate them, to enable their use within the Open PHACTS discovery platform. The specification defines the metadata properties that are expected to describe datasets and linksets; detailing the creation and publication of the dataset."
I especially liked the part on making the justification of mappings explicit. For example, what is the justification behind stating that there is a close match (skos:closeMatch), or exact match (skos:exactMatch), between what is described in two different chemical datasets, such as the RDF datasets sourced from ChemSpider and ChEMBL.

The figure depicts four distinct linksets: two sourced from ChemSpider
depicted in blue which use different link predicates; one sourced from ChEMBL
depicted in red; and one sourced from a third party depicted in green.
My understanding is that for the chemical informatics community the Open PHACTS specification will establish a vocabulary to express the justifications for links/mappings between chemical entities. This enables them to explicitly state justifications such as "Has isotopically unspecified parent" or "Have the same InChI key" (see B.2 Link Justification Vocabulary Terms to also get the URIs for these terms).
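
A minimal sketch of why explicit justifications are useful: once each link carries one, a consumer can decide which links to trust for a given purpose. The `Link` class and the `example.org` URIs are hypothetical, and the plain-text justification labels stand in for the URIs that the specification defines.

```python
from dataclasses import dataclass

# Labels quoted from the specification; real data would use their URIs.
SAME_INCHI_KEY = "Have the same InChI key"
UNSPECIFIED_PARENT = "Has isotopically unspecified parent"


@dataclass(frozen=True)
class Link:
    """One mapping between two entities, with an explicit justification."""
    source: str
    target: str
    predicate: str
    justification: str


def links_justified_by(links, accepted):
    """Keep only the links whose justification is in the accepted set."""
    return [link for link in links if link.justification in accepted]


# Placeholder URIs, not real ChemSpider or ChEMBL identifiers.
linkset = [
    Link("http://example.org/chemspider/1", "http://example.org/chembl/A",
         "skos:exactMatch", SAME_INCHI_KEY),
    Link("http://example.org/chemspider/2", "http://example.org/chembl/B",
         "skos:closeMatch", UNSPECIFIED_PARENT),
]

strict = links_justified_by(linkset, {SAME_INCHI_KEY})
print([link.target for link in strict])
```

A stricter use case would accept only the InChI key justification, while a more exploratory one could accept both.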

Mappings between medical terminologies

Together with members of the EU projects EHR4CR and SALUS, MedDRA MSSO, and W3C HCLS, I am now exploring the idea of establishing a similar approach for the medical terminology community. That is, a vocabulary of terms to express the justifications for different mappings between concepts/terms in terminologies across healthcare and clinical research, such as ICD9, SNOMED CT and MedDRA.

This is part of a broader discussion on the use of terminologies in semantic web focused environments, with formal representations in RDF of both the terminologies themselves and of the mappings between them. Here's an example of a visualization from such a formal representation of MedDRA and SNOMED-CT terms and mappings between them in SKOS/RDF.

From: SALUS Harmonized Ontology for Post Market Safety Studies