Monday, April 9, 2012

Describe things vs Improve markup of pages that describe things

Easter Monday is a public holiday in Sweden and it's been a rainy and cold day -- so, it's time to write a new blog post. This one was triggered by a nice blog post published just before the weekend by Phil Archer (@philarcher1) with the interesting title: Danbri has moved on – should we follow? In his post, Phil reflects on a presentation Dan Brickley (@danbri) gave the week before at a Linked Data meetup in London.

Phil focuses on Dan's point about the best practice so far in the semantic web community: "look at existing vocabularies, particularly ones that are already widely used and stable, and re-use as much as you can. Dublin Core, FOAF – you know the ones to use."

And Phil wonders if it's now time to move on and "embrace schema.org as the vocabulary to use wherever possible? It won't cover everything, but it might cover the 50% of classes and properties that dominate any domain of interest." In his presentation, Schema.org and One Hundred Years of Search, Dan also argues that search terms have barely changed in style for 100 years and more.

For more info about schema.org, the joint vocabulary from Google, Bing (Microsoft) and Yahoo, see my remote report from the SemTech 2011 conference.


Improve markup of pages that describe things

While watching the video of Dan's talk, I found one statement in his slides (slide 33) very interesting. It describes the scope of the schema.org vocabulary as "In-page structured data for search":
"Not asking an unconstrained 'so, how do we describe cars?', but 'how can we improve markup on existing pages that describe cars?' (or Comics, SoftwareApps, Sports, ...)".
I always like it when someone clearly states what is not included -- what's not intended. So, this is a helpful statement for me. And it will be interesting to follow how Schema.org will be extended and refined for domains such as Medicine/Health; see the list of Schema.org proposals maintained by W3C Web Schemas.

At the same time, a lot of the semantics I look for in my daily work is more about "how to describe cars?". Well, not cars really -- it's about other kinds of 'things' and their parts, relationships and impacts on each other. It's about "how to describe 'things' in small portions of the biological, chemical, clinical and health economic reality studied in clinical research and documented in health care". Also, "how to organise data about these 'things' not only to improve search but also to improve how data about these entities can be combined and queried in new ways."
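To make the contrast a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the "ex:" identifiers, the predicate names and the two small data sets are made up for this example and do not come from schema.org, ChEBI or any other real vocabulary. The first structure mimics markup that says something about a page; the rest describes 'things' as simple subject-predicate-object statements that can be combined and then queried across sources.

```python
# Illustrative sketch only: every "ex:" name below is hypothetical.

# 1) Markup-style description: a flat record about what a page is about.
#    Good for search, but says little about how things relate to each other.
page_markup = {"@type": "schema:Product", "name": "Some car model"}

# 2) Thing-style description: statements (subject, predicate, object)
#    held in two separate, hypothetical sources.
source_a = {
    ("ex:DrugX", "ex:hasActiveIngredient", "ex:CompoundY"),
    ("ex:CompoundY", "ex:inhibits", "ex:ProteinZ"),
}
source_b = {
    ("ex:ProteinZ", "ex:involvedIn", "ex:DiseaseW"),
}


def objects(graph, subject, predicate):
    """Return all objects for a given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def follow(graph, start, path):
    """Follow a chain of predicates from a start node through the graph."""
    current = {start}
    for predicate in path:
        current = {o for node in current for o in objects(graph, node, predicate)}
    return current


# Combining the sources is just a set union of their statements.
combined = source_a | source_b

# A question neither source can answer alone:
# which diseases is DrugX connected to via its ingredient and target?
path = ["ex:hasActiveIngredient", "ex:inhibits", "ex:involvedIn"]
result = follow(combined, "ex:DrugX", path)
print(result)
```

With only source_a the same question returns nothing, since the last link lives in source_b. That is the kind of "combined and queried in new ways" I have in mind; real work would of course use RDF and SPARQL rather than Python sets.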

Describe things
This is also the driver for me to learn more about how to: "capture, in a logical, systematic way, what scientists regard as the basic truths about a topic. Like equations in physics or axioms in mathematics, they can even be the basis for computational models." from More than Words. See also several of my earlier blog posts on this approach, for example my post on Disease terminologies and ontologies.

In a future blog post I hope to learn more about how this approach has been applied in the Chemical Information Ontology (ChemInfo) to describe Chemical Entities of Biological Interest (ChEBI). This is nicely explained in one of my favorite papers in my (kerfors) CiteULike library: The Chemical Information Ontology: Provenance and Disambiguation for Chemical Data on the Biological Semantic Web by Janna Hastings, Leonid Chepelev, Egon Willighagen, Nico Adams, Christoph Steinbeck, Michel Dumontier.

Example from Chemical Entities of Biological Interest (ChEBI):
the entity Hemoglobin in HTML view via the ontology browser Ontobee