HCLS Community Profile for Dataset Descriptions

My latest publication [1] describes the process followed in developing the W3C Health Care and Life Sciences Interest Group (HCLSIG) community profile for dataset descriptions, which was published last year. The diagram below summarises the data model for describing datasets, which covers 61 metadata terms drawn from 18 vocabularies.

[Figure: Overview of the HCLS Community Profile for Dataset Descriptions]

[1] M. Dumontier, A. J. G. Gray, S. M. Marshall, V. Alexiev, P. Ansell, G. Bader, J. Baran, J. T. Bolleman, A. Callahan, J. Cruz-Toledo, P. Gaudet, E. A. Gombocz, A. N. Gonzalez-Beltran, P. Groth, M. Haendel, M. Ito, S. Jupp, N. Juty, T. Katayama, N. Kobayashi, K. Krishnaswami, C. Laibe, N. Le Novère, S. Lin, J. Malone, M. Miller, C. J. Mungall, L. Rietveld, S. M. Wimalaratne, and A. Yamaguchi, “The health care and life sciences community profile for dataset descriptions,” PeerJ, vol. 4, p. e2331, 2016.
[Bibtex]
@article{Dumontier2016HCLS,
abstract = {Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the {W3C} Semantic Web for Health Care and the Life Sciences Interest Group ({HCLSIG}) identified Resource Description Framework ({RDF}) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of {FAIR} data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.},
author = {Dumontier, Michel and Gray, Alasdair J.G. and Marshall, M Scott and Alexiev, Vladimir and Ansell, Peter and Bader, Gary and Baran, Joachim and Bolleman, Jerven T and Callahan, Alison and Cruz-Toledo, Jos{\'{e}} and Gaudet, Pascale and Gombocz, Erich A and Gonzalez-Beltran, Alejandra N. and Groth, Paul and Haendel, Melissa and Ito, Maori and Jupp, Simon and Juty, Nick and Katayama, Toshiaki and Kobayashi, Norio and Krishnaswami, Kalpana and Laibe, Camille and {Le Nov{\`{e}}re}, Nicolas and Lin, Simon and Malone, James and Miller, Michael and Mungall, Christopher J and Rietveld, Laurens and Wimalaratne, Sarala M and Yamaguchi, Atsuko},
doi = {10.7717/peerj.2331},
issn = {2167-8359},
journal = {PeerJ},
month = aug,
title = {The health care and life sciences community profile for dataset descriptions},
volume = {4},
pages = {e2331},
year = {2016},
url = {https://peerj.com/articles/2331/}
}

Open PHACTS Closing Symposium

For the last 5 years I have had the pleasure of working with the Open PHACTS project. Sadly, the project is now at an end. To celebrate, we are having a two-day symposium to look back over the contributions of the project and its future legacy.

The project has been hugely successful in developing an integrated data platform to enable drug discovery research (see a future post for details to support this claim). The result of the project is the Open PHACTS Foundation which will now own the drug discovery platform and sustain its development into the future.

Here are my slides on the state of the data in the Open PHACTS 2.0 platform.

Data Integration in a Big Data Context

Today I had the pleasure of visiting the Urban Big Data Centre (UBDC) to give a seminar on Data Integration in a Big Data context (slides below). The idea for the seminar came about due to my collaboration with Nick Bailey (Associate Director of the UBDC) in the Administrative Data Research Centre for Scotland (ADRC-S).

In the seminar I wanted to highlight the challenges of data integration that arise in a Big Data context and show examples from my past work that would be relevant to those in the UBDC. In the presentation, I argue that RDF provides a good approach for data integration but it does not solve the basic challenges of messy data and generating mappings between datasets. It does however lay these challenges bare on the table, as Frank van Harmelen highlighted in his SWAT4LS keynote in 2013.

The first use case is drawn from my work on the EU SemSorGrid4Env project where we were developing an integrated view for emergency response planning. The particular use case shown is that of coastal flooding on the south coast of England. Although this project finished in 2011, I am still involved with developing RDF and SPARQL continuous data extensions; see the W3C RDF Stream Processing Community Group for details.

The second use case is drawn from my work on the EU Open PHACTS project. I showed the approach we developed for supporting user-controlled views of the integrated data through Scientific Lenses. I also talked about the successes of the project and the fact that the platform is currently being actively used for pharmacology research, receiving over 20 million hits a month.

I finished the talk with an overview of the Administrative Data Research Centre for Scotland (ADRC-S) and my work on linking birth, marriage, and death records. I am hoping that we can adopt the lenses approach together with incorporating feedback on the linkages from the researchers who will use the integrated views.

In the discussions following the talk, the notion of FAIR data came up. This is the idea that data should be Findable, Accessible, Interoperable, and Reusable by both humans and machines. RDF is one approach that could lead to this. The other area of discussion was around community initiatives for converting existing open datasets into an RDF format. I advocated adopting the approach followed by the Bio2RDF community who share the tasks of creating and maintaining such scripts for biological datasets. An important part of this jigsaw is tracking the provenance of the datasets, for which the W3C Health Care and Life Sciences Community Profile for Dataset Descriptions could be beneficial (there is nothing specific to the HCLS community in the profile).

W3C HCLS Dataset Descriptions Profile Published

After 3 years of hard work, countless telephone conferences, issues and drafts, the W3C Health Care and Life Sciences Interest Group (HCLS) has finally published its community profile for describing datasets. The profile deals with different versions of a dataset, with each version being published in multiple formats. Below is the announcement from the W3C.

The Semantic Web Health Care and Life Sciences Interest Group has published a Group Note of Dataset Descriptions: HCLS Community Profile. Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. This document describes a consensus among participating stakeholders in the Health Care and the Life Sciences domain on the description of datasets using the Resource Description Framework (RDF). This specification meets key functional requirements, reuses existing vocabularies to the extent that it is possible, and addresses elements of data description, versioning, provenance, discovery, exchange, query, and retrieval. Learn more about the Data Activity.
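The structure mentioned above, where a dataset has several versions and each version is published in several formats, can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions: the class names, the `ex:` identifiers and the example values are hypothetical, and the handful of properties shown (`dct:title`, `dct:isVersionOf`, `pav:version`, `dcat:distribution`, `dct:format`) are a small subset chosen for the example, not the full set of terms in the profile. Consult the published Note for the actual requirements.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A triple is kept as three plain strings (subject, predicate, object).
Triple = Tuple[str, str, str]


@dataclass
class Distribution:
    """One concrete file format of a dataset version (hypothetical class)."""
    iri: str
    media_format: str


@dataclass
class Version:
    """One released version of a dataset (hypothetical class)."""
    iri: str
    version_label: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class Dataset:
    """The version-independent description of a dataset (hypothetical class)."""
    iri: str
    title: str
    versions: List[Version] = field(default_factory=list)


def to_triples(dataset: Dataset) -> List[Triple]:
    """Flatten the three levels into subject/predicate/object triples."""
    triples: List[Triple] = [(dataset.iri, "dct:title", f'"{dataset.title}"')]
    for version in dataset.versions:
        # Each version points back to the version-independent description.
        triples.append((version.iri, "dct:isVersionOf", dataset.iri))
        triples.append((version.iri, "pav:version", f'"{version.version_label}"'))
        for dist in version.distributions:
            # Each version lists the formats in which it is published.
            triples.append((version.iri, "dcat:distribution", dist.iri))
            triples.append((dist.iri, "dct:format", f'"{dist.media_format}"'))
    return triples


# Hypothetical example: one dataset, one version, two formats.
example = Dataset(
    iri="ex:mydata",
    title="Example dataset",
    versions=[
        Version(
            iri="ex:mydata-v1",
            version_label="1.0",
            distributions=[
                Distribution("ex:mydata-v1-ttl", "text/turtle"),
                Distribution("ex:mydata-v1-csv", "text/csv"),
            ],
        )
    ],
)

for s, p, o in to_triples(example):
    print(f"{s} {p} {o} .")
```

Keeping the three levels as separate objects means that a new release only adds a new version with its own distributions, while the top-level description stays unchanged.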


ISWC 2014

ISWC 2014 is taking place on the shores of Lake Garda, Italy. However, I won’t have much time to relax on the lake. Look out for my tweets (@gray_alasdair).

My conference activities start on Sunday 19 October with the first workshop on Context, Interpretation and Meaning (CIM2014), which I am co-chairing with Harry Halpin (W3C) and Fiona McNeill (Heriot-Watt University). We have managed to put together an interesting selection of 5 papers: two focusing on the context of links, two on the interpretation of alignments, and one on the meaning of mappings. I am a co-author on this final paper, but Kerstin Forsberg will be presenting the work [1]. We also have an exciting panel session in store with Aldo Gangemi (CNR), Paul Groth (VU University of Amsterdam) and Harry Halpin.

Also taking place on Sunday is the Linked Science Workshop (LISC). Together with Simon Jupp and James Malone of the EBI, I have a paper on modelling the provenance for linksets of convenience [2]. A linkset of convenience is one that does not model the underlying science correctly, but provides a convenient shortcut for linking data. An example from the world of biology is a linkset that directly links genes with their protein product.
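To make the idea of a shortcut link concrete, here is a small Python sketch. It is not the method from the paper: it simply assumes, for illustration, that the longer path runs from a gene through its transcripts to a protein, and that the direct gene-to-protein link keeps a record of the intermediate steps it skipped. All identifiers (`ex:geneA` and so on) and the function name `build_convenience_links` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical source data: the longer path a shortcut link would skip.
gene_to_transcripts: Dict[str, List[str]] = {
    "ex:geneA": ["ex:transcriptA1", "ex:transcriptA2"],
}
transcript_to_protein: Dict[str, str] = {
    "ex:transcriptA1": "ex:proteinA",
    "ex:transcriptA2": "ex:proteinA",
}


def build_convenience_links(
    g2t: Dict[str, List[str]], t2p: Dict[str, str]
) -> List[dict]:
    """Compose gene->transcript and transcript->protein into direct links.

    Each direct link remembers the transcripts it was derived from, so the
    shortcut can still be traced back to the longer path.
    """
    via: Dict[Tuple[str, str], List[str]] = {}
    for gene, transcripts in g2t.items():
        for transcript in transcripts:
            protein = t2p.get(transcript)
            if protein is None:
                continue  # no known protein for this transcript
            via.setdefault((gene, protein), []).append(transcript)
    return [
        {"gene": gene, "protein": protein, "via": steps}
        for (gene, protein), steps in via.items()
    ]


links = build_convenience_links(gene_to_transcripts, transcript_to_protein)
for link in links:
    print(link["gene"], "->", link["protein"], "via", ", ".join(link["via"]))
```

The point of the sketch is only that the shortcut is convenient for queries while the `via` list records what it glosses over.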

On Monday I will be working with the W3C RDF Stream Processing (RSP) Community Group. We have been having regular phone meetings for the last year and have made great progress towards defining a common community model for RDF streams and a query language for processing them. The group will largely be attending the Stream Ordering Workshop and the Semantic Sensor Networks Workshop.

Tuesday is the first day of ISWC, and it is going to be a busy one for me. In the morning I will be presenting the Open PHACTS paper on our work enabling scientific lenses for chemistry data [3]. In the evening I will be at the poster and demonstration session showing off the Open PHACTS VoID Editor [4].

Finally, I am organising the Lightning Talks session on the last day of the conference. This is a session where you can present late-breaking results or responses to work presented in the conference. Talks will be 5 minutes each and abstracts can be submitted until 8.30 am on Thursday.

After ISWC I think I’m going to need a break.

[1] S. Hussain, H. Sun, G. B. L. Erturkmen, M. Yuksel, C. Mead, A. J. G. Gray, and K. Forsberg, “A Justification-based Semantic Framework for Representing, Evaluating and Utilizing Terminology Mappings,” in Context, Interpretation and Meaning (CIM2014), Riva del Garda, Italy, 2014.
[Bibtex]
@inproceedings{Hussain2014CIM,
abstract = {Use of medical terminologies and mappings across them are considered to be crucial pre-requisites for achieving interoperable eHealth applications. However, experiences from several research projects have demonstrated that the mappings are not enough. Also the context of the mappings is needed to enable interpretation of the meaning of the mappings. Built upon these experiences, we introduce a semantic framework for representing, evaluating and utilizing terminology mappings together with the context in terms of the justifications for, and the provenance of, the mappings. The framework offers a platform for i) performing various mappings strategies, ii) representing terminology mappings together with their provenance information, and iii) enabling terminology reasoning for inferring both new and erroneous mappings. We present the results of the introduced framework using the SALUS project where we evaluated the quality of both existing and inferred terminology mappings among standard terminologies.},
address = {Riva del Garda, Italy},
author = {Hussain, Sajjad and Sun, Hong and Erturkmen, Gokce Banu Laleci and Yuksel, Mustafa and Mead, Charles and Gray, Alasdair J G and Forsberg, Kerstin},
booktitle = {Context, Interpretation and Meaning (CIM2014)},
title = {{A Justification-based Semantic Framework for Representing, Evaluating and Utilizing Terminology Mappings}},
year = {2014}
}
[2] S. Jupp, J. Malone, and A. J. G. Gray, “Capturing Provenance for a Linkset of Convenience,” in Proceedings of the 4th Workshop on Linked Science 2014 – Making Sense Out of Data (LISC2014) co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, 2014, pp. 71-75.
[Bibtex]
@inproceedings{Jupp2014,
address = {Riva del Garda, Italy},
author = {Jupp, Simon and Malone, James and Gray, Alasdair J G},
booktitle = {Proceedings of the 4th Workshop on Linked Science 2014 - Making Sense Out of Data (LISC2014)
co-located with the 13th International Semantic Web Conference (ISWC 2014)},
publisher = {CEUR},
month = oct,
volume = {1282},
pages = {71--75},
title = {{Capturing Provenance for a Linkset of Convenience}},
url = {http://ceur-ws.org/Vol-1282/lisc2014_submission_7.pdf},
year = {2014}
}
[3] C. R. Batchelor, C. Y. A. Brenninkmeijer, C. Chichester, M. Davies, D. Digles, I. Dunlop, C. T. A. Evelo, A. Gaulton, C. A. Goble, A. J. G. Gray, P. T. Groth, L. Harland, K. Karapetyan, A. Loizou, J. P. Overington, S. Pettifer, J. Steele, R. Stevens, V. Tkachenko, A. Waagmeester, A. J. Williams, and E. L. Willighagen, “Scientific Lenses to Support Multiple Views over Linked Chemistry Data,” in The Semantic Web – ISWC 2014 – 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part I, 2014, pp. 98-113.
[Bibtex]
@inproceedings{iswc2014,
author = {Colin R. Batchelor and
Christian Y. A. Brenninkmeijer and
Christine Chichester and
Mark Davies and
Daniela Digles and
Ian Dunlop and
Chris T. A. Evelo and
Anna Gaulton and
Carole A. Goble and
Alasdair J. G. Gray and
Paul T. Groth and
Lee Harland and
Karen Karapetyan and
Antonis Loizou and
John P. Overington and
Steve Pettifer and
Jon Steele and
Robert Stevens and
Valery Tkachenko and
Andra Waagmeester and
Antony J. Williams and
Egon L. Willighagen},
title = {Scientific Lenses to Support Multiple Views over Linked Chemistry
Data},
booktitle = {The Semantic Web - {ISWC} 2014 - 13th International Semantic Web Conference,
Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part {I}},
month = oct,
year = {2014},
pages = {98--113},
url = {http://dx.doi.org/10.1007/978-3-319-11964-9_7},
doi = {10.1007/978-3-319-11964-9_7},
}
[4] C. Goble, A. J. G. Gray, and E. Tatakis, “Help me describe my data: A demonstration of the Open PHACTS VoID Editor,” in ISWC 2014 – Poster Demos, Riva del Garda, Italy, 2014, pp. 1-4.
[Bibtex]
@inproceedings{Goble2014,
address = {Riva del Garda, Italy},
author = {Goble, Carole and Gray, Alasdair J G and Tatakis, Eleftherios},
booktitle = {ISWC 2014 – Poster Demos},
month = oct,
pages = {1--4},
title = {{Help me describe my data: A demonstration of the Open PHACTS VoID Editor}},
year = {2014}
}

EUON Talk on Dataset Descriptions

Tomorrow I will be talking at the 1st European Ontology Network meeting (EUON) about the work I have been doing in the W3C Health Care and Life Sciences (HCLS) Interest Group on creating a community profile for describing datasets.

The work on the HCLS Dataset Description Community Profile has been ongoing for two years now and is just about to reach fruition. Please do read the latest Editors’ Draft and provide feedback.