BioHackathon 2019

Petros and Alasdair hacking at the BioHackathon

I once again attended the European BioHackathon which took place in November outside of Paris. It was another intense week with 150 developers from across Europe (and beyond) working together on 34 topics.

Bioschemas was well represented in the topics and the hacking activities of the week. By the end of the week we had an approach for marking up Intrinsically Disordered Proteins and Rare Disease resources. We also had several resources with newly deployed or improved Bioschemas markup.

For a fuller overview of the event and outcomes, please see the ELIXIR news item.

Schema.org Dataset Descriptions Meeting

On Monday 16 May, several interested individuals (including myself) from ELIXIR, bioCADDIE, Bioschemas and the W3C HCLS Community Profile met with representatives from Google involved in the schema.org activity on describing datasets.

Finding datasets, and understanding their content, is a challenging task for humans and currently not possible to automate. Schema.org is an initiative from the major web search engines to help with the discovery of web resource.

There are multiple parallel activities in the life sciences community working on developing ways to publish metadata about datasets. This is due to the wide variety of use cases that dataset descriptions need to satisfy, including data discovery, data citation, and provenance tracking. bioCADDIE has worked on an extensive analysis of  use cases mapping data models from existing efforts. We agreed this could be used as a basis to improve the existing schema.org dataset type and come up with a new Bioschemas dataset specification.

The outcomes of the meeting with Google was to focus on the find-ability of datasets with an emphasis on data citation. For data citation the following key properties are important (as stated in the FORCE 11 data citation principles).


Image from https://www.force11.org/node/4771

The next steps will be to develop some pilot projects to both publish and use dataset descriptions for discovery and citation.

ELIXIR-UK members are heavily involved in Bioschemas.org and will be involved in these efforts as part of the ELIXIR interoperability platform activities in collaboration with NIH and ELIXIR-EBI.

My thanks go to Rafael Jimenez for his help with the preparation of this post which will also appear in the ELIXIR-UK newsletter.