
FAIRplus Newsletter 2

Below is the opening excerpt from the second FAIRplus Newsletter:

Though FAIRplus has been running for just six months, there is already a lot to talk about. Our two task-focused ‘Squads’ have booted up and begun the FAIRification of the first set of four pilot datasets, our industry partners in EFPIA organised the first ‘Bring Your Own Data’ workshop in London, and we’ve been busy explaining our goals and answering many questions from our stakeholders.

You can read about these activities in this second FAIRplus newsletter. On top of that, we bring you an update on upcoming events, news from our partners and also a new section ‘Track our progress’ where you can check for yourself how we are progressing towards our goals and what Deliverables and reports we’ve recently submitted.

Finally, we’ve launched our own LinkedIn page. Besides regular updates on our activities, it will also feature job opportunities and news from the FAIRplus partners.

The next FAIRplus Newsletter will come out in November 2019. In it we’ll present the FAIRplus Fellowship programme, report on the FAIR workshop in October and more.

We wish you a relaxing summer and look forward to meeting you at our events!

Biohackathon 2018 - Paris

Bioschemas at the Biohackathon

Last November I had the privilege to be one of 150 participants at the Biohackathon organised by ELIXIR. The hackathon was organised into 29 topics, several of which were related to Bioschemas, with one focused on it directly. For the Bioschemas topic we had up to 30 people working around three themes.

The first theme was to implement markup for the various life sciences resources present. Representatives from ELIXIR Core Data Resources and node resources from the UK and Switzerland were there to work on this thanks to the staff exchange and travel fund. By the end of the week we had new live deployments for 11 additional resources and examples for many more.

The second theme was to refine the types and profiles that Bioschemas has been developing, based on the experiences of deploying the markup. Prior to the hackathon, Bioschemas had moved from a minimal Schema.org extension of a single BioChemEntity type to a collection of types for the different life science resources, e.g. Gene, Protein, and Taxon. Just before the hackathon a revised set of types and profiles was released. This proved to be useful for discussion, but it very quickly became clear that there was a need for further refinement. During the hackathon we started new profiles for DNA, Experimental Studies, and Phenotype, and the Chemical profile was split into MolecularEntity and ChemicalSubstance. Long discussions were held about the types and their structure, with early drafts for 17 types being proposed. These are now getting to a state where they are ready for further experimentation.
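To make the markup work concrete, here is a minimal sketch of the kind of Schema.org/Bioschemas-style JSON-LD a resource page might embed for one of the types mentioned above. The exact required properties depend on the profile version, and the identifier and URL here are invented, so treat this purely as an illustrative example.

```python
import json

def protein_jsonld(name, identifier, url):
    """Build a minimal JSON-LD dict for a protein resource page.

    Property names follow the general shape of Bioschemas profiles;
    this is a hypothetical sketch, not a complete profile instance.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Protein",  # one of the Bioschemas life-science types
        "name": name,
        "identifier": identifier,
        "url": url,
    }

# Invented example record, serialised for embedding in a page inside
# a <script type="application/ld+json"> element.
record = protein_jsonld(
    "Example protein", "P00000", "https://example.org/protein/P00000"
)
print(json.dumps(record, indent=2))
```

A generator tool like the prototype mentioned below essentially automates producing this kind of structured block for each record page.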

The third theme was to develop tooling to support Bioschemas. Due to the intensity of the discussions on the types and profiles, there was no time to work on this topic. However, the prototype Bioschemas Generator was extensively tested during the first theme and improvements fed back to the developer. There were also refinements made to the GoWeb tool.

Overall, it was a very productive hackathon. The venue proved to be very conducive to fostering the right atmosphere. During the evenings there were opportunities to socialise or carry on the discussions. Below are two of the paintings that were produced during one of the social activities that capture the Bioschemas discussions.

And there was the food. Wow! Wonderful meals, three times a day.

First steps with Jupyter Notebooks

At the 2nd Workshop on Enabling Open Semantic Sciences (SemSci2018), co-located with ISWC2018, I presented the following paper (slides at end of this post):

Title: Using a Jupyter Notebook to perform a reproducible scientific analysis over semantic web sources

Abstract: In recent years there has been a reproducibility crisis in science. Computational notebooks, such as Jupyter, have been touted as one solution to this problem. However, when executing analyses over live SPARQL endpoints, we get different answers depending upon when the analysis in the notebook was executed. In this paper, we identify some of the issues discovered in trying to develop a reproducible analysis over a collection of biomedical data sources and suggest some best practices to overcome these issues.
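One simple mitigation for the problem the abstract describes is to record provenance alongside the analysis: when the query ran and a checksum of the data it returned, so a later re-run can detect that the live sources have changed. The sketch below shows this idea with the standard library only; the result data is invented and `snapshot_results` is a hypothetical helper, not the paper's actual method.

```python
import hashlib
import json
from datetime import datetime, timezone

def snapshot_results(results):
    """Return provenance metadata for a set of query results.

    The checksum is computed over a canonical (sorted-key) JSON
    serialisation, so identical results always hash identically.
    """
    payload = json.dumps(results, sort_keys=True).encode("utf-8")
    return {
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "result_count": len(results),
    }

# Hypothetical rows, as a live SPARQL endpoint might return them.
results = [{"gene": "BRCA2", "taxon": "9606"}]
meta = snapshot_results(results)
# Store `meta` with the notebook; on a re-run, a differing sha256
# signals that the endpoint's data has moved under the analysis.
```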

The paper covers my first attempt at using a computational notebook to publish a data analysis for reproducibility. It provokes more questions than it answers, and this was the case in the workshop too.

One of the really great things about the paper is that you can launch the notebook, without installing any software, by clicking on the binder button below. You can then rerun the entire notebook and see whether you get the same results that I did when I ran the analysis over the various datasets.

UK Ontology Network 2018

This week I went to the UK Ontology Network meeting hosted at Keele University. There was an interesting array of talks in the programme showing the breadth of work going on in the UK.

I gave a talk on the Bioschemas Community (slides below) and Leyla Garcia presented a poster providing more details of the current Bioschemas profiles.

The UK Ontology Network is going through a reflection phase and would like interested parties to complete the following online survey.


Bioschemas Samples Hackathon

Last week the Bioschemas Community hosted a workshop. The focus of the meeting was to get web resources that describe biological samples to embed Schema.org markup in their pages. The embedded markup makes the web resources, and therefore the biological samples they describe, more discoverable.
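The discoverability benefit comes from crawlers being able to pull that embedded markup back out of the page. As a rough sketch of the consuming side, the snippet below collects the contents of every `<script type="application/ld+json">` block using only the standard library. The page content and the `Sample` type name are invented for illustration.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON-LD blocks embedded in an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []  # parsed JSON-LD objects found in the page

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks.append(json.loads(data))

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

# Invented page with one embedded JSON-LD block describing a sample.
page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Sample", "name": "Example sample"}
</script>
</head><body>A sample record page.</body></html>"""

parser = JsonLdExtractor()
parser.feed(page)
print(parser.blocks[0]["name"])  # Example sample
```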

I was not able to attend the event but Justin Clark-Casey has written this blog post summarising the event.