Open PHACTS – Heriot-Watt Semantic Web Lab

SLiDInG 6

Today, the Semantic Web Lab hosted the 6th Scottish Linked Data Interest Group workshop at Heriot-Watt University. The event was sponsored by the SICSA Data Science Theme. The event was well attended with 30 researchers from across Scotland (and Newcastle) coming together for a day of flash talks and discussions. Live minutes were captured during the […]

I gave a talk on the successes and challenges of FAIR data. My slides are embedded below.

Smart Descriptions & Smarter Vocabularies (SDSVoc) Report

In December 2016 I presented at the Smart Descriptions and Smarter Vocabularies workshop on the Health Care and Life Sciences Community Profile for describing datasets, and our validation tool (Validata). Presentations are included below.

The purpose of the workshop was to understand current practice in describing datasets and where the DCAT vocabulary needs improvement. Phil Archer has written a very comprehensive report covering the workshop. A charter is being drawn up for a W3C working group to develop the next iteration of the DCAT vocabulary.

The HCLS Community Profile: Describing Datasets, Versions, and Distributions from Alasdair Gray

Validata: A tool for testing profile conformance from Alasdair Gray

ISWC 2016 Trip Report

It has now been almost two months since ISWC 2016 where I was the Resources Track chair with Marta Sabou. This has given me time to reflect on the conference, in between a hectic schedule of project meetings, workshops, conferences, and a PhD viva.

The most enjoyable part of the conference for me was the CoLD Workshop Debate on the State of Linked Data. The workshop organisers had arranged for six prominent proponents of Linked Data to argue that we have failed and that Linked Data will die away.

Ruben Verborgh argued that Linked Data will be destroyed by the need to centralise data, poor infrastructure, and the research community. (Aside: There was certainly concern on the final point as there were only three females in the room.)
Axel Polleres took the motto, “Let’s make RDF great again!” Axel’s central argument was that most open data is actually published in CSV format and that a lot can be achieved with 3* open data.

“Let’s Make RDF great again” – @AxelPolleres #cold2016 #iswc2016 pic.twitter.com/BuA4sawONO

— Juan Sequeda (@juansequeda) October 18, 2016
Paul Groth argued that we should be concentrating on making our data processable by machines. What we currently have is a format that is aimed at both humans and machines but satisfies neither.

Wow 12GB of sensor data becomes 310GB of triples! What is going on here? #SWIT #iswc2016

— Alasdair J G Gray (@gray_alasdair) October 17, 2016

we should really be using scientific data types, like vectors, matrices, etc … https://t.co/kuDvJfOonW

— Egon Willighⓐgen (@egonwillighagen) October 17, 2016
Chris Bizer covered cost incentives. While there is an incentive to provide some basic schema markup on pages, i.e. getting picked up by search engines, there is no financial incentive to provide the links to other resources. My take on this is that there is a disincentive as it would take traffic away from your (eCommerce) site and therefore lose you revenue.
Avi Bernstein then did a fantastic impression of a Wee Free minister, telling us that we had all sinned and were following the wrong path; all fire and brimstone.
Juan Reutter argued that we needed to provide a workable ecosystem.

So the question is, has the Linked Data community failed? I think the debate highlighted that the community has made many contributions in a short space of time but that it is time to get this into the mainstream. Perhaps our community is not the best for doing the required sales job, but we have had some success, e.g. EBI RDF platform, Open PHACTS Drug Discovery Platform, BBC Olympic Web Site.

The main conference was underpinned by three fantastic and varied keynotes. First was Kathleen McKeown who gave us insights into the extraction of knowledge from different forms of text. Second was Christian Bizer whose main message was that we as a community need to take structured data in whatever form it comes; just like search engines have exploited metadata and page structure for a long time. Finally was Hiroaki Kitano from the Sony Corporation. This has got to be the densest keynote I have ever heard with more ideas per minute than a dance tune has beats. His challenge to the community was that we should aim to have an AI system win a scientific Nobel Prize by 2050. The system should develop a hypothesis, test it, and generate a groundbreaking conclusion worthy of the prize.

A new grand AI challenge – Kitano @SonyCSL #iswc2016 pic.twitter.com/Hoiqai3Hmq

— Paul Groth (@pgroth) October 21, 2016

There were many great and varied talks during the conference. It really is worth looking through the programme to find those of interest to you (all the papers are linked and available). As ever the poster and demo session, advertised in the minute madness session, demonstrated the breadth of cutting-edge work going on in the community. As did the lightning talk session.

The final day of the conference was particularly weird for me. As the chair of a session I ended up sharing a bottle of fine Italian wine with a presenter during his talk (it would have been rude not to). I then experienced an earthquake during a presentation on an ontology for modelling the soil beneath our cities, in particular the causes of damage to that soil.

The conference afforded some opportunities for fun as well. A few of the organising committee managed to visit the K computer, the world’s fifth fastest supercomputer, which is cooled with water. The computer was revealed in a very James Bond, “Now I’m going to have to kill you!” reveal of the evil enemy’s master plan. There was also a highly entertaining Samurai sword fighting demonstration during the conference banquet.

Thanks to the sword fighters and to the organizers for arranging such a great show tonight! @ISWC2016 #iswc2016 pic.twitter.com/ebFQg1IUJ1

— Ewa Kowalczuk (@EwaAKowalczuk) October 20, 2016

During the conference, my Facebook feed was filled with exclamations about the complexity of the toilets. Following the conference, it was filled with exclamations of returning to lands of uncivilised toilets. Make of this what you will.

HCLS Tutorial at SWAT4LS 2016

On 5 December 2016 I presented a tutorial [1] on the Health Care and Life Sciences Community Profile (HCLS Datasets) at the 9th International Semantic Web Applications and Tools for the Life Sciences Conference (SWAT4LS 2016). Below you can find the slides I presented.

The 61 metadata properties from 18 vocabularies reused in the HCLS Community Profile are available in this spreadsheet (.ods).

Tutorial: Describing Datasets with the Health Care and Life Sciences Community Profile from Alasdair Gray

[1] M. Dumontier, A. J. G. Gray, and S. M. Marshall, “Describing Datasets with the Health Care and Life Sciences Community Profile,” in Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2016), Amsterdam, The Netherlands, 2016.
[Bibtex]

@InProceedings{Gray2016SWAT4LSTutorial,
abstract = {Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting HCLS community profile covers elements of description, identification, attribution, versioning, provenance, and content summarization. The HCLS community profile reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets. The goal of this tutorial is to explain elements of the HCLS community profile and to enable users to craft and validate descriptions for datasets of interest.},
author = {Michel Dumontier and Alasdair J. G. Gray and M. Scott Marshall},
title = {Describing Datasets with the Health Care and Life Sciences Community Profile},
OPTcrossref = {},
OPTkey = {},
booktitle = {Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2016)},
year = {2016},
OPTeditor = {},
OPTvolume = {},
OPTnumber = {},
OPTseries = {},
OPTpages = {},
month = dec,
address = {Amsterdam, The Netherlands},
OPTorganization = {},
OPTpublisher = {},
note = {(Tutorial)},
url = {http://www.swat4ls.org/workshops/amsterdam2016/tutorials/t2/},
OPTannote = {}
}

HCLS Community Profile for Dataset Descriptions

My latest publication [1] describes the process followed in developing the W3C Health Care and Life Sciences Interest Group (HCLSIG) community profile for dataset descriptions which was published last year. The diagram below provides a summary of the data model for describing datasets which covers 61 metadata terms drawn from 18 vocabularies.

[1] M. Dumontier, A. J. G. Gray, S. M. Marshall, V. Alexiev, P. Ansell, G. Bader, J. Baran, J. T. Bolleman, A. Callahan, J. Cruz-Toledo, P. Gaudet, E. A. Gombocz, A. N. Gonzalez-Beltran, P. Groth, M. Haendel, M. Ito, S. Jupp, N. Juty, T. Katayama, N. Kobayashi, K. Krishnaswami, C. Laibe, N. Le Novère, S. Lin, J. Malone, M. Miller, C. J. Mungall, L. Rietveld, S. M. Wimalaratne, and A. Yamaguchi, “The health care and life sciences community profile for dataset descriptions,” PeerJ, vol. 4, p. e2331, 2016.
[Bibtex]

@article{Dumontier2016HCLS,
abstract = {Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the {W3C} Semantic Web for Health Care and the Life Sciences Interest Group ({HCLSIG}) identified Resource Description Framework ({RDF}) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of {FAIR} data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.},
author = {Dumontier, Michel and Gray, Alasdair J.G. and Marshall, M Scott and Alexiev, Vladimir and Ansell, Peter and Bader, Gary and Baran, Joachim and Bolleman, Jerven T and Callahan, Alison and Cruz-Toledo, Jos{\'{e}} and Gaudet, Pascale and Gombocz, Erich A and Gonzalez-Beltran, Alejandra N. and Groth, Paul and Haendel, Melissa and Ito, Maori and Jupp, Simon and Juty, Nick and Katayama, Toshiaki and Kobayashi, Norio and Krishnaswami, Kalpana and Laibe, Camille and {Le Nov{\`{e}}re}, Nicolas and Lin, Simon and Malone, James and Miller, Michael and Mungall, Christopher J and Rietveld, Laurens and Wimalaratne, Sarala M and Yamaguchi, Atsuko},
doi = {10.7717/peerj.2331},
issn = {2167-8359},
journal = {PeerJ},
month = aug,
title = {The health care and life sciences community profile for dataset descriptions},
volume = {4},
pages = {e2331},
year = {2016},
url = {https://peerj.com/articles/2331/}
}
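
As a rough illustration of the layered idea behind the profile (a dataset summary, a specific version, and a downloadable distribution), the sketch below prints Turtle-style descriptions for the three levels. It is not the normative profile: the `example.org` URIs, the `describe` and `build_description` helpers, and the handful of properties chosen (`dct:title`, `pav:hasCurrentVersion`, `dct:isVersionOf`, `pav:version`, `dcat:distribution`, `dct:format`, `dcat:downloadURL`) are assumptions made for the example and cover only a small part of the 61 terms. Type statements and most required elements are omitted, so check the W3C Interest Group Note before relying on any of it.

```python
# Minimal sketch of a three-level dataset description (summary, version,
# distribution). All URIs and helper names are hypothetical, and the property
# selection is an assumption, not the full HCLS profile.
PREFIXES = {
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "pav": "http://purl.org/pav/",
}


def describe(subject, properties):
    """Return Turtle-style lines describing one resource."""
    lines = [f"<{subject}>"]
    items = list(properties.items())
    for i, (predicate, value) in enumerate(items):
        # The last statement ends with '.', the others with ';'.
        end = " ." if i == len(items) - 1 else " ;"
        lines.append(f"    {predicate} {value}{end}")
    return "\n".join(lines)


def build_description(base, version):
    """Build summary, version, and distribution descriptions as one string."""
    summary = f"{base}/dataset"
    versioned = f"{summary}/{version}"
    distribution = f"{versioned}/rdf"
    blocks = [
        # Summary level: version-independent information.
        describe(summary, {
            "dct:title": '"Example dataset"',
            "pav:hasCurrentVersion": f"<{versioned}>",
        }),
        # Version level: one specific release of the dataset.
        describe(versioned, {
            "dct:isVersionOf": f"<{summary}>",
            "pav:version": f'"{version}"',
            "dcat:distribution": f"<{distribution}>",
        }),
        # Distribution level: one concrete file for that release.
        describe(distribution, {
            "dct:format": '"text/turtle"',
            "dcat:downloadURL": f"<{distribution}.ttl>",
        }),
    ]
    header = "\n".join(f"@prefix {p}: <{u}> ." for p, u in PREFIXES.items())
    return header + "\n\n" + "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_description("http://example.org", "1.0"))
```

Keeping the three levels as separate resources means a new release only adds a version and its distributions, while the summary description stays stable.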

Open PHACTS is dead, long live Open PHACTS!

I have spent the last five years working on the Open PHACTS project which is sadly at an end. However, it is not the end of the Open PHACTS drug discovery platform. We have transitioned to a new era of a foundation organisation running and developing the platform. The milestone was marked by the symbolic handover of the Open PHACTS flag (see the photo: on the right, Barend Mons (Leiden Medical Center) and Gerhard Ecker (University of Vienna) hand the flag to, on the left, Stefan Senger (GlaxoSmithKline), Derek Marren (Eli Lilly), and Herman van Vlijmen (Janssen Pharmaceutica)).

A nice summary of the closing symposium is available:

Linking Life Science Data: Design to Implementation, and Beyond

19 Feb, 2016 Open PHACTS project closing conference (Vienna, Austria)

On 18–19 February, 2016, we celebrated the completion of the Open PHACTS project with a conference at the University of Vienna, Austria. A total of 79 people attended to discuss the achievements of the Open PHACTS project, what they mean for the future of linked data, and how they can be carried forward.

Source: Linking Life Science Data: Design to Implementation, and Beyond – Open PHACTS Foundation

Open PHACTS Closing Symposium

For the last 5 years I have had the pleasure of working with the Open PHACTS project. Sadly, the project is now at an end. To celebrate, we are having a two-day symposium to look over the contributions of the project and its future legacy.

The project has been hugely successful in developing an integrated data platform to enable drug discovery research (see a future post for details to support this claim). The result of the project is the Open PHACTS Foundation which will now own the drug discovery platform and sustain its development into the future.

Here are my slides on the state of the data in the Open PHACTS 2.0 platform.

Open PHACTS: The Data Today from Alasdair Gray

Crusade for Big Data Keynote

Today I gave the keynote presentation (slides below) at the Crusade for Big Data in the AAL domain workshop as part of the EU Ambient Assisted Living Forum. I gave an overview of the way that the Open PHACTS project has overcome various Big Data challenges to provide a production quality data integration platform that is […]

The workshop then split into five breakout groups to discuss open challenges facing the AAL community that are posed by Big Data. The breakout groups were:

Privacy and Ethics
Business models for sustainability
Data reuse and interoperability
Data quality
Feedback to the users

The organisers of the workshop (Femke Ongenae and Femke De Backere) will be sharing the outcomes of the brainstorming by proposing several working groups to focus on the issues in the area of AAL.

Data Integration in a Big Data Context: An Open PHACTS Case Study from Alasdair Gray

Open PHACTS wins European Linked Data Award

We are delighted to announce that Open PHACTS has been awarded first place in the Linked Open Data Award of the inaugural European Linked Data Contest (ELDC). An international jury of ambassadors from over 15 European countries elected Open PHACTS as the winner, judged by the following criteria:

Gerhard receiving ELDC award

Shows a high degree of innovation
triggers network effects
embraces open standards
proves technological matureness
shows great potential to be utilised in multiple domains
achieves a high degree of comprehensibility for the users

The ELDC has been established to recognise Europe’s crème de la crème of linked data and semantic web. Prizes are awarded to stories, products, projects or persons presenting novel and innovative projects, products and industry implementations involving linked data. The ELDC also aims to build a directory of the best European projects in the domains of linked data and the semantic web. Open PHACTS is honored to be chosen as the first winner of the ELDC’s Linked Open Data Award, and to be included in this directory.