Presentation

ISWC 2018

ISWC 2018 Trip Report

Keynotes

There were three amazing and inspiring keynote talks, all very different from each other.

The first was given by Jennifer Golbeck (University of Maryland). While Jennifer did her PhD on the Semantic Web in the early days of social media and Linked Data, she now focuses on user privacy and consent. These are highly relevant topics to the Semantic Web community and something that we should really be considering when linking people’s personal data. While the consequences of linking scientific data might not be as scary, there are still ethical issues to consider if we do not get it right. Check out her TED talk for an abridged version of her keynote.

She also suggested that when reading a companies privacy policy, you should replace the word “privacy” with “consent” and see how it seems then.

The talk also struck an accord with the launch of the SOLID framework by Tim Berners-Lee. There was a good sales pitch of the SOLID framework from Ruben Verborgh in the afternoon of the Decentralising the Semantic Web Workshop.

The second was given by Natasha Noy (Google). Natasha talked about the challenges of being a researcher and engineering tools that support the community. Particularly where impact may only be detect 6 to 10 years down the line. She also highlighted that Linked Data is only a small fraction of the data in the world (the tip of the iceberg), and it is not appropriate to expect all data to become Linked Data.

Her most recent endeavour has been the Google Dataset Search Tool. This has been a major engineering and social endeavour; getting schema.org markup embedded on pages and building a specialist search tool on top of the indexed data. More details of the search framework are in this blog post. The current search interface is limited due to the availability of metadata; most sites only make title and description available. However, we can now start investigating how to return search results for datasets and what additional data might be of use. This for me is a really exciting area of work.

Later in the day I attended a talk on the LOD Atlas, another dataset search tool. While this gives a very detailed user interface, it is only designed for Linked Data researchers, not general users looking for a dataset.

The third keynote was given by Vanessa Evers (University of Twente, The Netherlands). This was in a completely different domain, social interactions with robots, but still raised plenty of questions for the community. For me the challenge was how to supply contextualised data.

Knowledge Graph Panel

The other big plenary event this year was the knowledge graph panel. The panel consisted of representatives from Microsoft, Facebook, eBay, Google, and IBM, all of whom were involved with the development of Knowledge Graphs within their organisation. A major concern for the Semantic Web community is that most of these panelists were not aware of our community or the results of our work. Another concern is that none of their systems use any of our results, although it sounds like several of them use something similar to RDF.

The main messages I took from the panel were

  • Scale and distribution were key

  • Source information is going to be noisy and challenging to extract value from

  • Metonymy is a major challenge

This final point connects with my work on contextualising data for the task of the user [bibcite key=BatchelorBCDDDEGGGGHKLOPSSTWWW14,Gray14] and has reinvigorated my interest in this research topic.

Final Thoughts

This was another great ISWC conference, although many familiar faces were missing.

There was a great and vibrant workshop programme. My paper [bibcite key=Gray2018:jupyter:SemSci2018] was presented during the Enabling Open Semantic Science workshop (SemSci 2018) and resulted in a good deal of discussion. There were also great keynotes at the workshop from Paul Groth (slides) and Yolanda Gil which I would recommend anyone to look over.

I regret not having gone to more of the Industry Track sessions. The one I did make was very inspiring to see how the results of the community are being used in practice, and to get insights into the challenges faced.

The conference banquet involved a walking dinner around the Monterey Bay Aquarium. This was a great idea as it allowed plenty of opportunities for conversations with a wide range of conference participants; far more than your standard banquet.

Here are some other takes on the conference:

I also managed to sneak off to look for the sea otters.

First steps with Jupyter Notebooks

At the 2nd Workshop on Enabling Open Semantic Sciences (SemSci2018), colocated at ISWC2018, I presented the following paper (slides at end of this post):

Title: Using a Jupyter Notebook to perform a reproducible scientific analysis over semantic web sources

Abstract: In recent years there has been a reproducibility crisis in science. Computational notebooks, such as Jupyter, have been touted as one solution to this problem. However, when executing analyses over live SPARQL endpoints, we get different answers depending upon when the analysis in the notebook was executed. In this paper, we identify some of the issues discovered in trying to develop a reproducible analysis over a collection of biomedical data sources and suggest some best practice to overcome these issues.

The paper covers my first attempt at using a computational notebook to publish a data analysis for reproducibility. The paper provokes more questions than it answers and this was the case in the workshop too.

One of the really great things about the paper is that you can launch the notebook, without installing any software, by clicking on the binder button below. You can then rerun the entire notebook and see whether you get the same results that I did when I ran the analysis over the various datasets.

SLiDInG 6

Today, the Semantic Web Lab hosted the 6th Scottish Linked Data Interest Group workshop at Heriot-Watt University. The event was sponsored by the SICSA Data Science Theme. The event was well attended with 30 researchers from across Scotland (and Newcastle) coming together for a day of flash talks and discussions. Live minutes were captured during the day and can be found here.

I gave a talk on the successes and challenges of FAIR data. My slides are embedded below.

UK Ontology Network 2018

This week I went to the UK Ontology Network meeting hosted at Keele University. There was an interesting array of talks in the programme showing the breadth of work going on in the UK.

I gave a talk on the Bioschemas Community  (slides below) and Leyla Garcia presented a poster providing more details of the current Bioschema Profiles.

The UK Ontology Network is going through a reflection phase and would like interested parties to complete the following online survey.

 

An Identifier Scheme for the Digitising Scotland Project

The Digitising Scotland project is having the vital records of Scotland transcribed from images of the original handwritten civil registers . Linking the resulting dataset of 24 million vital records covering the lives of 18 million people is a major challenge requiring improved record linkage techniques. Discussions within the multidisciplinary, widely distributed Digitising Scotland project team have been hampered by the teams in each of the institutions using their own identification scheme. To enable fruitful discussions within the Digitising Scotland team, we required a mechanism for uniquely identifying each individual represented on the certificates. From the identifier it should be possible to determine the type of certificate and the role each person played. We have devised a protocol to generate for any individual on the certificate a unique identifier, without using a computer, by exploiting the National Records of Scotland’s registration districts. Importantly, the approach does not rely on the handwritten content of the certificates which reduces the risk of the content being misread resulting in an incorrect identifier. The resulting identifier scheme has improved the internal discussions within the project. This paper discusses the rationale behind the chosen identifier scheme, and presents the format of the different identifiers.

The work reported in the paper was supported by the British ESRC under grants ES/K00574X/1(Digitising Scotland) and ES/L007487/1 (Administrative Data Research Centre – Scotland).

My coauthors are:

  • Özgür Akgün, University of St Andrews
  • Ahamd Alsadeeqi, Heriot-Watt University
  • Peter Christen, Australian National University
  • Tom Dalton, University of St Andrews
  • Alan Dearle, University of St Andrews
  • Chris Dibben, University of Edinburgh
  • Eilidh Garret, University of Essex
  • Graham Kirby, University of St Andrews
  • Alice Reid, University of Cambridge
  • Lee Williamson, University of Edinburgh

The work reported in this talk is the result of the Digitising Scotland Raasay Retreat. Also at the retreat were:

  • Julia Jennings, University of Albany
  • Christine Jones
  • Diego Ramiro-Farinas, Centre for Human and Social Sciences (CCHS) of the Spanish National Research Council (CSIC)