Linked Data

ISWC 2018

ISWC 2018 Trip Report

Keynotes

There were three amazing and inspiring keynote talks, all very different from each other.

The first was given by Jennifer Golbeck (University of Maryland). While Jennifer did her PhD on the Semantic Web in the early days of social media and Linked Data, she now focuses on user privacy and consent. These are highly relevant topics to the Semantic Web community and something that we should really be considering when linking people’s personal data. While the consequences of linking scientific data might not be as scary, there are still ethical issues to consider if we do not get it right. Check out her TED talk for an abridged version of her keynote.

She also suggested that when reading a company’s privacy policy, you should replace the word “privacy” with “consent” and see how it reads then.

The talk also struck a chord with the launch of the SOLID framework by Tim Berners-Lee. Ruben Verborgh gave a good sales pitch for the SOLID framework in the afternoon session of the Decentralising the Semantic Web Workshop.

The second was given by Natasha Noy (Google). Natasha talked about the challenges of being a researcher and engineering tools that support the community, particularly where impact may only be detected 6 to 10 years down the line. She also highlighted that Linked Data is only a small fraction of the data in the world (the tip of the iceberg), and it is not appropriate to expect all data to become Linked Data.

Her most recent endeavour has been the Google Dataset Search Tool. This has been a major engineering and social endeavour: getting schema.org markup embedded on pages and building a specialist search tool on top of the indexed data. More details of the search framework are in this blog post. The current search interface is limited by the metadata available; most sites only provide a title and description. However, we can now start investigating how to return search results for datasets and what additional data might be of use. For me, this is a really exciting area of work.
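To make the idea of schema.org markup for datasets concrete, here is a minimal Python sketch that builds a `Dataset` description as JSON-LD and wraps it in the script element used to embed it in a web page. It is an illustration only, not how Google or any particular site produces its markup: the helper names `dataset_jsonld` and `as_script_tag`, the example dataset, and the `example.org` URL are all made up for this post. The required arguments mirror the title and description that most sites already provide; the optional ones hint at the kind of additional metadata a search tool could use.

```python
import json


def dataset_jsonld(name, description, url=None, keywords=None, license_url=None):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    Hypothetical helper for illustration; only a handful of schema.org
    properties are covered.
    """
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
    }
    # Optional properties are only emitted when a value is supplied.
    if url:
        doc["url"] = url
    if keywords:
        doc["keywords"] = list(keywords)
    if license_url:
        doc["license"] = license_url
    return doc


def as_script_tag(doc):
    """Wrap the JSON-LD in the script element used to embed it in HTML."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Made-up dataset, used purely as an example.
example = dataset_jsonld(
    name="Example river temperature readings",
    description="Hypothetical daily water temperature measurements.",
    url="https://example.org/datasets/river-temperature",
    keywords=["hydrology", "temperature"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(as_script_tag(example))
```

Calling `dataset_jsonld` with only a name and a description yields the bare minimum markup; each extra argument adds one more property that a dataset search tool could index.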

Later in the day I attended a talk on the LOD Atlas, another dataset search tool. While this gives a very detailed user interface, it is only designed for Linked Data researchers, not general users looking for a dataset.

The third keynote was given by Vanessa Evers (University of Twente, The Netherlands). This was in a completely different domain, social interactions with robots, but still raised plenty of questions for the community. For me the challenge was how to supply contextualised data.

Knowledge Graph Panel

The other big plenary event this year was the knowledge graph panel. The panel consisted of representatives from Microsoft, Facebook, eBay, Google, and IBM, all of whom were involved with the development of Knowledge Graphs within their organisations. A major concern for the Semantic Web community is that most of these panelists were not aware of our community or the results of our work. Another concern is that none of their systems use any of our results, although it sounds like several of them use something similar to RDF.

The main messages I took from the panel were:

  • Scale and distribution were key

  • Source information is going to be noisy and challenging to extract value from

  • Metonymy is a major challenge

This final point connects with my work on contextualising data for the task of the user [1, 2] and has reinvigorated my interest in this research topic.

Final Thoughts

This was another great ISWC conference, although many familiar faces were missing.

There was a great and vibrant workshop programme. My paper [3] was presented during the Enabling Open Semantic Science workshop (SemSci 2018) and resulted in a good deal of discussion. There were also great keynotes at the workshop from Paul Groth (slides) and Yolanda Gil, which I would recommend to anyone.

I regret not having gone to more of the Industry Track sessions. The one I did make was very inspiring: it showed how the results of the community are being used in practice and gave insights into the challenges faced.

The conference banquet involved a walking dinner around the Monterey Bay Aquarium. This was a great idea as it allowed plenty of opportunities for conversations with a wide range of conference participants, far more than your standard banquet.

I also managed to sneak off to look for the sea otters.

[1] Unknown bibtex entry with key [BatchelorBCDDDEGGGGHKLOPSSTWWW14]
[2] Unknown bibtex entry with key [Gray14]
[3] Alasdair J. G. Gray. Using a Jupyter Notebook to perform a reproducible scientific analysis over semantic web sources. In Enabling Open Semantic Science, Monterey, California, USA, October 2018. Executable version: https://mybinder.org/v2/gh/AlasdairGray/SemSci2018/master?filepath=SemSci2018%20Publication.ipynb
[Bibtex]
@InProceedings{Gray2018:jupyter:SemSci2018,
abstract = {In recent years there has been a reproducibility crisis in science. Computational notebooks, such as Jupyter, have been touted as one solution to this problem. However, when executing analyses over live SPARQL endpoints, we get different answers depending upon when the analysis in the notebook was executed. In this paper, we identify some of the issues discovered in trying to develop a reproducible analysis over a collection of biomedical data sources and suggest some best practice to overcome these issues.},
author = {Alasdair J G Gray},
title = {Using a Jupyter Notebook to perform a reproducible scientific analysis over semantic web sources},
booktitle = {Enabling Open Semantic Science},
year = {2018},
month = oct,
address = {Monterey, California, USA},
note = {Executable version: https://mybinder.org/v2/gh/AlasdairGray/SemSci2018/master?filepath=SemSci2018%20Publication.ipynb},
url = {http://ceur-ws.org/Vol-2184/paper-02/paper-02.html}
}

SLiDInG 6

Today, the Semantic Web Lab hosted the 6th Scottish Linked Data Interest Group workshop at Heriot-Watt University. The event was sponsored by the SICSA Data Science Theme. The event was well attended with 30 researchers from across Scotland (and Newcastle) coming together for a day of flash talks and discussions. Live minutes were captured during the day and can be found here.

I gave a talk on the successes and challenges of FAIR data. My slides are embedded below.

DUCS not LOD

The following is an excerpt from a blog post by Keir Winesmith, Head of Digital at the San Francisco Museum of Modern Art (@SFMOMAlab).

Linked Open Data may sound good and noble, but it’s the wrong way around. It is a truth universally acknowledged, that an organization in possession of good Data, must want it Open (and indeed, Linked).

Well, I call bullshit. Most cultural heritage organizations (like most organizations) are terrible at data. And most of those who are good at collecting it, very rarely use it effectively or strategically.

Instead of Linked Open Data (LOD), Keir argues for DUCS:

I propose an alternative anagram, and an alternative order of importance.

  • D. Data. Step one, collect the data that is most likely to help you and your organization make better decisions in the future. For example collection breadth, depth, accuracy, completeness, diversity, and relationships between objects and creators.
  • U. Utilise. Actually use the data to inform your decisions, and test your hypotheses, within the bounds of your mission.
  • C. Context. Provide context for your data, both internally and externally. What’s inside? How is represented? How complete is it? How accurate? How current? How was it gathered?
  • S. Share. Now you’re ready to share it! Share it with context. Share it with the communities that are included in it first, follow the cultural heritage strategy of “nothing about me, without me”. Reach out to the relevant students, scholars, teachers, artists, designers, anthropologists, technologists, and whomever could use it. Get behind it and keep it up to date.

I’m against LOD, if it doesn’t follow DUCS first.

If you’re going to do it, do it right.

Source: Against Linked Open Data – Keir Winesmith – Medium

Interoperability and FAIRness through a novel combination of Web technologies

New paper [1] on using Semantic Web technologies to publish existing data according to the FAIR data principles [2].

Abstract: Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.

[1] Unknown bibtex entry with key [Wilkinson2017-FAIRness]
[2] Unknown bibtex entry with key [Wilkinson2016]

Shapeshifting LOD Cloud

A new version of the Linked Open Data (LOD) cloud has been produced and it shows quite a shift from the previous version. It is great to see the LOD cloud continue to grow both in scale and diversity.

(You can click on the image to get to an interactive version of the cloud with links to the DataHub entries.)

LOD Cloud January 2017

Previously, DBpedia and GeoNames were at the centre of the LOD universe. While DBpedia remains an important linking dataset, it is now clear that there are clusterings within application domains. This is most significant in the life sciences.

LOD Cloud August 2014

Attribution: “Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/”