ISWC2017 Papers

I have had two papers accepted within the events that make up ISWC2017.

My PhD student Qianru Zhou has been working on using RDF stream processing to detect anomalous events from telecommunication network messages. The particular scenario in our paper, which will be presented at the Web Stream Processing workshop, focuses on detecting a disaster such as the capsizing of the Eastern Star on the Yangtze River [1].
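
For a flavour of the approach, here is a minimal sketch of window-based anomaly detection over an RDF-annotated stream of phone status reports. The vocabulary, status values, window size, and threshold are illustrative assumptions for this post, not the data model or algorithm from the paper.

```python
# A minimal sketch (not the authors' implementation) of window-based anomaly
# detection over an RDF-annotated stream of phone status reports.
# The vocabulary, status values, and threshold are illustrative assumptions.
from collections import deque
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/telecom#")  # hypothetical vocabulary

def annotate(phone_id: str, status: str, t: int) -> Graph:
    """Encode a single phone status report as a small RDF graph."""
    g = Graph()
    event = URIRef(f"http://example.org/event/{phone_id}/{t}")
    g.add((event, RDF.type, EX.StatusReport))
    g.add((event, EX.phone, URIRef(f"http://example.org/phone/{phone_id}")))
    g.add((event, EX.status, Literal(status)))
    g.add((event, EX.timestamp, Literal(t, datatype=XSD.integer)))
    return g

def detect(stream, window_size=10, threshold=0.8):
    """Yield the times at which the fraction of silent phones in the current
    window exceeds the threshold -- the 'lost silence' intuition."""
    window = deque(maxlen=window_size)
    for phone_id, status, t in stream:
        g = annotate(phone_id, status, t)
        silent = (None, EX.status, Literal("no-signal")) in g
        window.append(silent)
        if len(window) == window_size and sum(window) / window_size >= threshold:
            yield t

# Example: most phones in an area suddenly go silent after t=5.
events = [(f"p{i}", "ok", i) for i in range(5)] + \
         [(f"p{i}", "no-signal", i) for i in range(5, 20)]
print(list(detect(events)))   # anomaly raised from t=13 onwards
```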

The second paper is a poster in the main conference that provides an overview of the Bioschemas project, in which we are identifying the Schema.org markup that is of primary importance for life science resources. Hopefully the paper title will pull the punters in for the session [2].
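
As a rough illustration of the kind of markup involved, the sketch below generates a Schema.org Dataset description as JSON-LD, of the sort a data provider might embed in a resource's landing page. The properties, values, and URL are an illustrative subset chosen for this post, not the finalised Bioschemas profile.

```python
# A sketch of Schema.org markup of the kind Bioschemas encourages providers
# to embed; the properties and values are illustrative, not a Bioschemas spec.
import json

dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example protein annotation dataset",        # hypothetical resource
    "description": "Curated annotations for human proteins.",
    "url": "https://example.org/proteins",                # placeholder URL
    "keywords": ["protein", "annotation", "life sciences"],
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

# Embedded in a page as a JSON-LD script block so that search engines and
# aggregators can discover and interpret the resource.
print('<script type="application/ld+json">')
print(json.dumps(dataset, indent=2))
print("</script>")
```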

[1] Qianru Zhou, Stephen McLaughlin, Alasdair J. G. Gray, Shangbin Wu, and Chengxiang Wang. Lost Silence: An emergency response early detection service through continuous processing of telecommunication data streams. In Web Stream Processing 2017, Vienna, Austria, 2017.
[Bibtex]
@InProceedings{ZhouEtal2017:LostSilence:WSP2017,
abstract = {Early detection of significant traumatic events, e.g. terrorist events, ship capsizes, is important to ensure that a prompt emergency response can occur. In the modern world telecommunication systems can and do play a key role in ensuring a successful emergency response by detecting such incidents through significant changes in calls and access to the networks. In this paper a methodology is illustrated to detect such incidents immediately (with the delay in the order of milliseconds), by processing semantically annotated streams of data in cellular telecommunication systems. In our methodology, live information of phones' positions and status are encoded as RDF streams. We propose an algorithm that processes streams of RDF annotated telecommunication data to detect abnormality. Our approach is exemplified in the context of capsize of a passenger cruise ship but is readily translatable to other incidents. Our evaluation results show that with properly chosen window size, such incidents can be detected effectively.},
author = {Qianru Zhou and Stephen McLaughlin and Alasdair J G Gray and Shangbin Wu and Chengxiang Wang},
title = {Lost Silence: An emergency response early detection service through continuous processing of telecommunication data streams},
booktitle = {Web Stream Processing 2017},
year = {2017},
month = oct,
address = {Vienna, Austria},
url = {http://ceur-ws.org/Vol-1936/paper-03.pdf},
}
[2] Alasdair J. G. Gray, Carole Goble, Rafael C. Jimenez, and The Bioschemas Community. Bioschemas: From Potato Salad to Protein Annotation. In ISWC 2017 Poster Proceedings, Vienna, Austria, 2017. (Poster)
[Bibtex]
@InProceedings{grayetal2017:bioschemas:iswc2017,
abstract = {The life sciences have a wealth of data resources with a wide range of overlapping content. Key repositories, such as UniProt for protein data or Entrez Gene for gene data, are well known and their content easily discovered through search engines. However, there is a long-tail of bespoke datasets with important content that are not so prominent in search results. Building on the success of Schema.org for making a wide range of structured web content more discoverable and interpretable, e.g. food recipes, the Bioschemas community (http://bioschemas.org) aim to make life sciences datasets more findable by encouraging data providers to embed Schema.org markup in their resources.},
author = {Alasdair J G Gray and Carole Goble and Rafael C Jimenez and {The Bioschemas Community}},
title = {Bioschemas: From Potato Salad to Protein Annotation},
booktitle = {ISWC 2017 Poster Proceedings},
year = {2017},
month = oct,
address = {Vienna, Austria},
url = {http://ceur-ws.org/Vol-1963/paper579.pdf},
note = {(Poster)},
}

SICSA Digital Humanities Event

On 24 August I attended the SICSA Digital Humanities event hosted at Strathclyde University. The event, organised by Martin Halvey and Frank Hopfgartner, brought together cultural heritage practitioners and researchers from the humanities and computer science.

The day started off with a keynote from Lorna Hughes, Professor of Digital Humanities at the University of Glasgow. She highlighted that there is no single definition of digital humanities (there is a website that presents a random definition from a set collected at another event). However, at its core, digital humanities consists of:

  • Digital content
  • Digital methods
  • Tools

The purpose of digital humanities is to enable better and/or faster outputs, as well as to conceptualise new research questions.

Lorna showcased several projects that she has been involved with, highlighting the issues that were faced, before identifying a set of lessons learned and challenges going forward (see her blog and SlideShare). She highlighted that only about 10% of content has been transformed into a digital form, and of that only 3% is openly available. Additionally, some artefacts have been digitised in multiple ways at different time points, and the differences between these digital forms tell a story about the object.

Lorna highlighted the following challenges:

  • Enabling better understanding of digital content
  • Developing underlying digital infrastructure
  • Supporting the use of open content
  • Enabling the community
  • Working with born-digital content.

The second part of the day saw us brainstorming ideas in groups. Two potential apps were outlined to help the public get more out of the cultural heritage environment around them.

There was an interesting panel discussion focused on what you would do with a mythical £350m. It also involved locking up the 3D scanners, at least until appropriate methodology and metadata were made available.

The day finished off with a keynote from Daniela Petrelli, Sheffield Hallam University, focussing on the outputs of the EU meSch project. A holistic design approach to the visitor experience was proposed, encompassing interaction design, product design, and content design. See the embedded video below for an idea.

Summary

There are lots of opportunities for collaboration between digital humanities and computing. From my perspective, there are many interesting challenges around capturing metadata about data, linking between datasets, and capturing the provenance of workflows.

Throughout the day, various participants were tweeting with the #dhfest hashtag.

DUCS not LOD

The following is an excerpt from a blog post by Keir Winesmith, Head of Digital at the San Francisco Museum of Modern Art (@SFMOMAlab):

Linked Open Data may sound good and noble, but it’s the wrong way around. It is a truth universally acknowledged, that an organization in possession of good Data, must want it Open (and indeed, Linked).

Well, I call bullshit. Most cultural heritage organizations (like most organizations) are terrible at data. And most of those who are good at collecting it, very rarely use it effectively or strategically.

Instead of Linked Open Data (LOD), Keir argues for DUCS:

I propose an alternative anagram, and an alternative order of importance.

  • D. Data. Step one, collect the data that is most likely to help you and your organization make better decisions in the future. For example collection breadth, depth, accuracy, completeness, diversity, and relationships between objects and creators.
  • U. Utilise. Actually use the data to inform your decisions, and test your hypotheses, within the bounds of your mission.
  • C. Context. Provide context for your data, both internally and externally. What’s inside? How is it represented? How complete is it? How accurate? How current? How was it gathered?
  • S. Share. Now you’re ready to share it! Share it with context. Share it with the communities that are included in it first, follow the cultural heritage strategy of “nothing about me, without me”. Reach out to the relevant students, scholars, teachers, artists, designers, anthropologists, technologists, and whomever could use it. Get behind it and keep it up to date.

I’m against LOD, if it doesn’t follow DUCS first.

If you’re going to do it, do it right.

Source: Against Linked Open Data – Keir Winesmith – Medium

An Identifier Scheme for the Digitising Scotland Project

The Digitising Scotland project is having the vital records of Scotland transcribed from images of the original handwritten civil registers. Linking the resulting dataset of 24 million vital records covering the lives of 18 million people is a major challenge requiring improved record linkage techniques. Discussions within the multidisciplinary, widely distributed Digitising Scotland project team have been hampered by the teams in each of the institutions using their own identification scheme. To enable fruitful discussions within the Digitising Scotland team, we required a mechanism for uniquely identifying each individual represented on the certificates. From the identifier it should be possible to determine the type of certificate and the role each person played. We have devised a protocol to generate a unique identifier for any individual on a certificate, without using a computer, by exploiting the National Records of Scotland’s registration districts. Importantly, the approach does not rely on the handwritten content of the certificates, which reduces the risk of the content being misread and resulting in an incorrect identifier. The resulting identifier scheme has improved the internal discussions within the project. This paper discusses the rationale behind the chosen identifier scheme, and presents the format of the different identifiers.
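
To make the idea of a compositional, human-constructable identifier concrete, here is a purely illustrative sketch built from a registration district, year, entry number, certificate type, and role. The field order, separators, and single-letter codes are assumptions for illustration only; they are not the scheme actually adopted by the project.

```python
# A purely illustrative sketch: the field order, separators, and codes below
# are assumptions, NOT the identifier scheme adopted by Digitising Scotland.
from dataclasses import dataclass

CERT_TYPES = {"B": "birth", "M": "marriage", "D": "death"}   # assumed codes
ROLES = {"C": "child", "M": "mother", "F": "father",
         "B": "bride", "G": "groom", "D": "deceased"}        # assumed codes

@dataclass(frozen=True)
class PersonIdentifier:
    district: str    # National Records of Scotland registration district number
    year: int        # registration year
    entry: int       # entry number within the register
    cert_type: str   # key into CERT_TYPES
    role: str        # key into ROLES

    def __str__(self) -> str:
        # Dash-separated so the certificate type and role can be read straight
        # off the identifier, without consulting the handwritten content.
        return f"{self.cert_type}-{self.year}-{self.district}-{self.entry}-{self.role}"

# Example: the mother recorded on a hypothetical 1891 birth certificate.
pid = PersonIdentifier(district="644/1", year=1891, entry=123,
                       cert_type="B", role="M")
print(pid)                                           # B-1891-644/1-123-M
print(CERT_TYPES[pid.cert_type], ROLES[pid.role])    # birth mother
```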

The work reported in the paper was supported by the British ESRC under grants ES/K00574X/1 (Digitising Scotland) and ES/L007487/1 (Administrative Data Research Centre – Scotland).

My coauthors are:

  • Özgür Akgün, University of St Andrews
  • Ahmad Alsadeeqi, Heriot-Watt University
  • Peter Christen, Australian National University
  • Tom Dalton, University of St Andrews
  • Alan Dearle, University of St Andrews
  • Chris Dibben, University of Edinburgh
  • Eilidh Garrett, University of Essex
  • Graham Kirby, University of St Andrews
  • Alice Reid, University of Cambridge
  • Lee Williamson, University of Edinburgh

The work reported in this talk is the result of the Digitising Scotland Raasay Retreat. Also at the retreat were:

  • Julia Jennings, University of Albany
  • Christine Jones
  • Diego Ramiro-Farinas, Centre for Human and Social Sciences (CCHS) of the Spanish National Research Council (CSIC)

Interoperability and FAIRness through a novel combination of Web technologies

New paper [1] on using Semantic Web technologies to publish existing data according to the FAIR data principles [2].

Abstract: Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.
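
As a generic illustration of the resource-oriented style the paper explores, the sketch below requests machine-readable (RDF) metadata for a repository record via HTTP content negotiation rather than scraping its HTML landing page. The URL is a placeholder and this is not the paper's reference implementation.

```python
# A generic sketch of one resource-oriented Web pattern: asking a repository
# record for RDF metadata via HTTP content negotiation. The URL is a
# placeholder; this is not the paper's reference implementation.
import requests
from rdflib import Graph

RECORD_URL = "https://example.org/repository/record/42"  # placeholder

def fetch_metadata(url: str) -> Graph:
    """Request Turtle for a record instead of its HTML landing page."""
    response = requests.get(url, headers={"Accept": "text/turtle"}, timeout=10)
    response.raise_for_status()
    g = Graph()
    g.parse(data=response.text, format="turtle")
    return g

if __name__ == "__main__":
    metadata = fetch_metadata(RECORD_URL)
    # Each (subject, predicate, object) statement is individually addressable,
    # which is how interoperability can reach down to fine-grained items.
    for s, p, o in metadata:
        print(s, p, o)
```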

[1] [doi] Mark D. Wilkinson, Ruben Verborgh, Luiz Olavo Bonino da Silva Santos, Tim Clark, Morris A. Swertz, Fleur D. L. Kelpin, Alasdair J. G. Gray, Erik A. Schultes, Erik M. van Mulligen, Paolo Ciccarese, Arnold Kuzniar, Anand Gavai, Mark Thompson, Rajaram Kaliyaperumal, Jerven T. Bolleman, and Michel Dumontier. Interoperability and FAIRness through a novel combination of Web technologies. PeerJ Computer Science, 3:e110, 2017.
[Bibtex]
@article{Wilkinson2017-FAIRness,
abstract = {Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.},
author = {Wilkinson, Mark D. and Verborgh, Ruben and {Bonino da Silva Santos}, Luiz Olavo and Clark, Tim and Swertz, Morris A. and Kelpin, Fleur D.L. and Gray, Alasdair J.G. and Schultes, Erik A. and van Mulligen, Erik M. and Ciccarese, Paolo and Kuzniar, Arnold and Gavai, Anand and Thompson, Mark and Kaliyaperumal, Rajaram and Bolleman, Jerven T. and Dumontier, Michel},
doi = {10.7717/peerj-cs.110},
issn = {2376-5992},
journal = {PeerJ Computer Science},
month = apr,
pages = {e110},
publisher = {PeerJ Inc.},
title = {{Interoperability and FAIRness through a novel combination of Web technologies}},
url = {https://peerj.com/articles/cs-110},
volume = {3},
year = {2017}
}
[2] [doi] Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, Jildau Bouwman, Anthony J. Brookes, Tim Clark, Mercè Crosas, Ingrid Dillo, Olivier Dumon, Scott Edmunds, Chris T. Evelo, Richard Finkers, Alejandra Gonzalez-Beltran, Alasdair J. G. Gray, Paul Groth, Carole Goble, Jeffrey S. Grethe, Jaap Heringa, Peter A. C. ’t Hoen, Rob Hooft, Tobias Kuhn, Ruben Kok, Joost Kok, Scott J. Lusher, Maryann E. Martone, Albert Mons, Abel L. Packer, Bengt Persson, Philippe Rocca-Serra, Marco Roos, Rene van Schaik, Susanna-Assunta Sansone, Erik Schultes, Thierry Sengstag, Ted Slater, George Strawn, Morris A. Swertz, Mark Thompson, Johan van der Lei, Erik van Mulligen, Jan Velterop, Andra Waagmeester, Peter Wittenburg, Katherine Wolstencroft, Jun Zhao, and Barend Mons. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3:160018, 2016.
[Bibtex]
@article{Wilkinson2016,
abstract = {There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders-representing academia, industry, funding agencies, and scholarly publishers-have come together to design and jointly endorse a concise and measureable set of principles that we refer to as the {FAIR} Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the {FAIR} Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the {FAIR} Principles, and includes the rationale behind them, and some exemplar implementations in the community.},
author = {Wilkinson, Mark D and Dumontier, Michel and Aalbersberg, IJsbrand Jan and Appleton, Gabrielle and Axton, Myles and Baak, Arie and Blomberg, Niklas and Boiten, Jan-Willem and {da Silva Santos}, Luiz Bonino and Bourne, Philip E and Bouwman, Jildau and Brookes, Anthony J and Clark, Tim and Crosas, Merc{\`{e}} and Dillo, Ingrid and Dumon, Olivier and Edmunds, Scott and Evelo, Chris T and Finkers, Richard and Gonzalez-Beltran, Alejandra and Gray, Alasdair J.G. and Groth, Paul and Goble, Carole and Grethe, Jeffrey S and Heringa, Jaap and {'t Hoen}, Peter A.C and Hooft, Rob and Kuhn, Tobias and Kok, Ruben and Kok, Joost and Lusher, Scott J and Martone, Maryann E and Mons, Albert and Packer, Abel L and Persson, Bengt and Rocca-Serra, Philippe and Roos, Marco and van Schaik, Rene and Sansone, Susanna-Assunta and Schultes, Erik and Sengstag, Thierry and Slater, Ted and Strawn, George and Swertz, Morris A and Thompson, Mark and van der Lei, Johan and van Mulligen, Erik and Velterop, Jan and Waagmeester, Andra and Wittenburg, Peter and Wolstencroft, Katherine and Zhao, Jun and Mons, Barend},
doi = {10.1038/sdata.2016.18},
issn = {2052-4463},
journal = {Scientific Data},
month = mar,
pages = {160018},
publisher = {Macmillan Publishers Limited},
title = {{The FAIR Guiding Principles for scientific data management and stewardship}},
url = {http://www.nature.com/articles/sdata201618},
volume = {3},
year = {2016}
}