Streaming Data

ISWC2017 Papers

I have had two papers accepted within the events that make up ISWC2017.

My PhD student Qianru Zhou has been working on using RDF stream processing to detect anomalous events through telecommunication network messages. The particular scenario in our paper that will be presented at the Web Stream Processing workshop focuses on detecting a disaster such as the capsizing of the Eastern Star on the Yangtze River [1].
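
To give a flavour of window-based detection over a stream, here is a minimal sketch in Python. It is not the algorithm from the paper, and it uses plain tuples rather than RDF. It assumes each event is a hypothetical (timestamp in seconds, phone identifier) pair, groups events into fixed-size windows, and flags any window where the number of distinct phones seen falls sharply compared with the previous window. The function names, the window size, and the drop ratio are all illustrative assumptions.

```python
def active_phones_per_window(events, window_seconds):
    """Count distinct phones per fixed-size (tumbling) window.

    events: iterable of (timestamp_seconds, phone_id) pairs with integer
    timestamps. Both fields are hypothetical stand-ins for real network data.
    Returns a list of (window_start, distinct_phone_count), including
    empty windows that lie between the first and last observed window.
    """
    seen = {}
    for timestamp, phone_id in events:
        start = (timestamp // window_seconds) * window_seconds
        seen.setdefault(start, set()).add(phone_id)
    if not seen:
        return []
    first, last = min(seen), max(seen)
    return [
        (start, len(seen.get(start, ())))
        for start in range(first, last + window_seconds, window_seconds)
    ]


def detect_drops(window_counts, drop_ratio=0.5):
    """Return start times of windows whose count fell below
    drop_ratio times the count of the preceding window."""
    alerts = []
    for (_, previous), (start, current) in zip(window_counts, window_counts[1:]):
        if current < previous * drop_ratio:
            alerts.append(start)
    return alerts


# Four phones are visible for two minutes, then only one remains.
events = [
    (0, "a"), (10, "b"), (20, "c"), (30, "d"),
    (60, "a"), (70, "b"), (80, "c"), (90, "d"),
    (125, "a"),
]
counts = active_phones_per_window(events, window_seconds=60)
print(detect_drops(counts))
```

The choice of window size matters: too small and normal fluctuation triggers alerts, too large and detection is delayed.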

The second paper is a poster in the main conference that provides an overview of the Bioschemas project where we are identifying the markup that is of primary importance for life science resources. Hopefully the paper title will pull the punters in for the session [2].

[1] Qianru Zhou, Stephen McLaughlin, Alasdair J. G. Gray, Shangbin Wu, and Chengxiang Wang. Lost Silence: An emergency response early detection service through continuous processing of telecommunication data streams. In Web Stream Processing 2017, Vienna, Austria, October 2017.
Abstract: Early detection of significant traumatic events, e.g. terrorist events, ship capsizes, is important to ensure that a prompt emergency response can occur. In the modern world telecommunication systems can and do play a key role in ensuring a successful emergency response by detecting such incidents through significant changes in calls and access to the networks. In this paper a methodology is illustrated to detect such incidents immediately (with the delay in the order of milliseconds), by processing semantically annotated streams of data in cellular telecommunication systems. In our methodology, live information of phones' positions and status are encoded as RDF streams. We propose an algorithm that processes streams of RDF annotated telecommunication data to detect abnormality. Our approach is exemplified in the context of capsize of a passenger cruise ship but is readily translatable to other incidents. Our evaluation results show that with properly chosen window size, such incidents can be detected effectively.
[2] Unknown bibtex entry with key [grayetal2017:bioschemas:iswc2017]

MACS Christmas Conference

I was asked to speak at the School (Faculty) of Mathematical and Computer Sciences (MACS) Christmas conference. I decided I would have some fun with the presentation.

Title: Project X

Abstract: For the last 11 months I have been working on a top secret project with a world renowned Scandinavian industry partner. We are now moving into the exciting operational phase of this project. I have been granted an early lifting of the embargo that has stopped me talking about this work up until now. I will talk about the data science behind this big data project and how semantic web technology has enabled the delivery of Project X.

You can find more details of flood defence work in this paper.

Data Integration in a Big Data Context

Today I had the pleasure of visiting the Urban Big Data Centre (UBDC) to give a seminar on Data Integration in a Big Data context (slides below). The idea for the seminar came about due to my collaboration with Nick Bailey (Associate Director of the UBDC) in the Administrative Data Research Centre for Scotland (ADRC-S).

In the seminar I wanted to highlight the challenges of data integration that arise in a Big Data context and show examples from my past work that would be relevant to those in the UBDC. In the presentation, I argue that RDF provides a good approach for data integration, but it does not solve the basic challenges of messy data and generating mappings between datasets. It does, however, lay these challenges bare on the table, as Frank van Harmelen highlighted in his SWAT4LS keynote in 2013.
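
The point about mappings can be illustrated with a small sketch in Python. It models triples as plain tuples rather than using an RDF library, and every identifier in it (the `a:` and `b:` prefixes, the sensor, the station, the mapping itself) is made up for the example. Merging two triple sets is just a set union, but the two descriptions of the same thing only come together once a mapping between their identifiers has been supplied from somewhere.

```python
# Two hypothetical sources describing the same sensor under different identifiers.
source_a = {
    ("a:sensor1", "a:locatedIn", "a:Portsmouth"),
    ("a:sensor1", "a:waveHeightMetres", 2.4),
}
source_b = {
    ("b:station42", "b:operator", "b:HarbourAuthority"),
}

# The mapping is not produced by the data model; someone (or some tool)
# has to generate and maintain it. This one is an assumption for the example.
mappings = {"b:station42": "a:sensor1"}


def integrate(*sources, mappings):
    """Union the sources, rewriting mapped subjects and objects
    to a single canonical identifier."""
    merged = set()
    for source in sources:
        for subject, predicate, obj in source:
            merged.add((mappings.get(subject, subject), predicate, mappings.get(obj, obj)))
    return merged


def describe(graph, subject):
    """Collect every predicate/object pair known about one subject."""
    return {p: o for s, p, o in graph if s == subject}


without_mapping = describe(integrate(source_a, source_b, mappings={}), "a:sensor1")
with_mapping = describe(integrate(source_a, source_b, mappings=mappings), "a:sensor1")
print(len(without_mapping), len(with_mapping))
```

Without the mapping the merged data still holds two disconnected descriptions; with it, the operator from the second source appears alongside the location and reading from the first.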

The first use case is drawn from my work on the EU SemSorGrid4Env project where we were developing an integrated view for emergency response planning. The particular use case shown is that of coastal flooding on the south coast of England. Although this project finished in 2011, I am still involved with developing RDF and SPARQL continuous data extensions; see the W3C RDF Stream Processing Community Group for details.

The second use case is drawn from my work on the EU Open PHACTS project. I showed the approach we developed for supporting user-controlled views of the integrated data through Scientific Lenses. I also talked about the successes of the project and the fact that it is currently being actively used for pharmacology research, receiving over 20 million hits a month.

I finished the talk with an overview of the Administrative Data Research Centre for Scotland (ADRC-S) and my work on linking birth, marriage, and death records. I am hoping that we can adopt the lenses approach together with incorporating feedback on the linkages from the researchers who will use the integrated views.

In the discussions following the talk, the notion of FAIR data came up. This is the idea that data should be Findable, Accessible, Interoperable, and Reusable by both humans and machines. RDF is one approach that could lead to this. The other area of discussion was around community initiatives for converting existing open datasets into an RDF format. I advocated adopting the approach followed by the Bio2RDF community who share the tasks of creating and maintaining such scripts for biological datasets. An important part of this jigsaw is tracking the provenance of the datasets, for which the W3C Health Care and Life Sciences Community Profile for Dataset Descriptions could be beneficial (there is nothing specific to the HCLS community in the profile).

SICSA Databases for the Environmental and Social Sciences

Today I attended the SICSA Databases for the Environmental and Social Sciences event hosted by Andy Cobley from the University of Dundee. I gave the below talk on the challenges of linking data.

Many areas of scientific discovery rely on combining data from multiple data sources. However, there are many challenges in linking data. This presentation highlights these challenges in the context of using Linked Data for environmental and social science databases.

SensorBench SICSA Presentation

SensorBench [1] is a benchmark suite for wireless sensor networks. The design of wireless sensor network systems sits within a multi-dimensional design space, where it can be difficult to understand the implications of specific decisions and to identify optimal solutions. SensorBench enables the systematic analysis and comparison of different techniques and platforms, enabling both development and user communities to make well informed choices. The benchmark identifies key variables and performance metrics, and specifies experiments that explore how different types of task perform under different metrics for the controlled variables. The benchmark is demonstrated by its application on representative platforms.

[1] [pdf] Unknown bibtex entry with key [SensorBenchSSDBM2014]