Publications

2023

  • Danielle Welter, Nick Juty, Philippe Rocca-Serra, Fuqi Xu, David Henderson, Wei Gu, Jolanda Strubel, Robert Giessmann, Ibrahim Emam, Yojana Gadiya, Tooba Abbassi-Daloii, Ebtisam Alharbi, Alasdair Gray, Melanie Courtot, Philip Gribbon, Vassilios Ioannidis, Dorothy Reilly, Nick Lynch, Jan-Willem Boiten, Venkata Satagopam, Carole Goble, Susanna-Assunta Sansone, and Tony Burdett. FAIR in action – a flexible framework to guide FAIRification. Scientific Data, 2023. To appear. doi:10.5281/zenodo.7702124
    [BibTeX] [Abstract] [Download PDF]

    The COVID-19 pandemic has highlighted the need for FAIR (Findable, Accessible, Interoperable, and Reusable) data more than any other scientific challenge to date. We developed a flexible, multi-level, domain-agnostic FAIRification framework, providing practical guidance to improve the FAIRness for both existing and future clinical and molecular datasets. We validated the framework in collaboration with a wide range of public-private partnership projects, demonstrating and implementing improvements across all aspects of FAIR, using a variety of datasets, to demonstrate the reproducibility and wide-ranging applicability of this framework for intra-project FAIRification.

    @article{Welter:FAIR-in-action:SDATA2023,
    abstract={The COVID-19 pandemic has highlighted the need for FAIR (Findable, Accessible, Interoperable, and Reusable) data more than any other scientific challenge to date. We developed a flexible, multi-level, domain-agnostic FAIRification framework, providing practical guidance to improve the FAIRness for both existing and future clinical and molecular datasets. We validated the framework in collaboration with a wide range of public-private partnership projects, demonstrating and implementing improvements across all aspects of FAIR, using a variety of datasets, to demonstrate the reproducibility and wide-ranging applicability of this framework for intra-project FAIRification.},
    title={{FAIR} in action - a flexible framework to guide {FAIRification}},
    author={Danielle Welter and Nick Juty and Philippe Rocca-Serra and Fuqi Xu and David Henderson and Wei Gu and Jolanda Strubel and Robert Giessmann and Ibrahim Emam and Yojana Gadiya and Tooba Abbassi-Daloii and Ebtisam Alharbi and Alasdair Gray and Melanie Courtot and Philip Gribbon and Vassilios Ioannidis and Dorothy Reilly and Nick Lynch and Jan-Willem Boiten and Venkata Satagopam and Carole Goble and Susanna-Assunta Sansone and Tony Burdett},
    journal={Scientific Data},
    year={2023},
    doi={10.5281/zenodo.7702124},
    url={https://doi.org/10.5281/zenodo.7702124},
    publisher={Nature},
    note={To appear}
    }

2022

  • Isuru Liyanage, Tony Burdett, Bert Droesbeke, Karoly Erdos, Rolando Fernandez, Alasdair Gray, Muhammad Haseeb, Simon Jupp, Flavia Penim, Cyril Pommier, Philippe Rocca-Serra, Mélanie Courtot, and Frederik Coppens. ELIXIR biovalidator for semantic validation of life science metadata. Bioinformatics, 38(11):3141-3142, 2022. btac195 doi:10.1093/bioinformatics/btac195
    [BibTeX] [Abstract] [Download PDF]

    To advance biomedical research, increasingly large amounts of complex data need to be discovered and integrated. This requires syntactic and semantic validation to ensure shared understanding of relevant entities. This article describes the ELIXIR biovalidator, which extends the syntactic validation of the widely used AJV library with ontology-based validation of JSON documents. Source code: https://github.com/elixir-europe/biovalidator, Release: v1.9.1, License: Apache License 2.0, Deployed at: https://www.ebi.ac.uk/biosamples/schema/validator/validate

    @article{courtot:elixir-validator:bioinformatics2022,
    author = {Liyanage, Isuru and Burdett, Tony and Droesbeke, Bert and Erdos, Karoly and Fernandez, Rolando and Gray, Alasdair and Haseeb, Muhammad and Jupp, Simon and Penim, Flavia and Pommier, Cyril and Rocca-Serra, Philippe and Courtot, Mélanie and Coppens, Frederik},
    title = "{ELIXIR biovalidator for semantic validation of life science metadata}",
    journal = {Bioinformatics},
    year = {2022},
    month = apr,
    volume = {38},
    number = {11},
    pages = {3141-3142},
    publisher = {Oxford University Press},
    abstract = "{To advance biomedical research, increasingly large amounts of complex data need to be discovered and integrated. This requires syntactic and semantic validation to ensure shared understanding of relevant entities. This article describes the ELIXIR biovalidator, which extends the syntactic validation of the widely used AJV library with ontology-based validation of JSON documents.Source code: https://github.com/elixir-europe/biovalidator, Release: v1.9.1, License: Apache License 2.0, Deployed at: https://www.ebi.ac.uk/biosamples/schema/validator/validate}",
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btac195},
    url = {https://doi.org/10.1093/bioinformatics/btac195},
    note = {btac195},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btac195/43287005/btac195.pdf},
    }
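
    A hedged sketch of exercising the biovalidator deployment named in the abstract above: it posts a JSON Schema and a candidate document to the public validate endpoint. The payload shape ({"schema": ..., "data": ...}) is an assumption for illustration, not taken from the paper; check the biovalidator repository for the actual request contract.

    import json
    import requests

    # Deployment URL is quoted from the abstract; payload field names are assumed.
    VALIDATOR_URL = "https://www.ebi.ac.uk/biosamples/schema/validator/validate"

    schema = {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {"organism": {"type": "string"}},
        "required": ["organism"],
    }
    document = {"organism": "Homo sapiens"}

    response = requests.post(VALIDATOR_URL, json={"schema": schema, "data": document})
    print(response.status_code)
    print(json.dumps(response.json(), indent=2))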

  • Nicolas Matentzoglu, James P. Balhoff, Susan M. Bello, Chris Bizon, Matthew Brush, Tiffany J. Callahan, Christopher G. Chute, William D. Duncan, Chris T. Evelo, Davera Gabriel, and others. A Simple Standard for Sharing Ontological Mappings (SSSOM). Database, 2022, 2022. doi:10.1093/database/baac035
    [BibTeX] [Abstract] [Download PDF]

    Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. Database URL: http://w3id.org/sssom/spec

    @article{nico:SSSOM:Database2022,
    abstract={Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec.
    Database URL: http://w3id.org/sssom/spec},
    title={A Simple Standard for Sharing Ontological Mappings (SSSOM)},
    author={Matentzoglu, Nicolas and Balhoff, James P and Bello, Susan M and Bizon, Chris and Brush, Matthew and Callahan, Tiffany J and Chute, Christopher G and Duncan, William D and Evelo, Chris T and Gabriel, Davera and others},
    journal={Database},
    month=may,
    volume={2022},
    year={2022},
    doi={10.1093/database/baac035},
    url={https://doi.org/10.1093/database/baac035},
    publisher={Oxford Academic}
    }
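
    Because SSSOM is a simple table-based format, mappings can be consumed in ordinary data science pipelines without parsing ontologies, as the abstract emphasises. The sketch below builds a toy SSSOM-style mapping table with pandas; the column names (subject_id, predicate_id, object_id, mapping_justification) follow the published SSSOM slots as I understand them and should be checked against http://w3id.org/sssom/spec.

    import pandas as pd

    # Toy mapping set. A real SSSOM TSV would be read with
    # pd.read_csv(path, sep="\t", comment="#") since files may carry
    # YAML metadata in '#'-prefixed header lines.
    rows = [
        ["HP:0000118", "skos:exactMatch", "MP:0000001", "semapv:ManualMappingCuration"],
        ["HP:0001627", "skos:broadMatch", "MP:0002127", "semapv:LexicalMatching"],
    ]
    cols = ["subject_id", "predicate_id", "object_id", "mapping_justification"]
    mappings = pd.DataFrame(rows, columns=cols)

    # Keep only exact matches, e.g. for a high-precision merge of two vocabularies.
    exact = mappings[mappings["predicate_id"] == "skos:exactMatch"]
    print(exact)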

  • Alasdair J. G. Gray, Petros Papadopoulos, Imran Asif, Ivan Micetic, and András Hatos. Creating and Exploiting the Intrinsically Disordered Protein Knowledge Graph (IDP-KG). In 13th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences, SWAT4HCLS 2022, Virtual Event, Leiden, The Netherlands, January 10th to 14th, 2022, volume 3127 of CEUR Workshop Proceedings, pages 1–10. CEUR-WS.org, 2022.
    [BibTeX] [Abstract] [Download PDF]

    There are many data sources containing overlapping information about Intrinsically Disordered Proteins (IDP). IDPcentral aims to be a registry to aid the discovery of data about proteins known to be intrinsically disordered by aggregating the content from these sources. Traditional ETL approaches for populating IDPcentral require the API and data model of each source to be wrapped and then transformed into a common model. In this paper, we investigate using Bioschemas markup as a mechanism to populate the IDPcentral registry by constructing the Intrinsically Disordered Protein Knowledge Graph (IDP-KG). Bioschemas markup is a machine-readable, lightweight representation of the content of each page in the site that is embedded in the HTML. For any site it is accessible through a HTTP request. We harvest the Bioschemas markup in three IDP sources and show the resulting IDP-KG has the same breadth of proteins available as the original sources, and can be used to gain deeper insight into their content by querying them as a single, consolidated knowledge graph.

    @inproceedings{GrayEtal:bioschemas-idpkg:swat4hcls2022,
    abstract = {There are many data sources containing overlapping information about Intrinsically Disordered Proteins (IDP). IDPcentral aims to be a registry to aid the discovery of data about proteins known to be intrinsically disordered by aggregating the content from these sources. Traditional ETL approaches for populating IDPcentral require the API and data model of each source to be wrapped and then transformed into a common model.
    In this paper, we investigate using Bioschemas markup as a mechanism to populate the IDPcentral registry by constructing the Intrinsically Disordered Protein Knowledge Graph (IDP-KG). Bioschemas markup is a machine-readable, lightweight representation of the content of each page in the site that is embedded in the HTML. For any site it is accessible through a HTTP request. We harvest the Bioschemas markup in three IDP sources and show the resulting IDP-KG has the same breadth of proteins available as the original sources, and can be used to gain deeper insight into their content by querying them as a single, consolidated knowledge graph.},
    author = {Alasdair J. G. Gray and
    Petros Papadopoulos and
    Imran Asif and
    Ivan Micetic and
    Andr{\'{a}}s Hatos},
    title = {Creating and Exploiting the Intrinsically Disordered Protein Knowledge Graph {(IDP-KG)}},
    booktitle = {13th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences, {SWAT4HCLS} 2022, Virtual Event, Leiden, The Netherlands, January 10th to 14th, 2022},
    series = {{CEUR} Workshop Proceedings},
    volume = {3127},
    pages = {1--10},
    publisher = {CEUR-WS.org},
    year = {2022},
    url = {http://ceur-ws.org/Vol-3127/paper-1.pdf}
    }
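
    The harvesting approach described above relies on Bioschemas markup being plain JSON-LD embedded in each page's HTML, retrievable with an ordinary HTTP request. The sketch below shows just that step; the example page URL is illustrative, and the paper itself used the BMUSE scraper rather than this hand-rolled loop.

    import json
    import requests
    from bs4 import BeautifulSoup

    # Illustrative protein page; any page with Bioschemas markup works the same way.
    page_url = "https://disprot.org/DP00003"
    html = requests.get(page_url, timeout=30).text

    # Bioschemas/schema.org markup lives in <script type="application/ld+json"> tags.
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script", type="application/ld+json"):
        if script.string:
            data = json.loads(script.string)
            print(data.get("@type"), data.get("@id"))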

  • Ammar Ammar, Ivan Mičetić, and Alasdair J. G. Gray. An ETL pipeline to construct the Intrinsically Disordered Proteins Knowledge Graph (IDP-KG) using Bioschemas JSON-LD data dumps. Technical Report, 2022. doi:10.37044/osf.io/7f95d
    [BibTeX] [Abstract] [Download PDF]

    Schema.org and Bioschemas are lightweight vocabularies that aim at making the contents of web pages machine-readable so that software agents can consume that content and understand it in an actionable way. Due to the time needed to process each page, extracting markup by visiting each page of a site is not practical for huge sites. This approach imposes processing requirements on the publisher and the consumer. The Schema.org community proposed a method for exchanging markup from various pages as a DataFeed published at a recognized address in February 2022. This would ease publisher and customer processing requirements and accelerate data collection. In this work, we report on the implementation of a JSON-LD consumer ETL (Extract-Transform-Load) pipeline that enables data dumps to be ingested into knowledge graphs (KG). The pipeline loads scraped JSON-LD from the three sources, converts it to RDF, applies SPARQL construct queries to map the source RDF to a unified Bioschemas-based model and stores the resulting KG as a turtle file. This work was conducted during the one-week BioHackathon Europe 2022 in Paris, France, under Project 23 titled, “Publishing and Consuming Schema.org DataFeeds.”

    @techReport{ammar:data-pipeline:biohackrxiv2022,
    abstract={Schema.org and Bioschemas are lightweight vocabularies that aim at making the contents of web pages machine-readable so that software agents can consume that content and understand it in an actionable way. Due to the time needed to process each page, extracting markup by visiting each page of a site is not practical for huge sites. This approach imposes processing requirements on the publisher and the consumer. The Schema.org community proposed a method for exchanging markup from various pages as a DataFeed published at a recognized address in February 2022. This would ease publisher and customer processing requirements and accelerate data collection. In this work, we report on the implementation of a JSON-LD consumer ETL (Extract-Transform-Load) pipeline that enables data dumps to be ingested into knowledge graphs (KG). The pipeline loads scraped JSON-LD from the three sources, converts it to RDF, applies SPARQL construct queries to map the source RDF to a unified Bioschemas-based model and stores the resulting KG as a turtle file. This work was conducted during the one-week BioHackathon Europe 2022 in Paris, France, under Project 23 titled, “Publishing and Consuming Schema.org DataFeeds.”},
    title={An ETL pipeline to construct the Intrinsically Disordered Proteins Knowledge Graph (IDP-KG) using Bioschemas JSON-LD data dumps},
    url={https://biohackrxiv.org/7f95d/},
    DOI={10.37044/osf.io/7f95d},
    publisher={BioHackrXiv},
    author={Ammar Ammar and Ivan Mičetić and Alasdair J. G. Gray},
    year={2022},
    month=nov
    }
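
    The pipeline stages named in the abstract (load scraped JSON-LD, convert to RDF, map with SPARQL CONSTRUCT, store as Turtle) can be sketched with rdflib (version 6+, which parses JSON-LD natively). The sample record and the single identity-style CONSTRUCT mapping below are illustrative, not the project's actual query set.

    from rdflib import Graph

    # One harvested JSON-LD record standing in for a scraped Bioschemas data dump.
    jsonld = """
    {
      "@context": {"schema": "https://schema.org/"},
      "@id": "https://example.org/protein/P1",
      "@type": "schema:Protein",
      "schema:name": "Example protein"
    }
    """
    source = Graph().parse(data=jsonld, format="json-ld")

    # Map the source RDF to the unified model; a real pipeline would apply one
    # such query per source vocabulary.
    construct = """
    PREFIX schema: <https://schema.org/>
    CONSTRUCT { ?p a schema:Protein ; schema:name ?name . }
    WHERE     { ?p a schema:Protein ; schema:name ?name . }
    """
    kg = source.query(construct).graph

    # Store the consolidated knowledge graph as a Turtle file.
    kg.serialize(destination="idp-kg.ttl", format="turtle")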

  • Alasdair J. G. Gray, Petros Papadopoulos, Alban Gaignard, Thomas Rosnet, and Ivan Mičetić. Bioschemas data harvesting project report. Technical Report, 2022. doi:10.37044/osf.io/y6gbq
    [BibTeX] [Abstract] [Download PDF]

    The promise of Bioschemas is that it makes consuming data from multiple resources more straightforward. However, this hypothesis has not been tested by conducting a large scale harvest of deployed markup and making this available for others to reuse. Therefore, the goal of this hackathon project is to harvest a collection of Bioschemas markup from a number of different sites listed on the Bioschemas live deploys page using the Bioschemas Markup Scraper and Extractor (BMUSE). The harvested data will be made available for others and loaded into a triplestore to allow for further exploration.

    @techReport{gray:bioschemas-harvesting:2022,
    abstract={The promise of Bioschemas is that it makes consuming data from multiple resources more straightforward. However, this hypothesis has not been tested by conducting a large scale harvest of deployed markup and making this available for others to reuse. Therefore, the goal of this hackathon project is to harvest a collection of Bioschemas markup from a number of different sites listed on the Bioschemas live deploys page using the Bioschemas Markup Scraper and Extractor (BMUSE). The harvested data will be made available for others and loaded into a triplestore to allow for further exploration.},
    title={Bioschemas data harvesting project report},
    url={https://biohackrxiv.org/y6gbq/},
    DOI={10.37044/osf.io/y6gbq},
    publisher={BioHackrXiv},
    author={Gray, Alasdair J G and Papadopoulos, Petros and Gaignard, Alban and Rosnet, Thomas and Mičetić, Ivan},
    year={2022},
    month=mar
    }

2021

  • Hua Zhao, Fairouz Kamareddine, Alasdair J. G. Gray, and Hind Zantout. A Novel Method That Identifies The Hidden Properties Of A Person’s Name In Kanji Or Hanzi. In The 8th Multidisciplinary International Social Networks Conference, MISNC2021, pages 79–85, New York, NY, USA, 2021. Association for Computing Machinery. doi:10.1145/3504006.3504022
    [BibTeX] [Abstract]

    Most Japanese persons’ Kanji names have similar characters with ’Hanzi’. For example, ’吉田悠人’ and ’欧阳沛南’ are two persons’ names. ’吉田悠人’ is a Kanji name, ’欧阳沛南’ is a Hanzi name. It is a challenge for a computer to identify the name origins between ’Hanzi’ and ’Kanji’ of a person’s name. In addition, we found that there are limited existing methods to identify more than one hidden property of people’s names at one time. In this paper, we aim to build a method that can identify the hidden properties of Hanzi names and Kanji names. Our novel method can point out the origin, the surname and the given name of a person’s name. The evaluation result shows that our method is better than an existing system ’TextBlob’ on predicting the origins of people’s names. The accuracy of our proposed method is 86.87%.

    @inproceedings{Hua:properties-name:MISNC2021,
    author = {Zhao, Hua and Kamareddine, Fairouz and Gray, Alasdair J G and Zantout, Hind},
    title = {A Novel Method That Identifies The Hidden Properties Of A Person’s Name In Kanji Or Hanzi},
    year = {2021},
    isbn = {9781450396011},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    doi = {10.1145/3504006.3504022},
    abstract = { Most Japanese persons’ Kanji names have similar characters with ’Hanzi’. For example, ’吉田悠人’ and ’欧阳沛南’ are two persons’ names. ’吉田悠人’ is a Kanji name, ’欧阳沛南’ is a Hanzi name. It is a challenge for a computer to identify the name origins between ’Hanzi’ and ’Kanji’ of a person’s name. In addition, we found that there are limited existing methods to identify more than one hidden property of people’s names at one time. In this paper, we aim to build a method that can identify the hidden properties of Hanzi names and Kanji names. Our novel method can point out the origin, the surname and the given name of a person’s name. The evaluation result shows that our method is better than an existing system ’TextBlob’ on predicting the origins of people’s names. The accuracy of our proposed method is 86.87%.},
    booktitle = {The 8th Multidisciplinary International Social Networks Conference},
    pages = {79–85},
    numpages = {7},
    keywords = {Text Diversity, Name Identification, Machine Learning, Hidden property},
    location = {Bergen, Norway},
    series = {MISNC2021}
    }
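
    The paper's own model is not reproduced here; purely to make the task shape concrete (predicting a name's origin from its characters), below is a generic character n-gram baseline with invented training examples. It is not the authors' method and says nothing about their reported 86.87% accuracy.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Tiny illustrative training set: two Kanji (Japanese) and two Hanzi (Chinese) names.
    names = ["吉田悠人", "佐藤花子", "欧阳沛南", "王小明"]
    origins = ["kanji", "kanji", "hanzi", "hanzi"]

    # Character unigrams/bigrams capture which characters co-occur with each origin.
    model = make_pipeline(
        CountVectorizer(analyzer="char", ngram_range=(1, 2)),
        MultinomialNB(),
    )
    model.fit(names, origins)
    print(model.predict(["山田太郎"]))  # a typical Japanese name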

  • Imran Asif, Ilaria Tiddi, and Alasdair J. G. Gray. Using Nanopublications to Detect and Explain Contradictory Research Claims. In 2021 IEEE 17th International Conference on eScience (eScience), pages 1-10, 2021. doi:10.1109/eScience51609.2021.00010
    [BibTeX] [Abstract]

    We tackle the problem of automatically detecting conflicting claims in research outputs. This has become even more urgent in recent years, with the increasing volume of scientific publications available. Researchers are struggling to keep pace with the literature, and to efficiently make comparisons between the results of different published studies. We hypothesise that the difficult and time-consuming process of searching and comparing results across research publications can be facilitated using machine-readable, standardised knowledge representation methods. To this end, we propose to exploit Nanopublications as the standard framework to represent the claims in research studies, and use provenance data expressed by the model as an indicator of the source of the contradiction between different claims. We evaluate this idea over the Cooperation Databank (CoDa); a repository of social science studies. Our results show that the use of provenance information can be a good factor to identify the cause of conflicting claims, and that our method can support scientists in comparing literature in a more automated way.

    @inproceedings{asif:nanopub-contradictions:eScience2021,
    abstract={We tackle the problem of automatically detecting conflicting claims in research outputs. This has become even more urgent in recent years, with the increasing volume of scientific publications available. Researchers are struggling to keep pace with the literature, and to efficiently make comparisons between the results of different published studies. We hypothesise that the difficult and time-consuming process of searching and comparing results across research publications can be facilitated using machine-readable, standardised knowledge representation methods. To this end, we propose to exploit Nanopublications as the standard framework to represent the claims in research studies, and use provenance data expressed by the model as an indicator of the source of the contradiction between different claims. We evaluate this idea over the Cooperation Databank (CoDa); a repository of social science studies. Our results show that the use of provenance information can be a good factor to identify the cause of conflicting claims, and that our method can support scientists in comparing literature in a more automated way.},
    author={Asif, Imran and Tiddi, Ilaria and Gray, Alasdair J. G.},
    booktitle={2021 IEEE 17th International Conference on eScience (eScience)},
    title={Using Nanopublications to Detect and Explain Contradictory Research Claims},
    year={2021},
    pages={1-10},
    doi={10.1109/eScience51609.2021.00010}
    }
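
    The provenance signal the paper exploits comes from the nanopublication structure itself: each nanopublication is a set of named graphs for the assertion, its provenance, and its publication info. Below is a minimal illustrative nanopublication in TriG, parsed with rdflib; the claim and URIs are invented, only the np: vocabulary structure is standard.

    from rdflib import Dataset

    trig = """
    @prefix :     <http://example.org/np1#> .
    @prefix np:   <http://www.nanopub.org/nschema#> .
    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix ex:   <http://example.org/> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

    :head {
        :np a np:Nanopublication ;
            np:hasAssertion :assertion ;
            np:hasProvenance :provenance ;
            np:hasPublicationInfo :pubinfo .
    }
    :assertion  { ex:cooperation ex:increasesWith ex:groupIdentity . }
    :provenance { :assertion prov:wasDerivedFrom ex:study42 . }
    :pubinfo    { :np prov:generatedAtTime "2021-09-01"^^xsd:date . }
    """

    ds = Dataset()
    ds.parse(data=trig, format="trig")
    for graph in ds.contexts():
        print(graph.identifier, len(graph))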

  • Seyed Amir Hosseini Beghaeiraveri, Alasdair J. G. Gray, and Fiona McNeill. Reference Statistics in Wikidata Topical Subsets. In Proceedings of the 2nd Wikidata Workshop (Wikidata 2021), co-located with the 20th International Semantic Web Conference (ISWC 2021), 2021.
    [BibTeX] [Abstract] [Download PDF]

    Wikidata is the only general-purpose open knowledge graph with the capability of specifying references for every single statement. Currently, about 68% of Wikidata statements have at least one reference but the quality of these references is rarely covered in data quality studies. There is also a lack of a comprehensive framework for evaluating references. In this paper, we investigate the statistics of Wikidata references in 6 topical subsets of Wikidata. We compare these statistics over two Wikidata dumps; one from 2016 and one from 2021.

    @InProceedings{HosseiniBeghaeiraveri2021:RefStats:wikidata2021,
    title = "Reference Statistics in Wikidata Topical Subsets",
    abstract = "Wikidata is the only general-purpose open knowledge graph with the capability of specifying references for every single statement. Currently, about 68% of Wikidata statements have at least one reference but the quality of these references is rarely covered in data quality studies. There is also a lack of a comprehensive framework for evaluating references. In this paper, we investigate the statistics of Wikidata references in 6 topical subsets of Wikidata. We compare these statistics over two Wikidata dumps; one from 2016 and one from 2021.",
    booktitle = "Proceedings of the 2nd Wikidata Workshop (Wikidata 2021), co-located with the 20th International Semantic Web Conference (ISWC 2021)",
    keywords = "Reference quality, Wikidata, Data quality, Topical subset, WikiProject, Gene Wiki",
    author = "Hosseini Beghaeiraveri, Seyed Amir and Gray, Alasdair J. G. and Fiona McNeill",
    year = "2021",
    month = oct,
    url = "http://ceur-ws.org/Vol-2982/paper-3.pdf",
    }
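
    In the Wikidata model behind these statistics, each statement is a node that may link to reference nodes via prov:wasDerivedFrom, so reference coverage can be counted directly in SPARQL. A hedged sketch against the public query endpoint follows; the property (P31) and sample size are illustrative, and the endpoint timeouts the paper's subsetting line of work responds to apply here too.

    import requests

    ENDPOINT = "https://query.wikidata.org/sparql"

    # For a sample of 1000 P31 statements, count how many have >= 1 reference.
    query = """
    SELECT (COUNT(DISTINCT ?st) AS ?statements)
           (COUNT(DISTINCT ?refSt) AS ?referenced)
    WHERE {
      { SELECT ?st WHERE { ?item p:P31 ?st . } LIMIT 1000 }
      OPTIONAL { ?st prov:wasDerivedFrom ?ref . BIND(?st AS ?refSt) }
    }
    """

    r = requests.get(
        ENDPOINT,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "reference-stats-sketch/0.1 (example)"},
    )
    print(r.json()["results"]["bindings"][0])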

  • Ruben Kruiper, Ioannis Konstas, Alasdair J. G. Gray, Fahrad Sadeghineko, Richard Watson, and Bimal Kumar. SPaR.txt, a cheap Shallow Parsing approach for Regulatory texts. In Natural Legal Language Processing workshop, EMNLP, pages 129-143, Punta Cana, Dominican Republic, 2021. Association for Computational Linguistics.
    [BibTeX] [Abstract] [Download PDF]

    Automated Compliance Checking (ACC) systems aim to semantically parse building regulations to a set of rules. However, semantic parsing is known to be hard and requires large amounts of training data. The complexity of creating such training data has led to research that focuses on small sub-tasks, such as shallow parsing or the extraction of a limited subset of rules. This study introduces a shallow parsing task for which training data is relatively cheap to create, with the aim of learning a lexicon for ACC. We annotate a small domain-specific dataset of 200 sentences, SPaR.txt, and train a sequence tagger that achieves 79,93 F1-score on the test set. We then show through manual evaluation that the model identifies most (89,84%) defined terms in a set of building regulation documents, and that both contiguous and discontiguous Multi-Word Expressions (MWE) are discovered with reasonable accuracy (70,3%).

    @InProceedings{Kruiper2021:SPaRtxt:nllp2021,
    abstract = {Automated Compliance Checking (ACC) systems aim to semantically parse building regulations to a set of rules. However, semantic parsing is known to be hard and requires large amounts of training data. The complexity of creating such training data has led to research that focuses on small sub-tasks, such as shallow parsing or the extraction of a limited subset of rules. This study introduces a shallow parsing task for which training data is relatively cheap to create, with the aim of learning a lexicon for ACC. We annotate a small domain-specific dataset of 200 sentences, SPaR.txt, and train a sequence tagger that achieves 79,93 F1-score on the test set. We then show through manual evaluation that the model identifies most (89,84%) defined terms in a set of building regulation documents, and that both contiguous and discontiguous Multi-Word Expressions (MWE) are discovered with reasonable accuracy (70,3%).},
    author = {Ruben Kruiper and Ioannis Konstas and Alasdair J G Gray and Fahrad Sadeghineko and Richard Watson and Bimal Kumar},
    title = {SPaR.txt, a cheap Shallow Parsing approach for Regulatory texts},
    booktitle = {Natural Legal Language Processing workshop, EMNLP},
    year = {2021},
    pages = {129-143},
    month = nov,
    address = {Punta Cana, Dominican Republic},
    publisher = {Association for Computational Linguistics},
    url = {https://aclanthology.org/2021.nllp-1.14},
    }

  • Seyed Amir Hosseini Beghaeiraveri, Alasdair J. G. Gray, and Fiona McNeill. Experiences of Using WDumper to Create Topical Subsets from Wikidata. In Proceedings of the 2nd International Workshop on Knowledge Graph Construction, co-located with the 18th Extended Semantic Web Conference (ESWC 2021), volume 2873 of CEUR Workshop Proceedings. CEUR Workshop Proceedings (CEUR-WS.org), 2021.
    [BibTeX] [Abstract] [Download PDF]

    Wikidata is a general-purpose knowledge graph covering a wide variety of topics with content being crowd-sourced through an open wiki. There are now over 90M interrelated data items in Wikidata which are accessible through a public query endpoint and data dumps. However, execution timeout limits and the size of data dumps make it difficult to use the data. The creation of arbitrary topical subsets of Wikidata, where only the relevant data is kept, would enable reuse of that data with the benefits of cost reduction, ease of access, and flexibility. In this paper, we provide a formal definition of topical subsets over the Wikidata Knowledge Graph and evaluate a third-party tool (WDumper) to extract these topical subsets from Wikidata.

    @InProceedings{HosseiniBeghaeiraveri2021:TopicalSubsets:KGConstruction2021,
    title = "Experiences of Using WDumper to Create Topical Subsets from Wikidata",
    abstract = "Wikidata is a general-purpose knowledge graph covering a wide variety of topics with content being crowd-sourced through an open wiki. There are now over 90M interrelated data items in Wikidata which are accessible through a public query endpoint and data dumps. However, execution timeout limits and the size of data dumps make it difficult to use the data.
    The creation of arbitrary topical subsets of Wikidata, where only the relevant data is kept, would enable reuse of that data with the benefits of cost reduction, ease of access, and flexibility. In this paper, we provide a formal definition of topical subsets over the Wikidata Knowledge Graph and evaluate a third-party tool (WDumper) to extract these topical subsets from Wikidata.",
    booktitle = "Proceedings of the 2nd International Workshop on Knowledge Graph Construction, co-located with the 18th Extended Semantic Web Conference (ESWC 2021)",
    keywords = "Wikidata, Topical subset, WikiProject, Gene Wiki",
    author = "{Hosseini Beghaeiraveri}, {Seyed Amir} and Gray, {Alasdair J. G.} and Fiona McNeill",
    year = "2021",
    month = jun,
    series = "CEUR Workshop Proceedings",
    publisher = "CEUR Workshop Proceedings (CEUR-WS.org)",
    volume = "2873",
    url = "http://ceur-ws.org/Vol-2873/paper13.pdf",
    }

  • Alasdair J. G. Gray, Petros Papadopoulos, Ivan Mičetić, and András Hatos. Exploiting Bioschemas Markup to Populate IDPcentral. Technical Report, 2021. doi:10.37044/osf.io/v3jct
    [BibTeX] [Abstract] [Download PDF]

    One of the goals of the ELIXIR Intrinsically Disordered Protein (IDP) community is to create a registry called IDPcentral. The registry will aggregate data contained in the community’s specialist data sources such as DisProt, MobiDB, and Protein Ensemble Database (PED) so that proteins that are known to be intrinsically disordered can be discovered; with summary details of the protein presented, and the specialist source consulted for more detailed data. At the ELIXIR BioHackathon-Europe 2020, we aimed to investigate the feasibility of populating IDPcentral by harvesting the Bioschemas markup that has been deployed on the IDP community data sources. The benefit of using Bioschemas markup, which is embedded in the HTML web pages for each protein in the data source, is that a standard harvesting approach can be used for all data sources; rather than needing bespoke wrappers for each data source API. We expect to harvest the markup using the Bioschemas Markup Scraper and Extractor (BMUSE) tool that has been developed specifically for this purpose. The challenge, however, is that the sources contain overlapping information about proteins but use different identifiers for the proteins. After the data has been harvested, it will need to be processed so that information about a particular protein, which will come from multiple sources, is consolidated into a single concept for the protein, with links back to where each piece of data originated. As well as populating the IDPcentral registry, we plan to consolidate the markup into a knowledge graph that can be queried to gain further insight into the IDPs.

    @techReport{gray:bioschemas-idpcntral:2021,
    abstract={One of the goals of the ELIXIR Intrinsically Disordered Protein (IDP) community is to create a registry called IDPcentral. The registry will aggregate data contained in the community's specialist data sources such as DisProt, MobiDB, and Protein Ensemble Database (PED) so that proteins that are known to be intrinsically disordered can be discovered; with summary details of the protein presented, and the specialist source consulted for more detailed data.
    At the ELIXIR BioHackathon-Europe 2020, we aimed to investigate the feasibility of populating IDPcentral by harvesting the Bioschemas markup that has been deployed on the IDP community data sources. The benefit of using Bioschemas markup, which is embedded in the HTML web pages for each protein in the data source, is that a standard harvesting approach can be used for all data sources; rather than needing bespoke wrappers for each data source API. We expect to harvest the markup using the Bioschemas Markup Scraper and Extractor (BMUSE) tool that has been developed specifically for this purpose.
    The challenge, however, is that the sources contain overlapping information about proteins but use different identifiers for the proteins. After the data has been harvested, it will need to be processed so that information about a particular protein, which will come from multiple sources, is consolidated into a single concept for the protein, with links back to where each piece of data originated.
    As well as populating the IDPcentral registry, we plan to consolidate the markup into a knowledge graph that can be queried to gain further insight into the IDPs.},
    title={Exploiting Bioschemas Markup to Populate IDPcentral},
    url={https://biohackrxiv.org/v3jct},
    DOI={10.37044/osf.io/v3jct},
    publisher={BioHackrXiv},
    author={Gray, Alasdair J G and Papadopoulos, Petros and Mičetić, Ivan and Hatos, András},
    year={2021},
    month=jun
    }

  • Jose E. Labra-Gayo, Alejandro G. Hevia, Daniel F. Álvarez, Ammar Ammar, Dan Brickley, Alasdair J. G. Gray, Eric Prud’hommeaux, Denise Slenter, Harold Solbrig, Seyed A. H. Beghaeiraveri, et al. Knowledge graphs and wikidata subsetting. Technical Report, 2021. doi:10.37044/osf.io/wu9et
    [BibTeX] [Abstract] [Download PDF]

    Knowledge graphs have successfully been adopted by academia, government and industry to represent large scale knowledge bases. Open and collaborative knowledge graphs such as Wikidata capture knowledge from different domains and harmonize them under a common format, making it easier for researchers to access the data while also supporting Open Science. Wikidata keeps getting bigger and better, which subsumes integration use cases. Having a large amount of data such as the one presented in a scopeless Wikidata offers some advantages, e.g., unique access point and common format, but also poses some challenges, e.g., performance. Regular wikidata users are not unfamiliar with running into frequent timeouts of submitted queries. Due to its popularity, limits have been imposed to allow for fair access to many. However, this suppresses many interesting and complex queries that require more computational power and resources. Replicating Wikidata on one’s own infrastructure can be a solution which also offers a snapshot of the contents of wikidata at some given point in time. There is no need to replicate Wikidata in full, it is possible to work with subsets targeting, for instance, a particular domain. Creating those subsets has emerged as an alternative to reduce the amount and spectrum of data offered by Wikidata. Less data makes more complex queries possible while still keeping the compatibility with the whole Wikidata as the model is kept. In this paper we report the tasks done as part of a Wikidata subsetting project during the Virtual BioHackathon Europe 2020 and SWAT4(HC)LS 2021, which had already started at NBDC/DBCLS BioHackathon 2019 in Japan, SWAT4(HC)LS hackathon 2019, and Virtual COVID-19 BioHackathon 2019. We describe some of the approaches we identified to create subsets and some subsets from the Life Sciences domain as well as other use cases we also discussed.

    @techReport{labra-gayo:kg-subsetting:biohackrxiv2021,
    abstract={Knowledge graphs have successfully been adopted by academia, government and industry to represent large scale knowledge bases.
    Open and collaborative knowledge graphs such as Wikidata capture knowledge from different domains and harmonize them under a common format, making it easier for researchers to access the data while also supporting Open Science.
    Wikidata keeps getting bigger and better, which subsumes integration use cases. Having a large amount of data such as the one presented in a scopeless Wikidata offers some advantages, e.g., unique access point and common format, but also poses some challenges, e.g., performance.
    Regular wikidata users are not unfamiliar with running into frequent timeouts of submitted queries. Due to its popularity, limits have been imposed to allow for fair access to many.
    However, this suppresses many interesting and complex queries that require more computational power and resources. Replicating Wikidata on one's own infrastructure can be a solution which also offers a snapshot of the contents of wikidata at some given point in time.
    There is no need to replicate Wikidata in full, it is possible to work with subsets targeting, for instance, a particular domain. Creating those subsets has emerged as an alternative to reduce the amount and spectrum of data offered by Wikidata. Less data makes more complex queries possible while still keeping the compatibility with the whole Wikidata as the model is kept.
    In this paper we report the tasks done as part of a Wikidata subsetting project during the Virtual BioHackathon Europe 2020 and SWAT4(HC)LS 2021, which had already started at NBDC/DBCLS BioHackathon 2019 in Japan, SWAT4(HC)LS hackathon 2019, and Virtual COVID-19 BioHackathon 2019. We describe some of the approaches we identified to create subsets and some subsets from the Life Sciences domain as well as other use cases we also discussed.},
    title={Knowledge graphs and wikidata subsetting},
    url={https://biohackrxiv.org/wu9et},
    DOI={10.37044/osf.io/wu9et},
    publisher={BioHackrXiv},
    author={Labra-Gayo, Jose E and Hevia, Alejandro G and Álvarez, Daniel F and Ammar, Ammar and Brickley, Dan and Gray, Alasdair J G and Prud'hommeaux, Eric and Slenter, Denise and Solbrig, Harold and Beghaeiraveri, Seyed A H and et al.},
    year={2021},
    month=apr
    }

  • Peter Sefton, Eoghan Ó Carragáin, Stian Soiland-Reyes, Oscar Corcho, Daniel Garijo, Raul Palma, Frederik Coppens, Carole Goble, José María Fernández, Kyle Chard, Jose Manuel Gomez-Perez, Michael R. Crusoe, Ignacio Eguinoa, Nick Juty, Kristi Holmes, Jason A. Clark, Salvador Capella-Gutierrez, Alasdair J. G. Gray, Stuart Owen, Alan R. Williams, Giacomo Tartari, Finn Bacall, Thomas Thelen, Hervé Ménager, Laura Rodríguez-Navas, Paul Walk, brandon whitehead, Mark Wilkinson, Paul Groth, Erich Bremer, LJ Garcia Castro, Karl Sebby, Alexander Kanitz, Ana Trisovic, Gavin Kennedy, Mark Graves, Jasper Koehorst, Simone Leo, and Marc Portier. RO-Crate Metadata Specification 1.1.1. Technical Report, United Kingdom, 2021. Recommendation published by researchobject.org – see https://w3id.org/ro/crate/1.1 for web version. doi:10.5281/zenodo.4541002
    [BibTeX] [Abstract]

    This document specifies a method, known as RO-Crate (Research Object Crate), of aggregating and describing research data with associated metadata. RO-Crates can aggregate and describe any resource including files, URI-addressable resources, or use other addressing schemes to locate digital or physical data. RO-Crates can describe data in aggregate and at the individual resource level, with metadata to aid in discovery, re-use and long term management of data. Metadata includes the ability to describe the context of data and the entities involved in its production, use and reuse. For example: who created it, using which equipment, software and workflows, under what licenses can it be re-used, where was it collected, and/or where is it about. RO-Crate uses JSON-LD to express this metadata using linked data, describing data resources as well as contextual entities such as people, organizations, software and equipment as a series of linked JSON-LD objects – using common published vocabularies, chiefly schema.org. The core of RO-Crate is a JSON-LD file, the RO-Crate Metadata File, named ro-crate-metadata.json. This file contains structured metadata about the dataset as a whole (the Root Data Entity) and, optionally, about some or all of its files. This provides a simple way to, for example, assert the authors (e.g. people, organizations) of the RO-Crate or one of its files, or to capture more complex provenance for files, such as how they were created using software and equipment. While providing the formal specification for RO-Crate, this document also aims to be a practical guide for software authors to create tools for generating and consuming research data packages, with explanation by examples.

    @techreport{RO-Crate-1-1,
    title = "RO-Crate Metadata Specification 1.1.1",
    abstract = "This document specifies a method, known as RO-Crate (Research Object Crate), of aggregating and describing research data with associated metadata. RO-Crates can aggregate and describe any resource including files, URI-addressable resources, or use other addressing schemes to locate digital or physical data. RO-Crates can describe data in aggregate and at the individual resource level, with metadata to aid in discovery, re-use and long term management of data. Metadata includes the ability to describe the context of data and the entities involved in its production, use and reuse. For example: who created it, using which equipment, software and workflows, under what licenses can it be re-used, where was it collected, and/or where is it about. RO-Crate uses JSON-LD to express this metadata using linked data, describing data resources as well as contextual entities such as people, organizations, software and equipment as a series of linked JSON-LD objects - using common published vocabularies, chiefly schema.org. The core of RO-Crate is a JSON-LD file, the RO-Crate Metadata File, named ro-crate-metadata.json. This file contains structured metadata about the dataset as a whole (the Root Data Entity) and, optionally, about some or all of its files. This provides a simple way to, for example, assert the authors (e.g. people, organizations) of the RO-Crate or one of its files, or to capture more complex provenance for files, such as how they were created using software and equipment. While providing the formal specification for RO-Crate, this document also aims to be a practical guide for software authors to create tools for generating and consuming research data packages, with explanation by examples.",
    author = "Peter Sefton and Carrag{\'a}in, {Eoghan {\'O}} and Stian Soiland-Reyes and Oscar Corcho and Daniel Garijo and Raul Palma and Frederik Coppens and Carole Goble and Fern{\'a}ndez, {Jos{\'e} Mar{\'i}a} and Kyle Chard and Gomez-Perez, {Jose Manuel} and Crusoe, {Michael R} and Ignacio Eguinoa and Nick Juty and Kristi Holmes and Clark, {Jason A.} and Salvador Capella-Gutierrez and Gray, {Alasdair J. G.} and Stuart Owen and Williams, {Alan R.} and Giacomo Tartari and Finn Bacall and Thomas Thelen and Herv{\'e} M{\'e}nager and Laura Rodr{\'i}guez-Navas and Paul Walk and brandon whitehead and Mark Wilkinson and Paul Groth and Erich Bremer and Castro, {LJ Garcia} and Karl Sebby and Alexander Kanitz and Ana Trisovic and Gavin Kennedy and Mark Graves and Jasper Koehorst and Simone Leo and Marc Portier",
    note = "Recommendation published by researchobject.org - see https://w3id.org/ro/crate/1.1 for web version.",
    year = "2021",
    month = feb,
    doi = "10.5281/zenodo.4541002",
    publisher = "researchobject.org",
    address = "United Kingdom",
    }
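
    As the abstract explains, the core of RO-Crate is a single JSON-LD file, ro-crate-metadata.json, whose @graph holds a metadata descriptor and the Root Data Entity. A minimal crate written from Python is sketched below; the dataset name, date, and file are illustrative, while the descriptor/root layout follows the 1.1 specification.

    import json

    crate = {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # the RO-Crate Metadata File descriptor
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # the Root Data Entity describing the dataset as a whole
                "@id": "./",
                "@type": "Dataset",
                "name": "Example crate",
                "datePublished": "2021-02-15",
                "hasPart": [{"@id": "data.csv"}],
            },
            {"@id": "data.csv", "@type": "File", "name": "Example data file"},
        ],
    }

    with open("ro-crate-metadata.json", "w") as f:
        json.dump(crate, f, indent=2)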

2020

  • Abiodun G. Akinyemi, Ming Sun, and Alasdair J. G. Gray. Data integration for offshore decommissioning waste management. Automation in Construction, 109:103010, 2020. doi:10.1016/j.autcon.2019.103010
    [BibTeX] [Abstract] [Download PDF]

    Offshore decommissioning represents significant business opportunities for oil and gas service companies. However, for owners of offshore assets and regulators, it is a liability because of the associated costs. One way of mitigating decommissioning costs is through the sales and reuse of decommissioned items. To achieve this effectively, reliability assessment of decommissioned items is required. Such an assessment relies on data collected on the various items over the lifecycle of an engineering asset. Considering that offshore platforms have a design life of about 25 years and data management techniques and tools are constantly evolving, data captured about items to be decommissioned will be in varying forms. In addition, considering the many stakeholders involved with a facility over its lifecycle, information representation of the items will have variations. These challenges make data integration difficult. As a result, this research developed a data integration framework that makes use of Semantic Web technologies and ISO 15926 – a standard for process plant data integration – for rapid assessment of decommissioned items. The proposed solution helps in determining the reuse potential of decommissioned items, which can save on cost and benefit the environment.

    @article{Akinyemi:offshore-data-integration:2020,
    abstract = {Offshore decommissioning represents significant business opportunities for oil and gas service companies. However, for owners of offshore assets and regulators, it is a liability because of the associated costs. One way of mitigating decommissioning costs is through the sales and reuse of decommissioned items. To achieve this effectively, reliability assessment of decommissioned items is required. Such an assessment relies on data collected on the various items over the lifecycle of an engineering asset. Considering that offshore platforms have a design life of about 25 years and data management techniques and tools are constantly evolving, data captured about items to be decommissioned will be in varying forms. In addition, considering the many stakeholders involved with a facility over its lifecycle, information representation of the items will have variations. These challenges make data integration difficult. As a result, this research developed a data integration framework that makes use of Semantic Web technologies and ISO 15926 - a standard for process plant data integration - for rapid assessment of decommissioned items. The proposed solution helps in determining the reuse potential of decommissioned items, which can save on cost and benefit the environment.},
    doi = {10.1016/j.autcon.2019.103010},
    url = {https://www.sciencedirect.com/science/article/pii/S0926580518304059},
    year = 2020,
    month = nov,
    publisher = {Elsevier},
    volume = {109},
    pages = {103010},
    author = {Abiodun G. Akinyemi and Ming Sun and Alasdair J. G. Gray},
    title = {Data integration for offshore decommissioning waste management},
    journal = {Automation in Construction}
    }

2019

  • Imran Asif, Jessica Chen-Burger, and Alasdair J. G. Gray. Data Quality Issues in Current Nanopublications. In 2019 15th International Conference on eScience (eScience), pages 522-527, 2019. doi:10.1109/eScience.2019.00069
    [BibTeX] [Abstract]

    Nanopublications are a granular way of publishing scientific claims together with their associated provenance and publication information. More than 10 million nanopublications have been published by a handful of researchers covering a wide range of topics within the life sciences. We were motivated to replicate an existing analysis of these nanopublications, but then went deeper into the structure of the existing nanopublications. In this paper, we analyse the usage of nanopublications by investigating the distribution of triples in each part and discuss the data quality issues that were subsequently revealed. We argue that there is a need for the community to develop a set of guidelines for the modelling of nanopublications.

    @inproceedings{Asif:NanopubQuality:eScience2019,
    author={Asif, Imran and Chen-Burger, Jessica and Gray, Alasdair J. G.},
    booktitle={2019 15th International Conference on eScience (eScience)},
    title={Data Quality Issues in Current Nanopublications},
    year={2019},
    pages={522-527},
    abstract={Nanopublications are a granular way of publishing scientific claims together with their associated provenance and publication information. More than 10 million nanopublications have been published by a handful of researchers covering a wide range of topics within the life sciences. We were motivated to replicate an existing analysis of these nanopublications, but then went deeper into the structure of the existing nanopublications. In this paper, we analyse the usage of nanopublications by investigating the distribution of triples in each part and discuss the data quality issues that were subsequently revealed. We argue that there is a need for the community to develop a set of guidelines for the modelling of nanopublications.},
    doi={10.1109/eScience.2019.00069},
    month=sep
    }

  • Qianru Zhou, Alasdair J. G. Gray, and Stephen McLaughlin. ToCo: An Ontology for Representing Hybrid Telecommunication Networks. In The Semantic Web – 16th International Conference, ESWC 2019, Portorož, Slovenia, June 2-6, 2019, Proceedings, volume 11503 of Lecture Notes in Computer Science, pages 507–522. Springer, 2019. doi:10.1007/978-3-030-21348-0_33
    [BibTeX] [Abstract] [Download PDF]

    The TOUCAN project proposed an ontology for telecommunication networks with hybrid technologies – the TOUCAN Ontology (ToCo), available at http://purl.org/toco/, as well as a knowledge design pattern Device-Interface-Link (DIL) pattern. The core classes and relationships forming the ontology are discussed in detail. The ToCo ontology can describe the physical infrastructure, quality of channel, services and users in heterogeneous telecommunication networks which span multiple technology domains. The DIL pattern is observed and summarised when modelling networks with various technology domains. Examples and use cases of ToCo are presented for demonstration.

    @inproceedings{DBLP:conf/esws/ZhouGM19,
    abstract = {The TOUCAN project proposed an ontology for telecommunication networks with hybrid technologies – the TOUCAN Ontology (ToCo), available at http://purl.org/toco/, as well as a knowledge design pattern Device-Interface-Link (DIL) pattern. The core classes and relationships forming the ontology are discussed in detail. The ToCo ontology can describe the physical infrastructure, quality of channel, services and users in heterogeneous telecommunication networks which span multiple technology domains. The DIL pattern is observed and summarised when modelling networks with various technology domains. Examples and use cases of ToCo are presented for demonstration.},
    author = {Qianru Zhou and
    Alasdair J. G. Gray and
    Stephen McLaughlin},
    title = {ToCo: An Ontology for Representing Hybrid Telecommunication Networks},
    booktitle = {The Semantic Web - 16th International Conference, {ESWC} 2019, Portoro{\v{z}},
    Slovenia, June 2-6, 2019, Proceedings},
    series = {Lecture Notes in Computer Science},
    volume = {11503},
    pages = {507--522},
    publisher = {Springer},
    month = jun,
    year = {2019},
    url = {https://doi.org/10.1007/978-3-030-21348-0\_33},
    doi = {10.1007/978-3-030-21348-0\_33}
    }
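
    The Device-Interface-Link (DIL) pattern described above can be made concrete with a few triples. The sketch below builds a two-device wireless network with rdflib under the ToCo namespace given in the paper (http://purl.org/toco/); the class and property local names (Device, Interface, Link, hasInterface, connectsTo) are assumptions for illustration and should be checked against the published ontology.

    from rdflib import Graph, Namespace, RDF

    TOCO = Namespace("http://purl.org/toco/")  # namespace quoted from the paper
    EX = Namespace("http://example.org/net/")

    g = Graph()
    g.bind("toco", TOCO)
    g.bind("ex", EX)

    # Two devices, one interface each, joined by a single wireless link.
    for s, cls in [(EX.ap1, TOCO.Device), (EX.laptop1, TOCO.Device),
                   (EX.ap1_wlan0, TOCO.Interface), (EX.laptop1_wlan0, TOCO.Interface),
                   (EX.link1, TOCO.Link)]:
        g.add((s, RDF.type, cls))

    g.add((EX.ap1, TOCO.hasInterface, EX.ap1_wlan0))
    g.add((EX.laptop1, TOCO.hasInterface, EX.laptop1_wlan0))
    g.add((EX.ap1_wlan0, TOCO.connectsTo, EX.link1))
    g.add((EX.laptop1_wlan0, TOCO.connectsTo, EX.link1))

    print(g.serialize(format="turtle"))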

  • Qianru Zhou, Alasdair J. G. Gray, Dimitrios Pezaros, and Stephen McLaughlin. SARA – A Semantic Access Point Resource Allocation Service for Heterogenous Wireless Networks. In 2019 Wireless Days, WD 2019, Manchester, United Kingdom, April 24-26, 2019, pages 1–8. IEEE, 2019. doi:10.1109/WD.2019.8734260
    [BibTeX] [Abstract] [Download PDF]

    In this paper, we present SARA, a Semantic Access point Resource Allocation service for heterogenous wireless networks with various wireless access technologies existing together. By automatically reasoning on the knowledge base of the full system provided by a knowledge based autonomic network management system – SEANET, SARA selects the access point providing the best quality of service among the different access technologies. Based on an ontology assisted knowledge based system SEANET, SARA can also adapt the access point selection strategy according to customer defined rules automatically. Results of our evaluation based on emulated networks with hybrid access technologies and various scales show that SARA is able to improve the channel condition, in terms of throughput, evidently. Comparisons with current AP selection algorithms demonstrate that SARA outperforms the existing AP selection algorithms. The overhead in terms of time expense is reasonable and is shown to be faster than traditional access point selection approaches.

    @inproceedings{DBLP:conf/wd/ZhouGPM19,
    abstract = {In this paper, we present SARA, a Semantic Access point Resource Allocation service for heterogenous wireless networks with various wireless access technologies existing together. By automatically reasoning on the knowledge base of the full system provided by a knowledge based autonomic network management system - SEANET, SARA selects the access point providing the best quality of service among the different access technologies. Based on an ontology assisted knowledge based system SEANET, SARA can also adapt the access point selection strategy according to customer defined rules automatically. Results of our evaluation based on emulated networks with hybrid access technologies and various scales show that SARA is able to improve the channel condition, in terms of throughput, evidently. Comparisons with current AP selection algorithms demonstrate that SARA outperforms the existing AP selection algorithms. The overhead in terms of time expense is reasonable and is shown to be faster than traditional access point selection approaches.},
    author = {Qianru Zhou and
    Alasdair J. G. Gray and
    Dimitrios Pezaros and
    Stephen McLaughlin},
    title = {{SARA} - {A} Semantic Access Point Resource Allocation Service for
    Heterogenous Wireless Networks},
    booktitle = {2019 Wireless Days, {WD} 2019, Manchester, United Kingdom, April 24-26,
    2019},
    pages = {1--8},
    publisher = {{IEEE}},
    month = apr,
    year = {2019},
    url = {https://doi.org/10.1109/WD.2019.8734260},
    doi = {10.1109/WD.2019.8734260}
    }

  • Fiona McNeill, Diana Bental, Alasdair J. G. Gray, Sabina Jedrzejczyk, and Ahmad Alsadeeqi. Generating corrupted data sources for the evaluation of matching systems. In The Fourteenth International Workshop on Ontology Matching, volume 2536 of CEUR Workshop Proceedings, pages 41–45. CEUR Workshop Proceedings (CEUR-WS.org), October 2019.
    [BibTeX] [Abstract] [Download PDF]

    One of the most difficult aspects of developing matching systems – whether for matching ontologies or for other types of mismatched data – is evaluation. The accuracy of matchers are usually evaluated by measuring the results produced by the systems against reference sets, but gold-standard reference sets are expensive and difficult to create. In this paper we introduce crptr, which generates multiple variations of different sorts of dataset, where the degree of variation is controlled, in order that they can be used to evaluate matchers in different context.

    @inproceedings{McNeill:crptr-matching-evaluation:OM2019,
    title = "Generating corrupted data sources for the evaluation of matching systems",
    abstract = "One of the most difficult aspects of developing matching systems – whether for matching ontologies or for other types of mismatched data – is evaluation. The accuracy of matchers are usually evaluated by measuring the results produced by the systems against reference sets, but gold-standard reference sets are expensive and difficult to create. In this paper we introduce crptr, which generates multiple variations of different sorts of dataset, where the degree of variation is controlled, in order that they can be used to evaluate matchers in different context.",
    keywords = "Matching, Evaluation, Data Corruption",
    author = "Fiona McNeill and Diana Bental and Alasdair J. G. Gray and Sabina Jedrzejczyk and Ahmad Alsadeeqi",
    year = "2019",
    month = oct,
    series = "CEUR Workshop Proceedings",
    publisher = "CEUR Workshop Proceedings (CEUR-WS.org)",
    volume = "2536",
    pages = "41--45",
    booktitle = "The Fourteenth International Workshop on Ontology Matching",
    url = "http://ceur-ws.org/Vol-2536/om2019_STpaper2.pdf"
    }
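
    Note: to make the idea of controlled corruption concrete, the sketch below shows a toy character-level corrupter in which the degree of variation is a parameter, as in crptr. It is a hedged illustration using an invented confusion table, not crptr's implementation.

    import random

    # Toy character-level corrupter: apply OCR-style substitutions to a
    # controlled fraction of characters. Table and rate are illustrative.
    OCR_CONFUSIONS = {"f": "s", "l": "1", "o": "0", "e": "c"}

    def corrupt(text: str, rate: float, seed: int = 0) -> str:
        rng = random.Random(seed)  # fixed seed makes the corruption reproducible
        return "".join(
            OCR_CONFUSIONS[ch] if ch in OCR_CONFUSIONS and rng.random() < rate else ch
            for ch in text
        )

    print(corrupt("fife parish register", rate=0.5))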

  • Pascal Hitzler, Miriam Fernández, Krzysztof Janowicz, Amrapali Zaveri, Alasdair J. G. Gray, Vanessa López, Armin Haller, and Karl Hammar, editors. The Semantic Web – 16th International Conference, ESWC 2019, Portorož, Slovenia, June 2-6, 2019, Proceedings, volume 11503 of Lecture Notes in Computer Science. Springer, 2019. doi:10.1007/978-3-030-21348-0
    [BibTeX] [Abstract] [Download PDF]

    This book constitutes the refereed proceedings of the 16th International Semantic Web Conference, ESWC 2019, held in Portorož, Slovenia. The 39 revised full papers presented were carefully reviewed and selected from 134 submissions. The papers are organized in three tracks: research track, resources track, and in-use track and deal with the following topical areas: distribution and decentralisation, velocity on the Web, research of research, ontologies and reasoning, linked data, natural language processing and information retrieval, semantic data management and data infrastructures, social and human aspects of the Semantic Web, and, machine learning.

    @proceedings{eswc2019,
    abstract = {This book constitutes the refereed proceedings of the 16th International Semantic Web Conference, ESWC 2019, held in Portorož, Slovenia.
    The 39 revised full papers presented were carefully reviewed and selected from 134 submissions. The papers are organized in three tracks: research track, resources track, and in-use track and deal with the following topical areas: distribution and decentralisation, velocity on the Web, research of research, ontologies and reasoning, linked data, natural language processing and information retrieval, semantic data management and data infrastructures, social and human aspects of the Semantic Web, and, machine learning.},
    editor = {Pascal Hitzler and
    Miriam Fern{\'{a}}ndez and
    Krzysztof Janowicz and
    Amrapali Zaveri and
    Alasdair J. G. Gray and
    Vanessa L{\'{o}}pez and
    Armin Haller and
    Karl Hammar},
    title = {The Semantic Web - 16th International Conference, {ESWC} 2019, Portoro{\v{z}},
    Slovenia, June 2-6, 2019, Proceedings},
    series = {Lecture Notes in Computer Science},
    volume = {11503},
    publisher = {Springer},
    month = jun,
    year = {2019},
    url = {https://doi.org/10.1007/978-3-030-21348-0},
    doi = {10.1007/978-3-030-21348-0},
    isbn = {978-3-030-21347-3},
    }

2018

  • Simon D. Harding, Joanna L. Sharman, Elena Faccenda, Christopher Southan, Adam J. Pawson, Sam Ireland, Alasdair J. G. Gray, Liam Bruce, Stephen P. H. Alexander, Stephen Anderton, Clare Bryant, Anthony P. Davenport, Christian Doerig, Doriano Fabbro, Francesca Levi-Schaffer, Michael Spedding, Jamie A. Davies, and NC-IUPHAR. The IUPHAR/BPS Guide to PHARMACOLOGY in 2018: updates and expansion to encompass the new guide to IMMUNOPHARMACOLOGY. Nucleic Acids Research, 46(Database-Issue):D1091–D1106, 2018. doi:10.1093/nar/gkx1121
    [BibTeX] [Abstract] [Download PDF]

    The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb, www.guidetopharmacology.org) and its precursor IUPHAR-DB, have captured expert-curated interactions between targets and ligands from selected papers in pharmacology and drug discovery since 2003. This resource continues to be developed in conjunction with the International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS). As previously described, our unique model of content selection and quality control is based on 96 target-class subcommittees comprising 512 scientists collaborating with in-house curators. This update describes content expansion, new features and interoperability improvements introduced in the 10 releases since August 2015. Our relationship matrix now describes ∼9000 ligands, ∼15 000 binding constants, ∼6000 papers and ∼1700 human proteins. As an important addition, we also introduce our newly funded project for the Guide to IMMUNOPHARMACOLOGY (GtoImmuPdb, www.guidetoimmunopharmacology.org). This has been ‘forked’ from the well-established GtoPdb data model and expanded into new types of data related to the immune system and inflammatory processes. This includes new ligands, targets, pathways, cell types and diseases for which we are recruiting new IUPHAR expert committees. Designed as an immunopharmacological gateway, it also has an emphasis on potential therapeutic interventions.

    @article{Harding:GtoPdb:NAR2018,
    abstract = {The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb, www.guidetopharmacology.org) and its precursor IUPHAR-DB, have captured expert-curated interactions between targets and ligands from selected papers in pharmacology and drug discovery since 2003. This resource continues to be developed in conjunction with the International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS). As previously described, our unique model of content selection and quality control is based on 96 target-class subcommittees comprising 512 scientists collaborating with in-house curators. This update describes content expansion, new features and interoperability improvements introduced in the 10 releases since August 2015. Our relationship matrix now describes ∼9000 ligands, ∼15 000 binding constants, ∼6000 papers and ∼1700 human proteins. As an important addition, we also introduce our newly funded project for the Guide to IMMUNOPHARMACOLOGY (GtoImmuPdb, www.guidetoimmunopharmacology.org). This has been ‘forked’ from the well-established GtoPdb data model and expanded into new types of data related to the immune system and inflammatory processes. This includes new ligands, targets, pathways, cell types and diseases for which we are recruiting new IUPHAR expert committees. Designed as an immunopharmacological gateway, it also has an emphasis on potential therapeutic interventions.},
    author = {Simon D. Harding and
    Joanna L. Sharman and
    Elena Faccenda and
    Christopher Southan and
    Adam J. Pawson and
    Sam Ireland and
    Alasdair J. G. Gray and
    Liam Bruce and
    Stephen P. H. Alexander and
    Stephen Anderton and
    Clare Bryant and
    Anthony P. Davenport and
    Christian Doerig and
    Doriano Fabbro and
    Francesca Levi{-}Schaffer and
    Michael Spedding and
    Jamie A. Davies and
    Nc{-}Iuphar},
    title = {The {IUPHAR/BPS} Guide to {PHARMACOLOGY} in 2018: updates and expansion
    to encompass the new guide to {IMMUNOPHARMACOLOGY}},
    journal = {Nucleic Acids Research},
    volume = {46},
    number = {Database-Issue},
    pages = {D1091--D1106},
    year = {2018},
    url = {https://doi.org/10.1093/nar/gkx1121},
    doi = {10.1093/nar/gkx1121}
    }

  • Abiodun Akinyemi, Ming Sun, and Alasdair J. G. Gray. An ontology-based data integration framework for construction information management. Proceedings of the Institution of Civil Engineers – Management, Procurement and Law, 171(3):111–125, 2018. doi:10.1680/jmapl.17.00052
    [BibTeX] [Abstract] [Download PDF]

    Information management during the construction phase of a built asset involves multiple stakeholders using multiple software applications to generate and store data. This is problematic as data come in different forms and are labour intensive to piece together. Existing solutions to this problem are predominantly in proprietary applications, which are sometimes cost prohibitive for small engineering firms, or conceptual studies with use cases that cannot be easily adapted. In view of these limitations, this research presents an ontology-based data integration framework that makes use of open-source tools that support Semantic Web technologies. The proposed framework enables rapid answering of queries over construction data integrated from heterogeneous sources, data quality checks and reuse of project software resources. The attributes and functionalities of the proposed solution align with the requirements common to small firms with limited information technology skill and budget. Consequently, this solution can be of great benefit for their data projects.

    @article{Akinyemi_2018,
    abstract = {Information management during the construction phase of a built asset involves multiple stakeholders using multiple software applications to generate and store data. This is problematic as data come in different forms and are labour intensive to piece together. Existing solutions to this problem are predominantly in proprietary applications, which are sometimes cost prohibitive for small engineering firms, or conceptual studies with use cases that cannot be easily adapted. In view of these limitations, this research presents an ontology-based data integration framework that makes use of open-source tools that support Semantic Web technologies. The proposed framework enables rapid answering of queries over construction data integrated from heterogeneous sources, data quality checks and reuse of project software resources. The attributes and functionalities of the proposed solution align with the requirements common to small firms with limited information technology skill and budget. Consequently, this solution can be of great benefit for their data projects.},
    doi = {10.1680/jmapl.17.00052},
    url = {https://doi.org/10.1680%2Fjmapl.17.00052},
    year = 2018,
    month = jun,
    publisher = {Thomas Telford Ltd.},
    volume = {171},
    number = {3},
    pages = {111--125},
    author = {Abiodun Akinyemi and Ming Sun and Alasdair J G Gray},
    title = {An ontology-based data integration framework for construction information management},
    journal = {Proceedings of the Institution of Civil Engineers - Management, Procurement and Law}
    }

  • Charalampos Rotsos, Arsham Farshad, Daniel King, David Hutchison, Qianru Zhou, Alasdair J. G. Gray, Cheng-Xiang Wang, and Stephen McLaughlin. ReasoNet: Inferring Network Policies Using Ontologies. In 4th IEEE Conference on Network Softwarization and Workshops, NetSoft 2018, Montreal, QC, Canada, June 25-29, 2018, page 159–167. IEEE, jun 2018. doi:10.1109/NETSOFT.2018.8460050
    [BibTeX] [Abstract] [Download PDF]

    Modern Software Defined Networking (SDN) control stacks consist of multiple abstraction and virtualization layers to enable flexibility in the development of new control features. Rich data modeling frameworks are essential when sharing information across control layers. Unfortunately, existing Network Operating System (NOS) data modeling capabilities are limited to simple type-checking and code templating. We present an exploration of a more extreme point on SDN data modeling: ReasoNet. Developers can use semantic web technologies to enrich their data models with reasoning rules and integrity/consistency constraints, and automate state inference across layers. We demonstrate the ability of ReasoNet to automate state verification and cross-layer debugging, through the implementation of two popular control applications, a learning switch and a Quality of Service (QoS) policy engine.

    @inproceedings{DBLP:conf/netsoft/RotsosFKHZGWM18,
    abstract = {Modern Software Defined Networking (SDN) control stacks consist of multiple abstraction and virtualization layers to enable flexibility in the development of new control features. Rich data modeling frameworks are essential when sharing information across control layers. Unfortunately, existing Network Operating System (NOS) data modeling capabilities are limited to simple type-checking and code templating. We present an exploration of a more extreme point on SDN data modeling: ReasoNet. Developers can use semantic web technologies to enrich their data models with reasoning rules and integrity/consistency constraints, and automate state inference across layers. We demonstrate the ability of ReasoNet to automate state verification and cross-layer debugging, through the implementation of two popular control applications, a learning switch and a Quality of Service (QoS) policy engine.},
    author = {Charalampos Rotsos and
    Arsham Farshad and
    Daniel King and
    David Hutchison and
    Qianru Zhou and
    Alasdair J. G. Gray and
    Cheng{-}Xiang Wang and
    Stephen McLaughlin},
    title = {ReasoNet: Inferring Network Policies Using Ontologies},
    booktitle = {4th {IEEE} Conference on Network Softwarization and Workshops, NetSoft
    2018, Montreal, QC, Canada, June 25-29, 2018},
    pages = {159--167},
    publisher = {{IEEE}},
    month = jun,
    year = {2018},
    url = {https://doi.org/10.1109/NETSOFT.2018.8460050},
    doi = {10.1109/NETSOFT.2018.8460050}
    }
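
    Note: the paper's central move, enriching a data model with rules so that state in one layer can be inferred from another, can be approximated with off-the-shelf RDFS reasoning. Below is a minimal sketch assuming the rdflib and owlrl Python packages and an invented ex: vocabulary; ReasoNet's actual models and rules are richer.

    from rdflib import Graph, Namespace, RDF, RDFS
    from owlrl import DeductiveClosure, RDFS_Semantics

    EX = Namespace("http://example.org/net#")  # invented vocabulary
    g = Graph()
    g.add((EX.OpenFlowSwitch, RDFS.subClassOf, EX.ForwardingDevice))
    g.add((EX.sw1, RDF.type, EX.OpenFlowSwitch))

    DeductiveClosure(RDFS_Semantics).expand(g)  # materialise inferred triples
    # sw1 is now also a ForwardingDevice without that being asserted directly.
    print((EX.sw1, RDF.type, EX.ForwardingDevice) in g)  # True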

  • Alasdair J. G. Gray. Using a Jupyter Notebook to perform a reproducible scientific analysis over semantic web sources. In Enabling Open Semantic Science, Monterey, California, USA, oct 2018. Executable version: https://mybinder.org/v2/gh/AlasdairGray/SemSci2018/master?filepath=SemSci2018%20Publication.ipynb
    [BibTeX] [Abstract] [Download PDF]

    In recent years there has been a reproducibility crisis in science. Computational notebooks, such as Jupyter, have been touted as one solution to this problem. However, when executing analyses over live SPARQL endpoints, we get different answers depending upon when the analysis in the notebook was executed. In this paper, we identify some of the issues discovered in trying to develop a reproducible analysis over a collection of biomedical data sources and suggest some best practice to overcome these issues.

    @InProceedings{Gray2018:jupyter:SemSci2018,
    abstract = {In recent years there has been a reproducibility crisis in science. Computational notebooks, such as Jupyter, have been touted as one solution to this problem. However, when executing analyses over live SPARQL endpoints, we get different answers depending upon when the analysis in the notebook was executed. In this paper, we identify some of the issues discovered in trying to develop a reproducible analysis over a collection of biomedical data sources and suggest some best practice to overcome these issues.},
    author = {Alasdair J G Gray},
    title = {Using a Jupyter Notebook to perform a reproducible scientific analysis over semantic web sources},
    OPTcrossref = {},
    OPTkey = {},
    booktitle = {Enabling Open Semantic Science},
    year = {2018},
    OPTeditor = {},
    OPTvolume = {},
    OPTnumber = {},
    OPTseries = {},
    OPTpages = {},
    month = oct,
    address = {Monterey, California, USA},
    OPTorganization = {},
    OPTpublisher = {},
    note = {Executable version: https://mybinder.org/v2/gh/AlasdairGray/SemSci2018/master?filepath=SemSci2018%20Publication.ipynb},
    url = {http://ceur-ws.org/Vol-2184/paper-02/paper-02.html},
    OPTannote = {}
    }
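
    Note: one practice the paper recommends is capturing when a live endpoint was queried, so that later re-runs can explain differing answers. A minimal sketch of that habit using the SPARQLWrapper package follows; the endpoint and query are illustrative, not those of the paper.

    from datetime import datetime, timezone
    from SPARQLWrapper import SPARQLWrapper, JSON

    endpoint = SPARQLWrapper("https://query.wikidata.org/sparql")
    endpoint.setQuery("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
    endpoint.setReturnFormat(JSON)

    # Record the execution time alongside the result, since the live data
    # behind the endpoint may change between notebook runs.
    executed_at = datetime.now(timezone.utc).isoformat()
    results = endpoint.query().convert()
    print(f"Run at {executed_at}: {len(results['results']['bindings'])} row(s)")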

2017

  • Mark D. Wilkinson, Ruben Verborgh, Luiz Olavo Bonino da Silva Santos, Tim Clark, Morris A. Swertz, Fleur D. L. Kelpin, Alasdair J. G. Gray, Erik A. Schultes, Erik M. van Mulligen, Paolo Ciccarese, Arnold Kuzniar, Anand Gavai, Mark Thompson, Rajaram Kaliyaperumal, Jerven T. Bolleman, and Michel Dumontier. Interoperability and FAIRness through a novel combination of Web technologies. PeerJ Computer Science, 3:e110, 2017. doi:10.7717/peerj-cs.110
    [BibTeX] [Abstract] [Download PDF]

    Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.

    @article{Wilkinson:InteropFAIRWebTech:PeerJ2017,
    author = {Mark D. Wilkinson and
    Ruben Verborgh and
    Luiz Olavo Bonino da Silva Santos and
    Tim Clark and
    Morris A. Swertz and
    Fleur D. L. Kelpin and
    Alasdair J. G. Gray and
    Erik A. Schultes and
    Erik M. van Mulligen and
    Paolo Ciccarese and
    Arnold Kuzniar and
    Anand Gavai and
    Mark Thompson and
    Rajaram Kaliyaperumal and
    Jerven T. Bolleman and
    Michel Dumontier},
    title = {Interoperability and FAIRness through a novel combination of Web technologies},
    journal = {PeerJ Computer Science},
    volume = {3},
    pages = {e110},
    year = {2017},
    url = {https://doi.org/10.7717/peerj-cs.110},
    doi = {10.7717/peerj-cs.110},
    abstract = {Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.}
    }
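
    Note: among the off-the-shelf technologies this line of work builds on is plain HTTP content negotiation, where the same resource IRI can serve both a human-readable page and machine-readable metadata. A minimal sketch follows; the URL is hypothetical.

    import requests

    # Ask a (hypothetical) resource for an RDF representation rather than HTML.
    resp = requests.get(
        "https://example.org/dataset/42",
        headers={"Accept": "text/turtle"},  # prefer machine-readable RDF
        timeout=10,
    )
    print(resp.status_code, resp.headers.get("Content-Type"))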

  • Alasdair J. G. Gray, Carole A. Goble, and Rafael Jimenez. Bioschemas: From Potato Salad to Protein Annotation. In Proceedings of the ISWC 2017 Posters & Demonstrations and Industry Tracks co-located with the 16th International Semantic Web Conference (ISWC 2017), Vienna, Austria, October 23rd–25th, 2017, volume 1963 of CEUR Workshop Proceedings. CEUR-WS.org, 2017. (Poster paper)
    [BibTeX] [Abstract] [Download PDF]

    The life sciences have a wealth of data resources with a wide range of overlapping content. Key repositories, such as UniProt for protein data or Entrez Gene for gene data, are well known and their content easily discovered through search engines. However, there is a long-tail of bespoke datasets with important content that are not so prominent in search results. Building on the success of Schema.org for making a wide range of structured web content more discoverable and interpretable, e.g. food recipes, the Bioschemas community (http://bioschemas.org) aim to make life sciences datasets more findable by encouraging data providers to embed Schema.org markup in their resources.

    @inproceedings{Gray:BioschemasPotatoSalad:ISWC2017,
    author = {Alasdair J. G. Gray and Carole A. Goble and Rafael Jimenez},
    booktitle = {Proceedings of the {ISWC} 2017 Posters {\&} Demonstrations and Industry Tracks co-located with 16th International Semantic Web Conference {(ISWC} 2017), Vienna, Austria, October 23rd - to - 25th, 2017.},
    month = oct,
    note = {(Poster paper)},
    publisher = {CEUR-WS.org},
    series = {{CEUR} Workshop Proceedings},
    title = {Bioschemas: From Potato Salad to Protein Annotation},
    url = {http://ceur-ws.org/Vol-1963/paper579.pdf},
    volume = {1963},
    year = {2017},
    abstract = {The life sciences have a wealth of data resources with a wide range of overlapping content. Key repositories, such as UniProt for protein data or Entrez Gene for gene data, are well known and their content easily discovered through search engines. However, there is a long-tail of bespoke datasets with important content that are not so prominent in search results. Building on the success of Schema.org for making a wide range of structured web content more discoverable and interpretable, e.g. food recipes, the Bioschemas community (http://bioschemas.org) aim to make life sciences datasets more findable by encouraging data providers to embed Schema.org markup in their resources.}
    }
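
    Note: the markup being encouraged is ordinary Schema.org JSON-LD embedded in a page. The sketch below emits a minimal, illustrative Dataset description; the properties shown are a plausible subset, not the official Bioschemas profile.

    import json

    # Minimal Schema.org Dataset markup of the kind Bioschemas encourages
    # providers to embed. All values are invented for illustration.
    dataset_markup = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": "Example protein annotation dataset",
        "description": "Toy dataset used to illustrate embedded markup.",
        "url": "https://example.org/datasets/proteins",
        "license": "https://creativecommons.org/licenses/by/4.0/",
    }
    print('<script type="application/ld+json">')
    print(json.dumps(dataset_markup, indent=2))
    print("</script>")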

  • Qianru Zhou, Stephen McLaughlin, Alasdair J. G. Gray, Shangbin Wu, and Chengxiang Wang. Lost Silence: An emergency response early detection service through continuous processing of telecommunication data streams. In Web Stream Processing 2017, Vienna, Austria, oct 2017.
    [BibTeX] [Abstract] [Download PDF]

    Early detection of significant traumatic events, e.g. terrorist events, ship capsizes, is important to ensure that a prompt emergency response can occur. In the modern world telecommunication systems can and do play a key role in ensuring a successful emergency response by detecting such incidents through significant changes in calls and access to the networks. In this paper a methodology is illustrated to detect such incidents immediately (with the delay in the order of milliseconds), by processing semantically annotated streams of data in cellular telecommunication systems. In our methodology, live information of phones’ positions and status are encoded as RDF streams. We propose an algorithm that processes streams of RDF annotated telecommunication data to detect abnormality. Our approach is exemplified in the context of capsize of a passenger cruise ship but is readily translatable to other incidents. Our evaluation results show that with properly chosen window size, such incidents can be detected effectively.

    @InProceedings{ZhouEtal2017:LostSilence:WSP2017,
    abstract = {Early detection of significant traumatic events, e.g. terrorist events, ship capsizes, is important to ensure that a prompt emergency response can occur. In the modern world telecommunication systems can and do play a key role in ensuring a successful emergency response by detecting such incidents through significant changes in calls and access to the networks. In this paper a methodology is illustrated to detect such incidents immediately (with the delay in the order of milliseconds), by processing semantically annotated streams of data in cellular telecommunication systems. In our methodology, live information of phones' positions and status are encoded as RDF streams. We propose an algorithm that processes streams of RDF annotated telecommunication data to detect abnormality. Our approach is exemplified in the context of capsize of a passenger cruise ship but is readily translatable to other incidents. Our evaluation results show that with properly chosen window size, such incidents can be detected effectively.},
    author = {Qianru Zhou and Stephen McLaughlin and Alasdair J G Gray and Shangbin Wu and Chengxiang Wang},
    title = {Lost Silence: An emergency response early detection service through continuous processing of telecommunication data streams},
    OPTcrossref = {},
    OPTkey = {},
    booktitle = {Web Stream Processing 2017},
    year = {2017},
    OPTeditor = {},
    OPTvolume = {},
    OPTnumber = {},
    OPTseries = {},
    OPTpages = {},
    month = oct,
    address = {Vienna, Austria},
    OPTorganization = {},
    OPTpublisher = {},
    OPTnote = {},
    url = {http://ceur-ws.org/Vol-1936/paper-03.pdf},
    OPTannote = {}
    }
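
    Note: stripped of the RDF stream machinery, the detection idea is window-based: compare current activity with the recent average and flag a sharp drop. The toy sketch below illustrates only this; the window size and threshold are invented, and the paper's algorithm operates over RDF-annotated telecom streams.

    from collections import deque

    def detect_silence(counts, window=3, drop_ratio=0.2):
        """Yield indices where activity falls below drop_ratio * recent average."""
        history = deque(maxlen=window)
        for i, n in enumerate(counts):
            if len(history) == window and n < drop_ratio * (sum(history) / window):
                yield i
            history.append(n)

    active_phones = [100, 102, 98, 101, 99, 5, 3]  # sudden mass disconnection
    print(list(detect_silence(active_phones)))      # -> [5, 6]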

  • Ahmad Alsadeeqi and Alasdair J. G. Gray. Systematically corrupting data to assess data linkage quality. In The Systematic Linking of Historical Records, University of Guelph, Guelph, Canada, 2017. Workshop website: http://recordlink.org/
    [BibTeX] [Abstract]

    Computer algorithms use string matching techniques to assess how likely two historical records are to be the same. The quality of linkage is unclear without knowing the correct links or ground truth. Synthetically generated datasets for which ground truth is known are helpful but the data typically are too clean to be representative of historical records. We assess data linkage algorithms under different data quality scenarios, e.g. with errors typical of historical transcriptions. A data corrupting model injects four types of mistakes: character level (e.g. an f is represented as an s – OCR Corruptions), attribute level (e.g. male changed to female due to false entry), record level (e.g. missing records), and group of records level (e.g. coffee spilt over a page, lost parish records in fire). We then evaluate record linkage algorithms over synthetically generated datasets with known ground truth and data corruptions matching a given profile.

    @InProceedings{alsadeeqiGray:systematicCorruption:2017,
    abstract = {Computer algorithms use string matching techniques to assess how likely two historical records are to be the same. The quality of linkage is unclear without knowing the correct links or ground truth. Synthetically generated datasets for which ground truth is known are helpful but the data typically are too clean to be representative of historical records. We assess data linkage algorithms under different data quality scenarios, e.g. with errors typical of historical transcriptions. A data corrupting model injects four types of mistakes: character level (e.g. an f is represented as an s – OCR Corruptions), attribute level (e.g. male changed to female due to false entry), record level (e.g. missing records), and group of records level (e.g. coffee spilt over a page, lost parish records in fire). We then evaluate record linkage algorithms over synthetically generated datasets with known ground truth and data corruptions matching a given profile.},
    author = {Ahmad Alsadeeqi and Alasdair J G Gray},
    title = {Systematically corrupting data to assess data linkage quality},
    OPTcrossref = {},
    OPTkey = {},
    booktitle = {The Systematic Linking of Historical Records},
    year = {2017},
    OPTeditor = {},
    OPTvolume = {},
    OPTnumber = {},
    OPTseries = {},
    OPTpages = {},
    month = may,
    address = {University of Guelph, Guelph, Canada},
    OPTorganization = {},
    OPTpublisher = {},
    note = {Workshop website \url{http://recordlink.org/}},
    OPTannote = {}
    }

2016

  • Michel Dumontier, Alasdair JG Gray, M. Scott Marshall, Vladimir Alexiev, Peter Ansell, Gary Bader, Joachim Baran, Jerven T. Bolleman, Alison Callahan, José Cruz-Toledo, and others. The health care and life sciences community profile for dataset descriptions. PeerJ, 4:e2331, 2016. doi:10.7717/peerj.2331
    [BibTeX] [Abstract] [Download PDF]

    Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.

    @article{Dumontier:HCLS-datadesc:PeerJ2016,
    abstract={Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.},
    title={The health care and life sciences community profile for dataset descriptions},
    author={Dumontier, Michel and Gray, Alasdair JG and Marshall, M Scott and Alexiev, Vladimir and Ansell, Peter and Bader, Gary and Baran, Joachim and Bolleman, Jerven T and Callahan, Alison and Cruz-Toledo, Jos{\'e} and others},
    journal={PeerJ},
    volume={4},
    pages={e2331},
    year={2016},
    month=aug,
    url={https://doi.org/10.7717/peerj.2331},
    doi={10.7717/peerj.2331},
    publisher={PeerJ Inc.}
    }
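
    Note: in practice, a description conforming to the profile is a handful of RDF statements drawn from vocabularies such as Dublin Core and DCAT. Below is a minimal sketch with rdflib, using an invented dataset IRI and only a token subset of the profile's elements; see the W3C Note for the actual requirements.

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DCAT, DCTERMS, RDF

    g = Graph()
    ds = URIRef("https://example.org/dataset/chem2016")  # hypothetical dataset
    g.add((ds, RDF.type, DCAT.Dataset))
    g.add((ds, DCTERMS.title, Literal("Example chemistry dataset", lang="en")))
    g.add((ds, DCTERMS.description, Literal("Toy description.", lang="en")))
    g.add((ds, DCTERMS.license,
           URIRef("https://creativecommons.org/licenses/by/4.0/")))
    print(g.serialize(format="turtle"))  # rdflib 6+ returns a str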

  • Christopher J. Playford, Vernon Gayle, Roxanne Connelly, and Alasdair JG Gray. Administrative social science data: The challenge of reproducible research. Big Data and Society, 3(2):2053951716684143, 2016. doi:10.1177/2053951716684143
    [BibTeX] [Abstract] [Download PDF]

    Powerful new social science data resources are emerging. One particularly important source is administrative data, which were originally collected for organisational purposes but often contain information that is suitable for social science research. In this paper we outline the concept of reproducible research in relation to micro-level administrative social science data. Our central claim is that a planned and organised workflow is essential for high quality research using micro-level administrative social science data. We argue that it is essential for researchers to share research code, because code sharing enables the elements of reproducible research. First, it enables results to be duplicated and therefore allows the accuracy and validity of analyses to be evaluated. Second, it facilitates further tests of the robustness of the original piece of research. Drawing on insights from computer science and other disciplines that have been engaged in e-Research we discuss and advocate the use of Git repositories to provide a useable and effective solution to research code sharing and rendering social science research using micro-level administrative data reproducible.

    @article{Playford:AdminSocSciRep:BDS2016,
    author = {Christopher J Playford and Vernon Gayle and Roxanne Connelly and Alasdair JG Gray},
    title ={Administrative social science data: The challenge of reproducible research},
    journal = {Big Data and Society},
    volume = {3},
    number = {2},
    pages = {2053951716684143},
    year = {2016},
    doi = {10.1177/2053951716684143},
    URL = {https://doi.org/10.1177/2053951716684143},
    eprint = {https://doi.org/10.1177/2053951716684143},
    abstract = {Powerful new social science data resources are emerging. One particularly important source is administrative data, which were originally collected for organisational purposes but often contain information that is suitable for social science research. In this paper we outline the concept of reproducible research in relation to micro-level administrative social science data. Our central claim is that a planned and organised workflow is essential for high quality research using micro-level administrative social science data. We argue that it is essential for researchers to share research code, because code sharing enables the elements of reproducible research. First, it enables results to be duplicated and therefore allows the accuracy and validity of analyses to be evaluated. Second, it facilitates further tests of the robustness of the original piece of research. Drawing on insights from computer science and other disciplines that have been engaged in e-Research we discuss and advocate the use of Git repositories to provide a useable and effective solution to research code sharing and rendering social science research using micro-level administrative data reproducible.}
    }

  • Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, and others. The FAIR Guiding Principles for scientific data management and stewardship. Scientific data, 3(1):1–9, 2016. doi:10.1038/sdata.2016.18
    [BibTeX] [Abstract]

    There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measureable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.

    @article{Wilkinson:FAIRPrinciples:SciData2016,
    title={The FAIR Guiding Principles for scientific data management and stewardship},
    author={Wilkinson, Mark D and Dumontier, Michel and Aalbersberg, IJsbrand Jan and Appleton, Gabrielle and Axton, Myles and Baak, Arie and Blomberg, Niklas and Boiten, Jan-Willem and da Silva Santos, Luiz Bonino and Bourne, Philip E and others},
    journal={Scientific data},
    volume={3},
    number={1},
    pages={1--9},
    year={2016},
    publisher={Nature Publishing Group},
    doi={10.1038/sdata.2016.18},
    abstract={There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measureable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.}
    }

  • Alasdair J. G. Gray, Michel Dumontier, and M. Scott Marshall. Describing Datasets with the Health Care and Life Sciences Community Profile. In Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2016), Amsterdam, The Netherlands, 2016. (Tutorial)
    [BibTeX]
    @inproceedings{Gray:HCLS:SWAT4LSTutorial2016,
    address = {Amsterdam, The Netherlands},
    author = {Alasdair J. G. Gray and Michel Dumontier and M. Scott Marshall},
    booktitle = {Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2016)},
    month = dec,
    note = {(Tutorial)},
    title = {Describing Datasets with the Health Care and Life Sciences Community Profile},
    year = {2016}
    }

  • Alasdair J. G. Gray. Validata: A tool for testing profile conformance. In Smart Descriptions & Smarter Vocabularies (SDSVoc), Amsterdam, The Netherlands, nov 2016.
    [BibTeX] [Abstract] [Download PDF]

    Validata is an online web application for validating a dataset description expressed in RDF against a community profile expressed as a Shape Expression (ShEx). Additionally it provides an API for programmatic access to the validator. Validata is capable of being used for multiple community agreed standards, e.g. DCAT, the HCLS community profile, or the Open PHACTS guidelines, and there are currently deployments to support each of these. Validata can be easily repurposed for different deployments by providing it with a new ShEx schema. The Validata code is available from GitHub.

    @InProceedings{Gray2016SDSVocValidata,
    abstract = {Validata is an online web application for validating a dataset description expressed in RDF against a community profile expressed as a Shape Expression (ShEx). Additionally it provides an API for programmatic access to the validator. Validata is capable of being used for multiple community agreed standards, e.g. DCAT, the HCLS community profile, or the Open PHACTS guidelines, and there are currently deployments to support each of these. Validata can be easily repurposed for different deployments by providing it with a new ShEx schema. The Validata code is available from GitHub.},
    author = {Alasdair J. G. Gray},
    title = {Validata: A tool for testing profile conformance},
    OPTcrossref = {},
    OPTkey = {},
    booktitle = {Smart Descriptions \& Smarter Vocabularies (SDSVoc)},
    year = {2016},
    OPTeditor = {},
    OPTvolume = {},
    OPTnumber = {},
    OPTseries = {},
    OPTpages = {},
    month = nov,
    address = {Amsterdam, The Netherlands},
    OPTorganization = {},
    OPTpublisher = {},
    OPTnote = {},
    url = {https://www.w3.org/2016/11/sdsvoc/SDSVoc16_paper_2},
    OPTannote = {}
    }
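
    Note: Validata itself runs as a web application with an API, but the underlying check, validating RDF against a ShEx schema, can be reproduced locally. Below is a hedged sketch assuming the community PyShEx package and its ShExEvaluator interface; the shape and data are invented.

    from pyshex import ShExEvaluator

    shex = """
    PREFIX ex: <http://example.org/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    start = @<DatasetShape>
    <DatasetShape> { ex:title xsd:string }
    """

    rdf = """
    PREFIX ex: <http://example.org/>
    ex:d1 ex:title "A dataset" .
    """

    # Validate the focus node against the schema's start shape.
    for r in ShExEvaluator(rdf=rdf, schema=shex,
                           focus="http://example.org/d1").evaluate():
        print(r.focus, "conforms" if r.result else f"fails: {r.reason}")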

  • Michel Dumontier, Alasdair J. G. Gray, and Scott M. Marshall. The HCLS Community Profile: Describing Datasets, Versions, and Distributions. In Smart Descriptions & Smarter Vocabularies (SDSVoc), Amsterdam, The Netherlands, nov 2016.
    [BibTeX] [Abstract] [Download PDF]

    Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting HCLS community profile covers elements of description, identification, attribution, versioning, provenance, and content summarization. The HCLS community profile reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets. The goal of this presentation is to give an overview of the HCLS Community Profile and explain how it extends and builds upon other approaches.

    @InProceedings{Gray2016SDSVocHCLS,
    abstract = {Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting HCLS community profile covers elements of description, identification, attribution, versioning, provenance, and content summarization. The HCLS community profile reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.
    The goal of this presentation is to give an overview of the HCLS Community Profile and explain how it extends and builds upon other approaches.},
    author = {Michel Dumontier and Alasdair J. G. Gray and M. Scott Marshall},
    title = {The {HCLS} Community Profile: Describing Datasets, Versions, and Distributions},
    OPTcrossref = {},
    OPTkey = {},
    booktitle = {Smart Descriptions \& Smarter Vocabularies (SDSVoc)},
    year = {2016},
    OPTeditor = {},
    OPTvolume = {},
    OPTnumber = {},
    OPTseries = {},
    OPTpages = {},
    month = nov,
    address = {Amsterdam, The Netherlands},
    OPTorganization = {},
    OPTpublisher = {},
    OPTnote = {},
    url = {https://www.w3.org/2016/11/sdsvoc/SDSVoc16_paper_3},
    OPTannote = {}
    }

  • Paul T. Groth, Elena Simperl, Alasdair J. G. Gray, Marta Sabou, Markus Krötzsch, Freddy Lécué, Fabian Flöck, and Yolanda Gil, editors. The Semantic Web – ISWC 2016 – 15th International Semantic Web Conference, Kobe, Japan, October 17-21, 2016, Proceedings, Part II, volume 9982 of Lecture Notes in Computer Science, 2016. doi:10.1007/978-3-319-46547-0
    [BibTeX] [Download PDF]
    @proceedings{iswc2016-2,
    editor = {Paul T. Groth and
    Elena Simperl and
    Alasdair J. G. Gray and
    Marta Sabou and
    Markus Kr{\"{o}}tzsch and
    Freddy L{\'{e}}cu{\'{e}} and
    Fabian Fl{\"{o}}ck and
    Yolanda Gil},
    title = {The Semantic Web - {ISWC} 2016 - 15th International Semantic Web Conference,
    Kobe, Japan, October 17-21, 2016, Proceedings, Part {II}},
    series = {Lecture Notes in Computer Science},
    volume = {9982},
    month = oct,
    year = {2016},
    url = {http://dx.doi.org/10.1007/978-3-319-46547-0},
    doi = {10.1007/978-3-319-46547-0},
    isbn = {978-3-319-46546-3},
    }

  • Paul T. Groth, Elena Simperl, Alasdair J. G. Gray, Marta Sabou, Markus Krötzsch, Freddy Lécué, Fabian Flöck, and Yolanda Gil, editors. The Semantic Web – ISWC 2016 – 15th International Semantic Web Conference, Kobe, Japan, October 17-21, 2016, Proceedings, Part I, volume 9981 of Lecture Notes in Computer Science, oct 2016. doi:10.1007/978-3-319-46523-4
    [BibTeX] [Download PDF]
    @proceedings{iswc2016-1,
    editor = {Paul T. Groth and
    Elena Simperl and
    Alasdair J. G. Gray and
    Marta Sabou and
    Markus Kr{\"{o}}tzsch and
    Freddy L{\'{e}}cu{\'{e}} and
    Fabian Fl{\"{o}}ck and
    Yolanda Gil},
    title = {The Semantic Web - {ISWC} 2016 - 15th International Semantic Web Conference,
    Kobe, Japan, October 17-21, 2016, Proceedings, Part {I}},
    series = {Lecture Notes in Computer Science},
    volume = {9981},
    month = oct,
    year = {2016},
    url = {http://dx.doi.org/10.1007/978-3-319-46523-4},
    doi = {10.1007/978-3-319-46523-4},
    isbn = {978-3-319-46522-7},
    }

2015

  • A. J. G. Gray, Joachim Baran, M. Scott Marshall, and Michel Dumontier, editors. Dataset Descriptions: HCLS Community Profile. W3C Interest Group Note, 2015.
    [BibTeX] [Download PDF]
    @techreport{Gray2015HCLS,
    author = {A. J. G. Gray and Baran, Joachim and Marshall, M Scott and {Dumontier (Eds)}, Michel},
    month = may,
    publisher = {World Wide Web Consortium},
    type = {{W3C Interest Group Note}},
    title = {{Dataset Descriptions: {HCLS} Community Profile}},
    year = {2015},
    url = {https://www.w3.org/TR/hcls-dataset/}
    }

2014

  • Alasdair J. G. Gray, Paul T. Groth, Antonis Loizou, Sune Askjaer, Christian Y. A. Brenninkmeijer, Kees Burger, Christine Chichester, Chris T. A. Evelo, Carole A. Goble, Lee Harland, Steve Pettifer, Mark Thompson, Andra Waagmeester, and Antony J. Williams. Applying linked data approaches to pharmacology: Architectural decisions and implementation. Semantic Web Journal, 5(2):101–113, 2014. doi:10.3233/SW-2012-0088
    [BibTeX] [Abstract] [Download PDF]

    The discovery of new medicines requires pharmacologists to interact with a number of information sources ranging from tabular data to scientific papers, and other specialized formats. In this application report, we describe a linked data platform for integrating multiple pharmacology datasets that form the basis for several drug discovery applications. The functionality offered by the platform has been drawn from a collection of prioritised drug discovery business questions created as part of the Open PHACTS project, a collaboration of research institutions and major pharmaceutical companies. We describe the architecture of the platform focusing on seven design decisions that drove its development with the aim of informing others developing similar software in this or other domains. The utility of the platform is demonstrated by the variety of drug discovery applications being built to access the integrated data.

    @article{Gray:OPS:SWJ,
    abstract = {The discovery of new medicines requires pharmacologists to interact with a number of information sources ranging from tabular data to scientific papers, and other specialized formats. In this application report, we describe a linked data platform for integrating multiple pharmacology datasets that form the basis for several drug discovery applications. The functionality offered by the platform has been drawn from a collection of prioritised drug discovery business questions created as part of the Open PHACTS project, a collaboration of research institutions and major pharmaceutical companies. We describe the architecture of the platform focusing on seven design decisions that drove its development with the aim of informing others developing similar software in this or other domains. The utility of the platform is demonstrated by the variety of drug discovery applications being built to access the integrated data.},
    author = {Alasdair J. G. Gray and
    Paul T. Groth and
    Antonis Loizou and
    Sune Askjaer and
    Christian Y. A. Brenninkmeijer and
    Kees Burger and
    Christine Chichester and
    Chris T. A. Evelo and
    Carole A. Goble and
    Lee Harland and
    Steve Pettifer and
    Mark Thompson and
    Andra Waagmeester and
    Antony J. Williams},
    title = {Applying linked data approaches to pharmacology: Architectural decisions
    and implementation},
    journal = {Semantic Web Journal},
    volume = {5},
    number = {2},
    pages = {101--113},
    year = {2014},
    url = {https://doi.org/10.3233/SW-2012-0088},
    doi = {10.3233/SW-2012-0088}
    }

  • Alasdair J. G. Gray. Dataset Descriptions for Linked Data Systems. IEEE Internet Computing, 18(4):66–69, 2014. doi:10.1109/MIC.2014.66
    [BibTeX] [Download PDF]
    @article{DBLP:journals/internet/Gray14,
    author = {Alasdair J. G. Gray},
    title = {Dataset Descriptions for Linked Data Systems},
    journal = {{IEEE} Internet Computing},
    volume = {18},
    number = {4},
    pages = {66--69},
    year = {2014},
    url = {https://doi.org/10.1109/MIC.2014.66},
    doi = {10.1109/MIC.2014.66}
    }

  • Paul T. Groth, Antonis Loizou, Alasdair J. G. Gray, Carole A. Goble, Lee Harland, and Steve Pettifer. API-centric Linked Data integration: The Open PHACTS Discovery Platform case study. Journal of Web Semantics, 29:12–18, 2014. doi:10.1016/j.websem.2014.03.003
    [BibTeX] [Download PDF]
    @article{DBLP:journals/ws/GrothLGGHP14,
    author = {Paul T. Groth and
    Antonis Loizou and
    Alasdair J. G. Gray and
    Carole A. Goble and
    Lee Harland and
    Steve Pettifer},
    title = {API-centric Linked Data integration: The Open {PHACTS} Discovery Platform
    case study},
    journal = {Journal of Web Semantics},
    volume = {29},
    pages = {12--18},
    year = {2014},
    url = {https://doi.org/10.1016/j.websem.2014.03.003},
    doi = {10.1016/j.websem.2014.03.003}
    }

  • Carole A. Goble, Alasdair J. G. Gray, and Eleftherios Tatakis. Help me describe my data: A demonstration of the Open PHACTS VoID Editor. In Matthew Horridge, Marco Rospocher, and Jacco van Ossenbruggen, editors, Proceedings of the ISWC 2014 Posters & Demonstrations Track, a track within the 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, October 21, 2014, volume 1272 of CEUR Workshop Proceedings, page 29–32. CEUR-WS.org, oct 2014. (Demo paper)
    [BibTeX] [Download PDF]
    @inproceedings{Goble:VoIDEditor:ISWC2014,
    author = {Carole A. Goble and Alasdair J. G. Gray and Eleftherios Tatakis},
    booktitle = {Proceedings of the {ISWC} 2014 Posters {\&} Demonstrations Track a track within the 13th International Semantic Web Conference, {ISWC} 2014, Riva del Garda, Italy, October 21, 2014.},
    editor = {Matthew Horridge and Marco Rospocher and Jacco van Ossenbruggen},
    month = oct,
    note = {(Demo paper)},
    pages = {29--32},
    publisher = {CEUR-WS.org},
    series = {{CEUR} Workshop Proceedings},
    title = {Help me describe my data: {A} demonstration of the Open {PHACTS} VoID Editor},
    url = {http://ceur-ws.org/Vol-1272/paper\_33.pdf},
    volume = {1272},
    year = {2014}
    }

  • Colin Batchelor, Christian Brenninkmeijer, Chris Evelo, Carole Goble, Alasdair Gray, Ken Karapetyan, Valery Tkachenko, and Egon Willighagen. Scientific Lenses over Linked Chemistry Data using BridgeDb and the Open PHACTS Chemistry Registration System. In 10th ICCS, International Conference on Chemical Structures, Noordwijkerhout, the Netherlands, 2014. (Extended abstract)
    [BibTeX]
    @inproceedings{Batchelor:chem-lenses:ChemStruct2014,
    address = {Noordwijkerhout, the Netherlands},
    author = {Colin Batchelor and Christian Brenninkmeijer and Chris Evelo and Carole Goble and Alasdair Gray and Ken Karapetyan and Valery Tkachenko and Egon Willighagen},
    booktitle = {10th ICCS, International Conference on Chemical Structures},
    month = jun,
    note = {(Extended abstract)},
    title = {Scientific Lenses over Linked Chemistry Data using BridgeDb and the Open PHACTS Chemistry Registration System},
    year = {2014}
    }

  • Sajjad Hussain, Hong Sun, Ali Anil Sinaci, Gokce Banu Laleci Erturkmen, Charles N. Mead, Alasdair J. G. Gray, Deborah L. McGuinness, Eric Prud’hommeaux, Christel Daniel, and Kerstin Forsberg. A Framework for Evaluating and Utilizing Medical Terminology Mappings. In e-Health – For Continuity of Care – Proceedings of MIE2014, the 25th European Medical Informatics Conference, Istanbul, Turkey, August 31 – September 3, 2014, volume 205 of Studies in Health Technology and Informatics, page 594–598. IOS Press, sep 2014. doi:10.3233/978-1-61499-432-9-594
    [BibTeX] [Download PDF]
    @inproceedings{Hussain:MedicalMappings:MIE2014,
    author = {Sajjad Hussain and
    Hong Sun and
    Ali Anil Sinaci and
    Gokce Banu Laleci Erturkmen and
    Charles N. Mead and
    Alasdair J. G. Gray and
    Deborah L. McGuinness and
    Eric Prud'hommeaux and
    Christel Daniel and
    Kerstin Forsberg},
    title = {A Framework for Evaluating and Utilizing Medical Terminology Mappings},
    booktitle = {e-Health - For Continuity of Care - Proceedings of MIE2014, the 25th
    European Medical Informatics Conference, Istanbul, Turkey, August
    31 - September 3, 2014},
    series = {Studies in Health Technology and Informatics},
    volume = {205},
    pages = {594--598},
    publisher = {{IOS} Press},
    month = sep,
    year = {2014},
    url = {https://doi.org/10.3233/978-1-61499-432-9-594},
    doi = {10.3233/978-1-61499-432-9-594}
    }

  • Colin R. Batchelor, Christian Y. A. Brenninkmeijer, Christine Chichester, Mark Davies, Daniela Digles, Ian Dunlop, Chris T. A. Evelo, Anna Gaulton, Carole A. Goble, Alasdair J. G. Gray, Paul T. Groth, Lee Harland, Karen Karapetyan, Antonis Loizou, John P. Overington, Steve Pettifer, Jon Steele, Robert Stevens, Valery Tkachenko, Andra Waagmeester, Antony J. Williams, and Egon L. Willighagen. Scientific Lenses to Support Multiple Views over Linked Chemistry Data. In The Semantic Web – ISWC 2014 – 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part I, volume 8796 of Lecture Notes in Computer Science, page 98–113. Springer, 2014. (Alphabetical authorship) doi:10.1007/978-3-319-11964-9_7
    [BibTeX] [Download PDF]
    @inproceedings{Batchelor:SciLenses:ISWC2014,
    author = {Colin R. Batchelor and
    Christian Y. A. Brenninkmeijer and
    Christine Chichester and
    Mark Davies and
    Daniela Digles and
    Ian Dunlop and
    Chris T. A. Evelo and
    Anna Gaulton and
    Carole A. Goble and
    Alasdair J. G. Gray and
    Paul T. Groth and
    Lee Harland and
    Karen Karapetyan and
    Antonis Loizou and
    John P. Overington and
    Steve Pettifer and
    Jon Steele and
    Robert Stevens and
    Valery Tkachenko and
    Andra Waagmeester and
    Antony J. Williams and
    Egon L. Willighagen},
    title = {Scientific Lenses to Support Multiple Views over Linked Chemistry
    Data},
    booktitle = {The Semantic Web - {ISWC} 2014 - 13th International Semantic Web Conference,
    Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part {I}},
    series = {Lecture Notes in Computer Science},
    volume = {8796},
    pages = {98--113},
    publisher = {Springer},
    month = oct,
    year = {2014},
    note = {(Alphabetical authorship)},
    url = {https://doi.org/10.1007/978-3-319-11964-9\_7},
    doi = {10.1007/978-3-319-11964-9\_7}
    }

  • Ixent Galpin, Alan B. Stokes, George Valkanas, Alasdair J. G. Gray, Norman W. Paton, Alvaro A. A. Fernandes, Kai-Uwe Sattler, and Dimitrios Gunopulos. SensorBench: benchmarking approaches to processing wireless sensor network data. In Conference on Scientific and Statistical Database Management, SSDBM ’14, Aalborg, Denmark, June 30 – July 02, 2014, pages 21:1–21:12. ACM, jun 2014. doi:10.1145/2618243.2618252
    [BibTeX] [Download PDF]
    @inproceedings{Galpin:SensorBench:SSDBM2014,
    author = {Ixent Galpin and
    Alan B. Stokes and
    George Valkanas and
    Alasdair J. G. Gray and
    Norman W. Paton and
    Alvaro A. A. Fernandes and
    Kai{-}Uwe Sattler and
    Dimitrios Gunopulos},
    title = {SensorBench: benchmarking approaches to processing wireless sensor
    network data},
    booktitle = {Conference on Scientific and Statistical Database Management, {SSDBM}
    '14, Aalborg, Denmark, June 30 - July 02, 2014},
    pages = {21:1--21:12},
    publisher = {{ACM}},
    month = jun,
    year = {2014},
    url = {https://doi.org/10.1145/2618243.2618252},
    doi = {10.1145/2618243.2618252}
    }

  • Marco Roos, Alasdair J. G. Gray, Andra Waagmeester, Mark Thompson, Rajaram Kaliyaperumal, Eelke van der Horst, Barend Mons, and Mark Wilkinson. Bring Your Own Data Workshops: A Mechanism to Aid Data Owners to Comply with Linked Data Best Practices. In Proceedings of the 7th International Workshop on Semantic Web Applications and Tools for Life Sciences, Berlin, Germany, December 9-11, 2014, volume 1320 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
    [BibTeX] [Download PDF]
    @inproceedings{DBLP:conf/swat4ls/RoosGWTKHMW14,
    author = {Marco Roos and
    Alasdair J. G. Gray and
    Andra Waagmeester and
    Mark Thompson and
    Rajaram Kaliyaperumal and
    Eelke van der Horst and
    Barend Mons and
    Mark Wilkinson},
    title = {Bring Your Own Data Workshops: {A} Mechanism to Aid Data Owners to
    Comply with Linked Data Best Practices},
    booktitle = {Proceedings of the 7th International Workshop on Semantic Web Applications
    and Tools for Life Sciences, Berlin, Germany, December 9-11, 2014.},
    series = {{CEUR} Workshop Proceedings},
    volume = {1320},
    publisher = {CEUR-WS.org},
    month = dec,
    year = {2014},
    url = {http://ceur-ws.org/Vol-1320/paper\_36.pdf},
    }

  • Sajjad Hussain, Hong Sun, Gokce Banu Laleci Erturkmen, Mustafa Yuksel, Charles Mead, Alasdair J. G. Gray, and Kerstin Forsberg. A Justification-based Semantic Framework for Representing, Evaluating and Utilizing Terminology Mappings. In Context Interpretation and Meaning, Riva del Garda, Italy, oct 2014.
    [BibTeX] [Abstract] [Download PDF]

    Use of medical terminologies and mappings across them are considered to be crucial pre-requisites for achieving interoperable eHealth applications. However, experiences from several research projects have demonstrated that the mappings are not enough. Also the context of the mappings is needed to enable interpretation of the meaning of the mappings. Built upon these experiences, we introduce a semantic framework for representing, evaluating and utilizing terminology mappings together with the context in terms of the justifications for, and the provenance of, the mappings. The framework offers a platform for i) performing various mappings strategies, ii) representing terminology mappings together with their provenance information, and iii) enabling terminology reasoning for inferring both new and erroneous mappings. We present the results of the introduced framework using the SALUS project where we evaluated the quality of both existing and inferred terminology mappings among standard terminologies.

    @inproceedings{Hussain2014CIM,
    Abstract = {Use of medical terminologies and mappings across them are considered to be crucial pre-requisites for achieving interoperable eHealth applications. However, experiences from several research projects have demonstrated that the mappings are not enough. Also the context of the mappings is needed to enable interpretation of the meaning of the mappings. Built upon these experiences, we introduce a semantic framework for representing, evaluating and utilizing terminology mappings together with the context in terms of the justifications for, and the provenance of, the mappings. The framework offers a platform for i) performing various mappings strategies, ii) representing terminology mappings together with their provenance information, and iii) enabling terminology reasoning for inferring both new and erroneous mappings. We present the results of the introduced framework using the SALUS project where we evaluated the quality of both existing and inferred terminology mappings among standard terminologies.},
    Address = {Riva del Garda, Italy},
    Author = {Hussain, Sajjad and Sun, Hong and Erturkmen, Gokce Banu Laleci and Yuksel, Mustafa and Mead, Charles and Gray, Alasdair J G and Forsberg, Kerstin},
    Booktitle = {Context Interpretation and Meaning},
    Title = {{A Justification-based Semantic Framework for Representing, Evaluating and Utilizing Terminology Mappings}},
    url = {http://www.macs.hw.ac.uk/~fm206/cim14/cim20140_submission_2.pdf},
    Month = oct,
    Year = {2014}
    }
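
    The framework above treats each terminology mapping as a first-class resource that carries its own justification and provenance. As a rough illustration only (not the paper's SALUS implementation), the following Python/rdflib sketch reifies one mapping and attaches a justification and a PROV-O generation activity; all example.org URIs and the EX vocabulary terms are invented for the example.

    # Illustrative sketch, not the paper's implementation: a terminology
    # mapping represented together with its justification and provenance,
    # using rdflib with the standard SKOS and PROV-O vocabularies.
    # All example.org URIs and EX terms are hypothetical placeholders.
    from rdflib import Graph, Namespace, URIRef, Literal
    from rdflib.namespace import RDF, SKOS, PROV

    EX = Namespace("http://example.org/mappings/")

    g = Graph()
    g.bind("skos", SKOS)
    g.bind("prov", PROV)

    source = URIRef("http://example.org/icd9/250.00")      # hypothetical concept
    target = URIRef("http://example.org/snomed/44054006")  # hypothetical concept
    mapping = EX["mapping1"]

    # Reify the mapping so justification and provenance can hang off it.
    g.add((mapping, RDF.type, EX.TerminologyMapping))
    g.add((mapping, EX.sourceConcept, source))
    g.add((mapping, EX.targetConcept, target))
    g.add((mapping, EX.mappingRelation, SKOS.exactMatch))

    # The context needed to interpret the mapping: why and how it was made.
    g.add((mapping, EX.justification, Literal("lexical match on preferred label")))
    g.add((mapping, PROV.wasGeneratedBy, EX["lexicalMatcherRun42"]))

    print(g.serialize(format="turtle"))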

  • Simon Jupp, James Malone, and Alasdair J. G. Gray. Capturing Provenance for a Linkset of Convenience. In Proceedings of the 4th Workshop on Linked Science 2014 – Making Sense Out of Data (LISC2014) co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19, 2014, pages 71–75, oct 2014.
    [BibTeX] [Abstract] [Download PDF]

    Biological interactions such as those between genes and proteins are complex and require intricate {OWL} models. However, direct links between biological entities can support search and data integration. In this paper we introduce linksets of convenience that capture these direct links. We show the provenance statements required to track the derivation of such linksets; linking them back to the full biological justification.

    @inproceedings{JuppMG14,
    abstract = {Biological interactions such as those between genes and proteins
    are complex and require intricate {OWL} models. However, direct
    links between biological entities can support search and data integration.
    In this paper we introduce linksets of convenience that capture
    these direct links. We show the provenance statements required to track
    the derivation of such linksets; linking them back to the full biological
    justification.},
    author = {Simon Jupp and
    James Malone and
    Alasdair J. G. Gray},
    title = {Capturing Provenance for a Linkset of Convenience},
    booktitle = {Proceedings of the 4th Workshop on Linked Science 2014 - Making Sense
    Out of Data {(LISC2014)} co-located with the 13th International Semantic
    Web Conference {(ISWC} 2014), Riva del Garda, Italy, October 19, 2014.},
    pages = {71--75},
    month = oct,
    year = {2014},
    url = {http://ceur-ws.org/Vol-1282/lisc2014_submission_7.pdf},
    }
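
    The linkset-of-convenience idea above pairs a direct shortcut link with provenance that points back to the fuller biological justification. A minimal sketch of that pattern, using rdflib with the VoID and PROV-O vocabularies, follows; the gene, protein, and model URIs are invented and this is not the paper's actual linkset.

    # Minimal sketch of a "linkset of convenience": a direct gene-to-protein
    # link, plus provenance tying the linkset back to the richer OWL model
    # it was derived from. All example.org URIs are hypothetical.
    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import RDF, PROV, VOID

    EX = Namespace("http://example.org/")

    g = Graph()
    g.bind("void", VOID)
    g.bind("prov", PROV)

    # The convenient direct link between two biological entities.
    gene = URIRef("http://example.org/gene/BRCA2")
    protein = URIRef("http://example.org/protein/P51587")
    g.add((gene, EX.encodes, protein))

    # Describe the linkset and record what the shortcut was derived from.
    linkset = EX["gene-protein-linkset"]
    g.add((linkset, RDF.type, VOID.Linkset))
    g.add((linkset, VOID.linkPredicate, EX.encodes))
    g.add((linkset, PROV.wasDerivedFrom, EX["full-owl-model"]))

    print(g.serialize(format="turtle"))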

2013

  • Patrick Jackman, Alasdair J. G. Gray, Andrew Brass, Robert Stevens, Ming Shi, Derek Scuffell, Simon Hammersley, and Bruce Grieve. Processing online crop disease warning information via sensor networks using ISA ontologies. Agricultural Engineering International: CIGR Journal, 15(3):243–251, 2013.
    [BibTeX] [Abstract]

    Growing demand for food is driving the need for higher crop yields globally. Correctly anticipating the onset of damaging crop diseases is essential to achieve this goal. Considerable efforts have been made recently to develop early warning systems. However, these methods lack a direct and online measurement of the spores that attack crops. A novel disease information network has been implemented and deployed. Spore sensors have been developed and deployed. The measurements from these sensors are combined with similar measurements of important local weather readings to generate estimates of crop disease risk. It is combined with other crop disease information allowing overall local disease risk assessments and forecasts to be made. The resulting data is published through a SPARQL endpoint to support reuse and connection into the linked data cloud.

    @article{Jackman:2013Processing-Online-Crop-Disease,
    title = "Processing online crop disease warning information via sensor networks using ISA ontologies",
    abstract = "Growing demand for food is driving the need for higher crop yields globally. Correctly anticipating the onset of damaging crop diseases is essential to achieve this goal. Considerable efforts have been made recently to develop early warning systems. However, these methods lack a direct and online measurement of the spores that attack crops. A novel disease information network has been implemented and deployed. Spore sensors have been developed and deployed. The measurements from these sensors are combined with similar measurements of important local weather readings to generate estimates of crop disease risk. It is combined with other crop disease information allowing overall local disease risk assessments and forecasts to be made. The resulting data is published through a SPARQL endpoint to support reuse and connection into the linked data cloud.",
    keywords = "Crop disease assessment, Data queries, Investigation study assay, Online sensors, Sensor network, Web semantics",
    author = "Patrick Jackman and Alasdair J G Gray and Andrew Brass and Robert Stevens and Ming Shi and Derek Scuffell and Simon Hammersley and Bruce Grieve",
    year = "2013",
    language = "English",
    volume = "15",
    pages = "243--251",
    journal = "Agricultural Engineering International: CIGR Journal",
    issn = "1682-1130",
    publisher = "International Commission of Agricultural and Biosystems Engineering",
    number = "3",
    }
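
    Since the abstract above ends with the data being published through a SPARQL endpoint, a short consumer sketch may help; the endpoint URL and vocabulary terms below are invented placeholders, not the project's actual ones.

    # Illustrative only: pulling recent spore-sensor observations from a
    # (hypothetical) SPARQL endpoint with the SPARQLWrapper library.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://example.org/crop-disease/sparql")  # hypothetical
    sparql.setQuery("""
        PREFIX ex: <http://example.org/vocab#>
        SELECT ?sensor ?sporeCount ?time
        WHERE {
            ?obs ex:observedBy ?sensor ;
                 ex:sporeCount ?sporeCount ;
                 ex:timestamp ?time .
        }
        ORDER BY DESC(?time)
        LIMIT 10
    """)
    sparql.setReturnFormat(JSON)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["sensor"]["value"], row["sporeCount"]["value"], row["time"]["value"])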

  • Paolo Ciccarese, Stian Soiland-Reyes, Khalid Belhajjame, Alasdair J. G. Gray, Carole A. Goble, and Tim Clark. PAV ontology: provenance, authoring and versioning. Journal of Biomedical Semantics, 4:37, 2013. doi:10.1186/2041-1480-4-37
    [BibTeX] [Abstract] [Download PDF]

    Background: Provenance is a critical ingredient for establishing trust of published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim with supportive evidence. Existing vocabularies such as Dublin Core Terms (DC Terms) and the W3C Provenance Ontology (PROV-O) are domain-independent and general-purpose and they allow and encourage for extensions to cover more specific needs. In particular, to track authoring and versioning information of web resources, PROV-O provides a basic methodology but not any specific classes and properties for identifying or distinguishing between the various roles assumed by agents manipulating digital artifacts, such as author, contributor and curator. Results: We present the Provenance, Authoring and Versioning ontology (PAV, namespace http://purl.org/pav/): a lightweight ontology for capturing “just enough” descriptions essential for tracking the provenance, authoring and versioning of web resources. We argue that such descriptions are essential for digital scientific content. PAV distinguishes between contributors, authors and curators of content and creators of representations in addition to the provenance of originating resources that have been accessed, transformed and consumed. We explore five projects (and communities) that have adopted PAV illustrating their usage through concrete examples. Moreover, we present mappings that show how PAV extends the W3C PROV-O ontology to support broader interoperability. Method: The initial design of the PAV ontology was driven by requirements from the AlzSWAN project with further requirements incorporated later from other projects detailed in this paper. The authors strived to keep PAV lightweight and compact by including only those terms that have demonstrated to be pragmatically useful in existing applications, and by recommending terms from existing ontologies when plausible. Discussion: We analyze and compare PAV with related approaches, namely Provenance Vocabulary (PRV), DC Terms and BIBFRAME. We identify similarities and analyze differences between those vocabularies and PAV, outlining strengths and weaknesses of our proposed model. We specify SKOS mappings that align PAV with DC Terms. We conclude the paper with general remarks on the applicability of PAV.

    @article{Ciccarese:PAV:JoBS2013,
    abstract = {Background
    Provenance is a critical ingredient for establishing trust of published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim with supportive evidence. Existing vocabularies such as Dublin Core Terms (DC Terms) and the W3C Provenance Ontology (PROV-O) are domain-independent and general-purpose and they allow and encourage for extensions to cover more specific needs. In particular, to track authoring and versioning information of web resources, PROV-O provides a basic methodology but not any specific classes and properties for identifying or distinguishing between the various roles assumed by agents manipulating digital artifacts, such as author, contributor and curator.
    Results
    We present the Provenance, Authoring and Versioning ontology (PAV, namespace http://purl.org/pav/): a lightweight ontology for capturing “just enough” descriptions essential for tracking the provenance, authoring and versioning of web resources. We argue that such descriptions are essential for digital scientific content. PAV distinguishes between contributors, authors and curators of content and creators of representations in addition to the provenance of originating resources that have been accessed, transformed and consumed. We explore five projects (and communities) that have adopted PAV illustrating their usage through concrete examples. Moreover, we present mappings that show how PAV extends the W3C PROV-O ontology to support broader interoperability.
    Method
    The initial design of the PAV ontology was driven by requirements from the AlzSWAN project with further requirements incorporated later from other projects detailed in this paper. The authors strived to keep PAV lightweight and compact by including only those terms that have demonstrated to be pragmatically useful in existing applications, and by recommending terms from existing ontologies when plausible.
    Discussion
    We analyze and compare PAV with related approaches, namely Provenance Vocabulary (PRV), DC Terms and BIBFRAME. We identify similarities and analyze differences between those vocabularies and PAV, outlining strengths and weaknesses of our proposed model. We specify SKOS mappings that align PAV with DC Terms. We conclude the paper with general remarks on the applicability of PAV.},
    author = {Paolo Ciccarese and
    Stian Soiland{-}Reyes and
    Khalid Belhajjame and
    Alasdair J. G. Gray and
    Carole A. Goble and
    Tim Clark},
    title = {{PAV} ontology: provenance, authoring and versioning},
    journal = {Journal of Biomedical Semantics},
    volume = {4},
    pages = {37},
    year = {2013},
    url = {https://doi.org/10.1186/2041-1480-4-37},
    doi = {10.1186/2041-1480-4-37}
    }
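
    The abstract above gives the PAV namespace (http://purl.org/pav/) and its core distinction between authors, contributors and curators. A brief usage sketch follows; the dataset and people URIs are invented, while the pav: terms come from the ontology itself.

    # Sketch of the kind of description PAV is designed for: who authored
    # and who curated a resource, and how it is versioned. The example.org
    # URIs are hypothetical; the pav: terms come from http://purl.org/pav/.
    from rdflib import Graph, Namespace, URIRef, Literal

    PAV = Namespace("http://purl.org/pav/")
    EX = Namespace("http://example.org/")

    g = Graph()
    g.bind("pav", PAV)

    doc = URIRef("http://example.org/dataset/v2")
    g.add((doc, PAV.authoredBy, EX.alice))   # responsible for the content
    g.add((doc, PAV.curatedBy, EX.bob))      # checked and maintained it
    g.add((doc, PAV.version, Literal("2.0")))
    g.add((doc, PAV.previousVersion, URIRef("http://example.org/dataset/v1")))

    print(g.serialize(format="turtle"))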

  • Alasdair J. G. Gray and Andrea Splendiani. The Hitch-hikers Guide to the Semantic Web. In Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2013), Edinburgh, UK, dec 2013. (Tutorial)
    [BibTeX]
    @inproceedings{Gray2013SWAT4LSTutorial,
    address = {Edinburgh, UK},
    author = {Alasdair J. G. Gray and Andrea Splendiani},
    booktitle = {Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2013)},
    month = dec,
    note = {(Tutorial)},
    title = {The Hitch-hikers Guide to the Semantic Web},
    year = {2013}
    }

  • C. Brenninkmeijer, C. Evelo, C. Goble, A. J. G. Gray, P. Groth, S. Pettifer, R. Stevens, A. Williams, and E. Willighagen. Scientific Lenses: An Approach to Dynamically Vary the Relationships between Datasets. In Intelligent Systems for Molecular Biology and European Conference on Computational Biology (ISMB/ECCB 2013), Berlin, Germany, 2013. (Poster paper)
    [BibTeX]
    @inproceedings{brenninkmeijer:lenses:ismbeccb2013,
    address = {Berlin, Germany},
    author = {C. Brenninkmeijer and C. Evelo and C. Goble and A.J.G. Gray and P. Groth and S. Pettifer and R. Stevens and A. Williams and E. Willighagen},
    booktitle = {Intelligent Systems for Molecular Biology and European Conference on Computational Biology (ISMB/ECCB 2013)},
    month = jul,
    note = {(Poster paper)},
    title = {Scientific Lenses: An Approach to Dynamically Vary the Relationships between Datasets},
    year = {2013}
    }

  • Egon Willighagen, Christian Y. A. Brenninkmeijer, Chris T. Evelo, Alasdair J. G. Gray, Carole Goble, Lee Harland, Andra Waagmeester, and Antony J. Williams. Open PHACTS: meaningful linking of preclinical drug discovery knowledge. In 245th American Chemical Society National Meeting and Exposition, New Orleans, LA, USA, apr 2013. (Extended abstract)
    [BibTeX]
    @inproceedings{Willighagen:2013we,
    address = {New Orleans, LA, USA},
    author = {Egon Willighagen and Christian Y. A. Brenninkmeijer and Chris T. Evelo and Alasdair J. G. Gray and Carole Goble and Lee Harland and Andra Waagmeester and Antony J. Williams},
    booktitle = {245th American Chemical Society National Meeting and Exposition},
    month = {apr},
    note = {(Extended abstract)},
    title = {{Open PHACTS}: meaningful linking of preclinical drug discovery knowledge},
    year = {2013}
    }

  • A. J. G. Gray. Enabling Drug Discovery Applications Through a Linked Data Platform. In Conference on Semantics in Healthcare and Life Sciences, Boston, MA, USA, 2013. (Technology talk)
    [BibTeX] [Abstract]

    We present the Open PHACTS linked data platform that is being developed to support a wide range of novel drug discovery applications. The functionality offered by the platform has been drawn from a collection of prioritised drug discovery business questions created as part of the Open PHACTS project, a collaboration of research institutions and major pharmaceutical companies. The discovery of new medicines requires pharmacologists to interact with a number of data sources; ranging from data on chemical compounds to their interactions with targets. The linked data platform provides an integrated view over data retrieved from several complementary, but overlapping, data sources. Key features of the Open PHACTS linked data platform are: 1) Domain specific API making drug discovery linked data available for a diverse range of applications without requiring the application developers to become knowledgeable of semantic web standards such as SPARQL; 2) Just-in-time identity resolution and alignment across datasets enabling a variety of entry points to the data and ultimately to support different integrated views of the data; 3) Centrally cached copies of public datasets to support interactive response times for user-facing applications. The Open PHACTS platform is hosted by OpenLink using the Virtuoso triplestore. This is enabling us to provide the security and privacy guarantees required by pharmaceutical companies. We have recently begun beta testing of the platform with our associated partners and anticipate a full public roll-out later in 2013. The utility of the linked data platform is demonstrated by the variety of drug discovery applications being built to access the integrated data.

    @inproceedings{gray:OPS:cshals2013,
    abstract = {We present the Open PHACTS linked data platform that is being developed to support a wide range of novel drug discovery applications. The functionality offered by the platform has been drawn from a collection of prioritised drug discovery business questions created as part of the Open PHACTS project, a collaboration of research institutions and major pharmaceutical companies.
    The discovery of new medicines requires pharmacologists to interact with a number of data sources; ranging from data on chemical compounds to their interactions with targets. The linked data platform provides an integrated view over data retrieved from several complementary, but overlapping, data sources.
    Key features of the Open PHACTS linked data platform are:
    1) Domain specific API making drug discovery linked data available for a diverse range of applications without requiring the application developers to become knowledgeable of semantic web standards such as SPARQL;
    2) Just-in-time identity resolution and alignment across datasets enabling a variety of entry points to the data and ultimately to support different integrated views of the data;
    3) Centrally cached copies of public datasets to support interactive response times for user-facing applications.
    The Open PHACTS platform is hosted by OpenLink using the Virtuoso triplestore. This is enabling us to provide the security and privacy guarantees required by pharmaceutical companies. We have recently begun beta testing of the platform with our associated partners and anticipate a full public roll-out later in 2013.
    The utility of the linked data platform is demonstrated by the variety of drug discovery applications being built to access the integrated data.},
    address = {Boston, MA, USA},
    author = {A.J.G. Gray},
    booktitle = {Conference on Semantics in Healthcare and Life Sciences},
    month = feb,
    note = {(Technology talk)},
    title = {Enabling Drug Discovery Applications Through a Linked Data Platform},
    year = {2013}
    }
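
    The first key feature above is a domain-specific API that hides SPARQL from application developers. The toy wrapper below illustrates that idea only; it is not the Open PHACTS API, and the endpoint, function name, and query vocabulary are all invented.

    # Toy illustration of a domain-specific API over a cached triplestore:
    # callers get a named drug-discovery operation instead of raw SPARQL.
    # Everything here (endpoint, vocabulary, signature) is hypothetical.
    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = "http://example.org/ops/sparql"  # hypothetical cached store

    def compound_pharmacology(compound_uri, limit=25):
        """Return pharmacology records for a compound without exposing SPARQL."""
        sparql = SPARQLWrapper(ENDPOINT)
        sparql.setQuery(f"""
            PREFIX ex: <http://example.org/vocab#>
            SELECT ?target ?activity
            WHERE {{
                <{compound_uri}> ex:hasActivity ?a .
                ?a ex:onTarget ?target ;
                   ex:activityValue ?activity .
            }}
            LIMIT {limit}
        """)
        sparql.setReturnFormat(JSON)
        return sparql.query().convert()["results"]["bindings"]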

  • Carole A. Goble, Alasdair J. G. Gray, Lee Harland, Karen Karapetyan, Antonis Loizou, Ivan Mikhailov, Yrjänä Rankka, Stefan Senger, Valery Tkachenko, Antony J. Williams, and Egon L. Willighagen. Incorporating Commercial and Private Data into an Open Linked Data Platform for Drug Discovery. In The Semantic Web – ISWC 2013 – 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21-25, 2013, Proceedings, Part II, volume 8219 of Lecture Notes in Computer Science, pages 65–80. Springer, oct 2013. doi:10.1007/978-3-642-41338-4_5
    [BibTeX] [Download PDF]
    @inproceedings{DBLP:conf/semweb/GobleGHKLMRSTWW13,
    author = {Carole A. Goble and
    Alasdair J. G. Gray and
    Lee Harland and
    Karen Karapetyan and
    Antonis Loizou and
    Ivan Mikhailov and
    Yrj{\"{a}}n{\"{a}} Rankka and
    Stefan Senger and
    Valery Tkachenko and
    Antony J. Williams and
    Egon L. Willighagen},
    title = {Incorporating Commercial and Private Data into an Open Linked Data
    Platform for Drug Discovery},
    booktitle = {The Semantic Web - {ISWC} 2013 - 12th International Semantic Web Conference,
    Sydney, NSW, Australia, October 21-25, 2013, Proceedings, Part {II}},
    series = {Lecture Notes in Computer Science},
    volume = {8219},
    pages = {65--80},
    publisher = {Springer},
    month = oct,
    year = {2013},
    url = {https://doi.org/10.1007/978-3-642-41338-4\_5},
    doi = {10.1007/978-3-642-41338-4\_5}
    }

  • Ian Dunlop, Rishi Ramgolam, Steve Pettifer, Alasdair J. G. Gray, James Eales, Carole Goble, and Jan Velterop. Open PHACTS Explorer 2: Bringing the web to the semantic web. In Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2013), Edinburgh, UK, dec 2013. CEUR-WS.
    [BibTeX] [Download PDF]
    @InProceedings{Dunlop2013,
    author = {Ian Dunlop and Rishi Ramgolam and Steve Pettifer and Alasdair J. G. Gray and James Eales and Carole Goble and Jan Velterop},
    title = {Open PHACTS Explorer 2: Bringing the web to the semantic web},
    OPTcrossref = {},
    OPTkey = {},
    booktitle = {Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2013)},
    year = {2013},
    OPTeditor = {},
    OPTvolume = {},
    OPTnumber = {},
    OPTseries = {},
    OPTpages = {},
    month = dec,
    address = {Edinburgh, UK},
    OPTorganization = {},
    publisher = {CEUR-WS},
    OPTnote = {},
    OPTannote = {},
    url = {http://ceur-ws.org/Vol-1114/Demo_Dunlop.pdf}
    }

  • Christian Y. A. Brenninkmeijer, Ian Dunlop, Carole Goble, Alasdair J. G. Gray, Steve Pettifer, and Robert Stevens. Computing Identity Co-Reference Across Drug Discovery Datasets. In Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2013), Edinburgh, UK, dec 2013. CEUR-WS.
    [BibTeX] [Download PDF]
    @InProceedings{Brenn2013SWAT4LS,
    author = {Christian Y. A. Brenninkmeijer and Ian Dunlop and Carole Goble and Alasdair J. G. Gray and Steve Pettifer and Robert Stevens},
    title = {Computing Identity Co-Reference Across Drug Discovery Datasets},
    OPTcrossref = {},
    OPTkey = {},
    booktitle = {Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2013)},
    year = {2013},
    OPTeditor = {},
    OPTvolume = {},
    OPTnumber = {},
    OPTseries = {},
    OPTpages = {},
    month = dec,
    address = {Edinburgh, UK},
    OPTorganization = {},
    publisher = {CEUR-WS},
    OPTnote = {},
    OPTannote = {},
    url = {http://ceur-ws.org/Vol-1114/Session4_Brenninkmeijer.pdf}
    }

  • Christian Y. A. Brenninkmeijer, Carole A. Goble, Alasdair J. G. Gray, Paul T. Groth, Antonis Loizou, and Steve Pettifer. Including Co-referent URIs in a SPARQL Query. In Proceedings of the Fourth International Workshop on Consuming Linked Data, COLD 2013, Sydney, Australia, October 22, 2013, Sydney, Australia, oct 2013. CEUR Workshop Proceedings. http://ceur-ws.org/Vol-1034/BrenninkmeijerEtAl_COLD2013.pdf
    [BibTeX] [Abstract] [Download PDF]

    Linked data relies on instance level links between potentially differing representations of concepts in multiple datasets. However, in large complex domains, such as pharmacology, the inter-relationship of data instances needs to consider the context (e.g. task, role) of the user and the assumptions they want to apply to the data. Such context is not taken into account in most linked data integration procedures. In this paper we argue that dataset links should be stored in a stand-off fashion, thus enabling different assumptions to be applied to the data links during query execution. We present the infrastructure developed for the Open PHACTS Discovery Platform to enable this and show through evaluation that the incurred performance cost is below the threshold of user perception.

    @InProceedings{Brenn2013COLD,
    abstract = {Linked data relies on instance level links between potentially
    differing representations of concepts in multiple datasets. However, in
    large complex domains, such as pharmacology, the inter-relationship of
    data instances needs to consider the context (e.g. task, role) of the user
    and the assumptions they want to apply to the data. Such context is
    not taken into account in most linked data integration procedures. In
    this paper we argue that dataset links should be stored in a stand-off
    fashion, thus enabling different assumptions to be applied to the data
    links during query execution. We present the infrastructure developed for
    the Open PHACTS Discovery Platform to enable this and show through
    evaluation that the incurred performance cost is below the threshold of
    user perception.},
    address = {Sydney, Australia},
    author = {Christian Y. A. Brenninkmeijer and
    Carole A. Goble and
    Alasdair J. G. Gray and
    Paul T. Groth and
    Antonis Loizou and
    Steve Pettifer},
    title = {Including Co-referent URIs in a {SPARQL} Query},
    booktitle = {Proceedings of the Fourth International Workshop on Consuming Linked
    Data, {COLD} 2013, Sydney, Australia, October 22, 2013},
    month = oct,
    year = {2013},
    publisher = {CEUR Workshop Proceedings},
    note = {http://ceur-ws.org/Vol-1034/BrenninkmeijerEtAl_COLD2013.pdf},
    url = {http://ceur-ws.org/Vol-1034/BrenninkmeijerEtAl_COLD2013.pdf}
    }
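
    The paper above argues for keeping equivalence links "stand-off" and applying them at query time. The sketch below shows the bare query-expansion idea: rewrite a query over one URI into one over all of its co-referent URIs. The lookup table and query are invented; the actual platform resolves links through its identity-mapping service under a chosen scientific lens.

    # Bare-bones query expansion over a stand-off co-reference table.
    # The URIs and the SPARQL template are hypothetical examples.
    COREFS = {
        "http://example.org/chembl/25": [
            "http://example.org/chembl/25",
            "http://example.org/drugbank/DB00945",
            "http://example.org/chebi/15365",
        ],
    }

    def expand_subject(query_template, uri):
        """Rewrite a single-URI query to range over all co-referent URIs."""
        uris = " ".join(f"<{u}>" for u in COREFS.get(uri, [uri]))
        return query_template.replace("#EXPAND", f"VALUES ?compound {{ {uris} }}")

    template = """
    SELECT ?p ?o WHERE {
      #EXPAND
      ?compound ?p ?o .
    }
    """
    print(expand_subject(template, "http://example.org/chembl/25"))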

  • George Valkanas, Ixent Galpin, Alasdair J. G. Gray, Alvaro A. A. Fernandes, Norman W. Paton, and Dimitrios Gunopulos. Declarative In-Network Sensor Data Analysis. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2013 Workshop on Languages for Data Mining and Machine Learning, Prague, Czech Republic, 2013.
    [BibTeX] [Download PDF]
    @InProceedings{Valkanas:in-network-analysis:LML2013,
    author = {George Valkanas and Ixent Galpin and Alasdair J. G. Gray and Alvaro A. A. Fernandes and Norman W. Paton and Dimitrios Gunopulos},
    title = {Declarative In-Network Sensor Data Analysis},
    OPTcrossref = {},
    OPTkey = {},
    booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2013 Workshop on Languages for Data Mining and Machine Learning},
    year = {2013},
    OPTeditor = {},
    OPTvolume = {},
    OPTnumber = {},
    OPTseries = {},
    OPTpages = {},
    month = sep,
    address = {Prague, Czech Republic},
    OPTorganization = {},
    OPTpublisher = {},
    OPTnote = {},
    OPTannote = {},
    url = {http://dtai.cs.kuleuven.be/lml/papers/lml2013_declarative_sensor.pdf}
    }

  • A. J. G. Gray, C. Chichester, K. Burger, S. Kotoulas, A. Loizou, V. Tkachenko, A. Waagmeester, S. Askjaer, S. Pettifer, L. Harland, C. Haupt, C. Batchelor, M. Vazquez, J. María Fernández, J. Saito, A. Gibson, and L. Wich. Guidelines for Nanopublications. Working Draft 1.8-20130102, Concept Web Alliance, 2013.
    [BibTeX] [Download PDF]
    @TechReport{nanopubs,
    author = {A.J.G. Gray and C. Chichester and K. Burger and S. Kotoulas and A. Loizou and V. Tkachenko and A. Waagmeester and S. Askjaer and S. Pettifer and L. Harland and C. Haupt and C. Batchelor and M. Vazquez and J. Mar\'ia Fern\'andez and J. Saito and A. Gibson and L. Wich},
    title = {Guidelines for Nanopublications},
    institution = {Concept Web Alliance},
    year = {2013},
    OPTkey = {},
    type = {Working Draft},
    number = {1.8-20130102},
    OPTaddress = {},
    month = jan,
    note = {},
    url = {http://nanopub.org/guidelines/working_draft/},
    OPTannote = {}
    }
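
    The guidelines define a nanopublication as a head graph linking three named graphs: assertion, provenance, and publication info. A compact sketch of that anatomy with rdflib follows; the claim and all example.org URIs are invented, and only the np: terms come from the nanopublication schema.

    # Anatomy of a nanopublication: head + assertion + provenance +
    # publication info, each a named graph. Content is hypothetical.
    from rdflib import Dataset, Namespace
    from rdflib.namespace import RDF, PROV

    NP = Namespace("http://www.nanopub.org/nschema#")
    EX = Namespace("http://example.org/np1/")

    ds = Dataset()
    head = ds.graph(EX["head"])
    assertion = ds.graph(EX["assertion"])
    provenance = ds.graph(EX["provenance"])
    pubinfo = ds.graph(EX["pubinfo"])

    # Head graph: declares the nanopublication and links its three parts.
    head.add((EX["np"], RDF.type, NP.Nanopublication))
    head.add((EX["np"], NP.hasAssertion, EX["assertion"]))
    head.add((EX["np"], NP.hasProvenance, EX["provenance"]))
    head.add((EX["np"], NP.hasPublicationInfo, EX["pubinfo"]))

    # A hypothetical claim, its provenance, and attribution for the
    # nanopublication itself.
    assertion.add((EX["geneA"], EX.associatedWith, EX["diseaseB"]))
    provenance.add((EX["assertion"], PROV.wasDerivedFrom, EX["study42"]))
    pubinfo.add((EX["np"], PROV.wasAttributedTo, EX["curator"]))

    print(ds.serialize(format="trig"))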

2012

  • A. J. G. Gray, S. Askjaer, C. Y. Brenninkmeijer, K. Burger, C. Chichester, J. Eales, C. T. Evelo, C. Goble, P. Groth, L. Harland, A. Loizou, S. Pettifer, R. Ramgolam, M. Thompson, A. Waagmeester, and A. J. Williams. The Pharmacology Workspace: A Platform for Drug Discovery. In 3rd International Conference on Biomedical Ontology (ICBO 2012), Graz, Austria, jul 2012. (Demo Paper)
    [BibTeX] [Abstract]

    We present the Open PHACTS linked data platform that is being developed to address a set of example drug discovery research questions and which supports several drug discovery applications. The platform retrieves data from many complementary, but overlapping, data sources to present an integrated view of the data. The platform exploits two entity resolution services: respectively for transforming text and chemical structures to a concept. The single concept URI provided by the resolution service is then expanded to a set of equivalent URIs used by the data sources. Availability. An alpha version is currently available to the Open PHACTS consortium. A first public release of the platform will be made in late 2012, see http://www.openphacts.org/.

    @inproceedings{Gray:2012The-Pharmacology-Workspace:-A-Platform-for-Drug,
    abstract = {We present the Open PHACTS linked data platform that is being developed to address a set of example drug discovery research questions and which supports several drug discovery applications. The platform retrieves data from many complementary, but overlapping, data sources to present an integrated view of the data. The platform exploits two entity resolution services: respectively for transforming text and chemical structures to a concept. The single concept URI provided by the resolution service is then expanded to a set of equivalent URIs used by the data sources.
    Availability. An alpha version is currently available to the Open PHACTS consortium. A first public release of the platform will be made in late 2012, see http://www.openphacts.org/.},
    address = {Graz, Austria},
    author = {A.J.G. Gray and S. Askjaer and C.Y. Brenninkmeijer and K. Burger and C. Chichester and J. Eales and C.T. Evelo and C. Goble and P. Groth and L. Harland and A. Loizou and S. Pettifer and R. Ramgolam and M. Thompson and A. Waagmeester and A.J. Williams},
    booktitle = {3rd International Conference on Biomedical Ontology (ICBO 2012)},
    month = jul,
    note = {(Demo Paper)},
    title = {The Pharmacology Workspace: A Platform for Drug Discovery},
    webpdf = {http://kr-med.org/icbofois2012/proceedings/ICBO2012/ICBO2012DemosSingleFiles/ICBO-2012-session6-Gray.pdf},
    year = {2012}
    }

  • C. Brenninkmeijer, C. Evelo, C. Goble, A. J. G. Gray, P. Groth, S. Pettifer, R. Stevens, A. J. Williams, and E. L. Willighagen. Scientific Lenses over Linked Data: An approach to support task specific views of the data. A vision. In 2nd International Workshop on Linked Science 2012 – Tackling Big Data (LISC 2012), volume 951, Boston, MA, USA, nov 2012. CEUR Workshop Proceedings. (Alphabetic authorship)
    [BibTeX]
    @inproceedings{Brenn2012LISC,
    Address = {Boston, MA, USA},
    Author = {C. Brenninkmeijer and C. Evelo and C. Goble and A.J.G. Gray and P. Groth and S. Pettifer and R. Stevens and A.J. Williams and E.L. Willighagen},
    Booktitle = {2nd International Workshop on Linked Science 2012 -- Tackling Big Data (LISC 2012)},
    Note = {(Alphabetic authorship)},
    OPTpages = {},
    Title = {Scientific Lenses over Linked Data: An approach to support task specific views of the data. A vision},
    Year = {2012},
    Month = nov,
    Volume = {951},
    Publisher = {CEUR Workshop Proceedings},
    WebURL = {http://ceur-ws.org/Vol-951/paper5.pdf}
    }

  • Alasdair J. G. Gray, Sune Askjaer, Christian Y. A. Brenninkmeijer, Kees Burger, Christine Chichester, James M. Eales, Chris T. A. Evelo, Carole A. Goble, Paul T. Groth, Lee Harland, Antonis Loizou, Steve Pettifer, Rishi Ramgolam, Mark Thompson, Andra Waagmeester, and Antony J. Williams. The Pharmacology Workspace: A Platform for Drug Discovery. In Proceedings of the 3rd International Conference on Biomedical Ontology (ICBO 2012), KR-MED Series, Graz, Austria, July 21-25, 2012, volume 897 of CEUR Workshop Proceedings. CEUR-WS.org, 2012.
    [BibTeX] [Download PDF]
    @inproceedings{DBLP:conf/icbo/GrayABBCEEGGHLPRTWW12,
    author = {Alasdair J. G. Gray and
    Sune Askjaer and
    Christian Y. A. Brenninkmeijer and
    Kees Burger and
    Christine Chichester and
    James M. Eales and
    Chris T. A. Evelo and
    Carole A. Goble and
    Paul T. Groth and
    Lee Harland and
    Antonis Loizou and
    Steve Pettifer and
    Rishi Ramgolam and
    Mark Thompson and
    Andra Waagmeester and
    Antony J. Williams},
    title = {The Pharmacology Workspace: {A} Platform for Drug Discovery},
    booktitle = {Proceedings of the 3rd International Conference on Biomedical Ontology
    {(ICBO} 2012), {KR-MED} Series, Graz, Austria, July 21-25, 2012},
    series = {{CEUR} Workshop Proceedings},
    volume = {897},
    publisher = {CEUR-WS.org},
    month = jul,
    year = {2012},
    url = {http://ceur-ws.org/Vol-897/demo\_4.pdf},
    }

  • A. J. G. Gray (Ed). Dataset descriptions for the Open Pharmacological Space. Working Draft, Open PHACTS, 2012.
    [BibTeX] [Download PDF]
    @TechReport{OPS-datadesc,
    author = {A.J.G. {Gray (Ed)}},
    title = {Dataset descriptions for the Open Pharmacological Space},
    institution = {Open PHACTS},
    year = {2012},
    OPTkey = {},
    type = {Working Draft},
    OPTnumber = {},
    OPTaddress = {},
    month = oct,
    note = {},
    url = {http://www.openphacts.org/specs/datadesc/},
    OPTannote = {}
    }

  • C. Y. A. Brenninkmeijer, C. Goble, A. J. G. Gray, P. Groth, A. Loizou, and S. Pettifer. Query Strategies to Support Context-Specific Views Through Stand-off Data Mappings. Technical Report, University of Manchester, 2012. (Alphabetic authorship)
    [BibTeX]
    @TechReport{query-expansion,
    author = {C.Y.A. Brenninkmeijer and C. Goble and A.J.G. Gray and P. Groth and A. Loizou and S. Pettifer},
    title = {Query Strategies to Support Context-Specific Views Through Stand-off Data Mappings},
    institution = {University of Manchester},
    year = {2012},
    OPTkey = {},
    OPTtype = {},
    OPTnumber = {},
    OPTaddress = {},
    OPTmonth = {},
    note = {(Alphabetic authorship)},
    OPTannote = {}
    }

2011

  • Alasdair J. G. Gray, Jason Sadler, Oles Kit, Kostis Kyzirakos, Manos Karpathiotakis, Jean-Paul Calbimonte, Kevin R. Page, Raúl García-Castro, Alex Frazer, Ixent Galpin, Alvaro A. A. Fernandes, Norman W. Paton, Óscar Corcho, Manolis Koubarakis, David De Roure, Kirk Martinez, and Asunción Gómez-Pérez. A Semantic Sensor Web for Environmental Decision Support Applications. Sensors, 11(9):8855–8887, 2011. doi:10.3390/s110908855
    [BibTeX] [Download PDF]
    @article{DBLP:journals/sensors/GraySKKKCPGFGFP11,
    author = {Alasdair J. G. Gray and
    Jason Sadler and
    Oles Kit and
    Kostis Kyzirakos and
    Manos Karpathiotakis and
    Jean-Paul Calbimonte and
    Kevin R. Page and
    Ra{\'{u}}l Garc{\'{\i}}a{-}Castro and
    Alex Frazer and
    Ixent Galpin and
    Alvaro A. A. Fernandes and
    Norman W. Paton and
    {\'{O}}scar Corcho and
    Manolis Koubarakis and
    David De Roure and
    Kirk Martinez and
    Asunci{\'{o}}n G{\'{o}}mez{-}P{\'{e}}rez},
    title = {A Semantic Sensor Web for Environmental Decision Support Applications},
    journal = {Sensors},
    volume = {11},
    number = {9},
    pages = {8855--8887},
    year = {2011},
    url = {https://doi.org/10.3390/s110908855},
    doi = {10.3390/s110908855}
    }

  • Ixent Galpin, Christian Y. A. Brenninkmeijer, Alasdair J. G. Gray, Farhana Jabeen, Alvaro A. A. Fernandes, and Norman W. Paton. SNEE: a query processor for wireless sensor networks. Distributed and Parallel Databases, 29(1-2):31–85, 2011. doi:10.1007/s10619-010-7074-3
    [BibTeX] [Abstract] [Download PDF]

    A wireless sensor network (WSN) can be construed as an intelligent, large-scale device for observing and measuring properties of the physical world. In recent years, the database research community has championed the view that if we construe a WSN as a database (i.e., if a significant aspect of its intelligent behavior is that it can execute declaratively-expressed queries), then one can achieve a significant reduction in the cost of engineering the software that implements a data collection program for the WSN while still achieving, through query optimization, very favorable cost:benefit ratios. This paper describes a query processing framework for WSNs that meets many desiderata associated with the view of WSN as databases. The framework is presented in the form of compiler/optimizer, called SNEE, for a continuous declarative query language over sensed data streams, called SNEEql. SNEEql can be shown to meet the expressiveness requirements of a large class of applications. SNEE can be shown to generate effective and efficient query evaluation plans. More specifically, the paper describes the following contributions: (1) a user-level syntax and physical algebra for SNEEql, an expressive continuous query language over WSNs; (2) example concrete algorithms for physical algebraic operators defined in such a way that the task of deriving memory, time and energy analytical cost-estimation models (CEMs) for them becomes straightforward by reduction to a structural traversal of the pseudocode; (3) CEMs for the concrete algorithms alluded to; (4) an architecture for the optimization of SNEEql queries, called SNEE, building on well-established distributed query processing components where possible, but making enhancements or refinements where necessary to accommodate the WSN context; (5) algorithms that instantiate the components in the SNEE architecture, thereby supporting integrated query planning that includes routing, placement and timing; and (6) an empirical performance evaluation of the resulting framework.

    @article{DBLP:journals/dpd/GalpinBGJFP11,
    abstract = {A wireless sensor network (WSN) can be construed as an intelligent, large-scale device for observing and measuring properties of the physical world. In recent years, the database research community has championed the view that if we construe a WSN as a database (i.e., if a significant aspect of its intelligent behavior is that it can execute declaratively-expressed queries), then one can achieve a significant reduction in the cost of engineering the software that implements a data collection program for the WSN while still achieving, through query optimization, very favorable cost:benefit ratios. This paper describes a query processing framework for WSNs that meets many desiderata associated with the view of WSN as databases. The framework is presented in the form of compiler/optimizer, called SNEE, for a continuous declarative query language over sensed data streams, called SNEEql. SNEEql can be shown to meet the expressiveness requirements of a large class of applications. SNEE can be shown to generate effective and efficient query evaluation plans. More specifically, the paper describes the following contributions: (1) a user-level syntax and physical algebra for SNEEql, an expressive continuous query language over WSNs; (2) example concrete algorithms for physical algebraic operators defined in such a way that the task of deriving memory, time and energy analytical cost-estimation models (CEMs) for them becomes straightforward by reduction to a structural traversal of the pseudocode; (3) CEMs for the concrete algorithms alluded to; (4) an architecture for the optimization of SNEEql queries, called SNEE, building on well-established distributed query processing components where possible, but making enhancements or refinements where necessary to accommodate the WSN context; (5) algorithms that instantiate the components in the SNEE architecture, thereby supporting integrated query planning that includes routing, placement and timing; and (6) an empirical performance evaluation of the resulting framework.},
    author = {Ixent Galpin and
    Christian Y. A. Brenninkmeijer and
    Alasdair J. G. Gray and
    Farhana Jabeen and
    Alvaro A. A. Fernandes and
    Norman W. Paton},
    title = {{SNEE:} a query processor for wireless sensor networks},
    journal = {Distributed and Parallel Databases},
    volume = {29},
    number = {1-2},
    pages = {31--85},
    year = {2011},
    url = {https://doi.org/10.1007/s10619-010-7074-3},
    doi = {10.1007/s10619-010-7074-3}
    }

  • Sam Watt, Robin Newman, Craig Hutton, Jason Sadler, and Alasdair J. G. Gray. Spatial Information Management Application: Semantic Sensor Webs for Coastal Flooding. In Coast GIS 2011, Oostende, Belgium, 2011. (Extended abstract)
    [BibTeX] [Download PDF]
    @InProceedings{Sadler2011-Spatial-,
    author = {Sam Watt and Robin Newman and Craig Hutton and Jason Sadler and Alasdair J. G. Gray},
    title = {Spatial Information Management Application: Semantic Sensor Webs for Coastal Flooding},
    booktitle = {Coast GIS 2011},
    year = {2011},
    address = {Oostende, Belgium},
    month = sep,
    note = {(Extended abstract)},
    url = {http://coastgis.corila.it/2011/download/abstract/TC-025_Watt_et_al_abstract.pdf}
    }

  • I. Galpin, R. Taylor, A. J. G. Gray, C. Y. A. Brenninkmeijer, A. A. A. Fernandes, and N. W. Paton. Executing in-network queries using SNEE. In 28th British National Conference on Databases (BNCOD 2011), volume 7051 of Lecture Notes in Computer Science, pages 136-139, Manchester, UK, jul 2011. Springer. (Demo Paper) doi:10.1007/978-3-642-24577-0_15
    [BibTeX] [Abstract] [Download PDF]

    The SNEE query optimizer enables users to characterize data requests against wireless sensor networks (WSNs), using a declarative query language called SNEEql (SNEE for Sensor NEtwork Engine, described in [GBG+11], and publicly available at http://code.google.com/p/snee ). Queries are compiled into imperative query execution plans, which are translated into executable nesC source code. In this paper, we illustrate the lifecycle of a SNEEql query Q for in-network execution. This lifecycle encompasses the steps of preparatory metadata collection, followed by the compilation of Q into a query execution plan QEP, the dissemination of binary images implementing QEP throughout the WSN, and the generation of query results.

    @inproceedings{Galpin-etal:SNEE-OTA-DEMO:BNCOD2011,
    abstract = {The SNEE query optimizer enables users to characterize data requests against wireless sensor networks (WSNs), using a declarative query language called SNEEql (SNEE for Sensor NEtwork Engine, described in [GBG+11], and publicly available at http://code.google.com/p/snee ). Queries are compiled into imperative query execution plans, which are translated into executable nesC source code. In this paper, we illustrate the lifecycle of a SNEEql query Q for in-network execution. This lifecycle encompasses the steps of preparatory metadata collection, followed by the compilation of Q into a query execution plan QEP, the dissemination of binary images implementing QEP throughout the WSN, and the generation of query results.},
    address = {Manchester, UK},
    author = {I. Galpin and R. Taylor and A.J.G. Gray and C.Y.A. Brenninkmeijer and A.A.A. Fernandes and N.W. Paton},
    booktitle = {28th British National Conference on Databases (BNCOD 2011)},
    doi = {10.1007/978-3-642-24577-0_15},
    month = jul,
    note = {(Demo Paper)},
    pages = {136-139},
    publisher = {Springer},
    series = {Lecture Notes in Computer Science},
    title = {Executing in-network queries using {SNEE}},
    volume = {7051},
    year = {2011},
    url = {https://doi.org/10.1007/978-3-642-24577-0_15}
    }

  • George Valkanas, Alexios Kotsifakos, Dimitrios Gunopulos, Ixent Galpin, Alasdair J. G. Gray, Alvaro A. A. Fernandes, and Norman W. Paton. Deploying in-network data analysis techniques in sensor networks. In 12th IEEE International Conference on Mobile Data Management, MDM 2011, Luleå, Sweden, June 6-9, 2011, Volume 1, pages 300–314, Luleå, Sweden, jun 2011. IEEE. (Demo Paper) doi:10.1109/MDM.2011.54
    [BibTeX] [Abstract] [Download PDF]

    Sensor Networks have received considerable attention recently, as they provide manifold benefits. Not only are they a means for data acquisition and monitoring of unexplored or inaccessible areas, they are also a low-cost alternative for sensing the environment, which greatly aids to better understand our surroundings. A major motivation in either occasion is to acknowledge endangering situations and take action(s) accordingly. To this end, we would like to enable data mining or analysis techniques on top or, even better, within such networks, due to the prohibitive cost of communication in this setting. In this work, we demonstrate running data mining algorithms on a set of sensors, which are of low-processing power. In addition to showcasing the execution of data analysis algorithms on resource-constrained hardware, our demo is intended to show how to take advantage of the properties of each algorithm to make better use of the sensors and their capabilities. We support the execution and monitoring of these algorithms with a graphical user interface (GUI).

    @inproceedings{Valkanas-etal:DAT-SNEE-Demo:MDM2011,
    abstract = {Sensor Networks have received considerable attention recently, as they provide manifold benefits. Not only are they a means for data acquisition and monitoring of unexplored or inaccessible areas, they are also a low-cost alternative for sensing the environment, which greatly aids to better understand our surroundings. A major motivation in either occasion is to acknowledge endangering situations and take action(s) accordingly. To this end, we would like to enable data mining or analysis techniques on top or, even better, within such networks, due to the prohibitive cost of communication in this setting. In this work, we demonstrate running data mining algorithms on a set of sensors, which are of low-processing power. In addition to showcasing the execution of data analysis algorithms on resource-constrained hardware, our demo is intended to show how to take advantage of the properties of each algorithm to make better use of the sensors and their capabilities. We support the execution and monitoring of these algorithms with a graphical user interface (GUI).},
    address = {Lule\aa, Sweden},
    author = {George Valkanas and
    Alexios Kotsifakos and
    Dimitrios Gunopulos and
    Ixent Galpin and
    Alasdair J. G. Gray and
    Alvaro A. A. Fernandes and
    Norman W. Paton},
    booktitle = {12th {IEEE} International Conference on Mobile Data Management, {MDM}
    2011, Lule{\aa}, Sweden, June 6-9, 2011, Volume 1},
    doi = {10.1109/MDM.2011.54},
    month = jun,
    note = {(Demo Paper)},
    pages = {300--314},
    publisher = {{IEEE}},
    title = {Deploying in-network data analysis techniques in sensor networks},
    year = {2011},
    url = {https://doi.org/10.1109/MDM.2011.54}
    }

  • O. Corcho, A. J. G. Gray, K. Kyzirakos, J. -P. Calbimonte, and K. Page. Building Semantic Sensor Webs and Applications. In European Semantic Web Conference (ESWC 2011) – Tutorial, 2011. (Tutorial)
    [BibTeX]
    @inproceedings{Corcho2011Building-Semant,
    author = {O. Corcho and A.J.G. Gray and K. Kyzirakos and J.-P. Calbimonte and K. Page},
    booktitle = {European Semantic Web Conference (ESWC 2011) -- Tutorial},
    month = may,
    note = {(Tutorial)},
    title = {Building Semantic Sensor Webs and Applications},
    year = {2011}
    }

  • Alasdair J. G. Gray, Raúl García-Castro, Kostis Kyzirakos, Manos Karpathiotakis, Jean-Paul Calbimonte, Kevin R. Page, Jason Sadler, Alex Frazer, Ixent Galpin, Alvaro A. A. Fernandes, Norman W. Paton, Óscar Corcho, Manolis Koubarakis, David De Roure, Kirk Martinez, and Asunción Gómez-Pérez. A Semantically Enabled Service Architecture for Mashups over Streaming and Stored Data. In The Semanic Web: Research and Applications – 8th Extended Semantic Web Conference, ESWC 2011, Heraklion, Crete, Greece, May 29 – June 2, 2011, Proceedings, Part II, 2011. doi:10.1007/978-3-642-21064-8_21
    [BibTeX] [Abstract] [Download PDF]

    Sensing devices are increasingly being deployed to monitor the physical world around us. One class of application for which sensor data is pertinent is environmental decision support systems, e.g. flood emergency response. However, in order to interpret the readings from the sensors, the data needs to be put in context through correlation with other sensor readings, sensor data histories, and stored data, as well as juxtaposing with maps and forecast models. In this paper we use a flood emergency response planning application to identify requirements for a semantic sensor web. We propose a generic service architecture to satisfy the requirements that uses semantic annotations to support well-informed interactions between the services. We present the SemSorGrid4Env realisation of the architecture and illustrate its capabilities in the context of the example application.

    @inproceedings{DBLP:conf/esws/GrayGKKCPSFGFPCKRMG11,
    abstract = {Sensing devices are increasingly being deployed to monitor the physical world around us. One class of application for which sensor data is pertinent is environmental decision support systems, e.g. flood emergency response. However, in order to interpret the readings from the sensors, the data needs to be put in context through correlation with other sensor readings, sensor data histories, and stored data, as well as juxtaposing with maps and forecast models. In this paper we use a flood emergency response planning application to identify requirements for a semantic sensor web. We propose a generic service architecture to satisfy the requirements that uses semantic annotations to support well-informed interactions between the services. We present the SemSorGrid4Env realisation of the architecture and illustrate its capabilities in the context of the example application.},
    author = {Alasdair J. G. Gray and
    Raul Garcia{-}Castro and
    Kostis Kyzirakos and
    Manos Karpathiotakis and
    Jean-Paul Calbimonte and
    Kevin R. Page and
    Jason Sadler and
    Alex Frazer and
    Ixent Galpin and
    Alvaro A. A. Fernandes and
    Norman W. Paton and
    {\'{O}}scar Corcho and
    Manolis Koubarakis and
    David De Roure and
    Kirk Martinez and
    Asunci{\'{o}}n G{\'{o}}mez{-}P{\'{e}}rez},
    title = {A Semantically Enabled Service Architecture for Mashups over Streaming
    and Stored Data},
    booktitle = {The Semanic Web: Research and Applications - 8th Extended Semantic
    Web Conference, {ESWC} 2011, Heraklion, Crete, Greece, May 29 - June
    2, 2011, Proceedings, Part {II}},
    year = {2011},
    month = may,
    url = {https://doi.org/10.1007/978-3-642-21064-8\_21},
    doi = {10.1007/978-3-642-21064-8\_21}
    }

  • G. Valkanas, D. Gunopulos, I. Galpin, A. J. G. Gray, and A. A. A. Fernandes. Extending query languages for in-network query processing. In 10th ACM International Workshop on Data Engineering for Wireless and Mobile Access (MobiDE 2011), pages 34-41, Athens, Greece, jun 2011. ACM. doi:10.1145/1999309.1999318
    [BibTeX] [Abstract]

    Sensor networks have become ubiquitous and their proliferation in day-to-day life provides new research challenges. Sensors deployed at forest sites, high performance facilities, or areas stricken by environmental, or other, phenomena, are only a few representative examples. More recently, mobile sensor networks have made their presence and are rapidly growing in numbers, such as the successful ZebraNet project or PDAs and smartphones. Nevertheless, such networks have mainly been used for data acquisition and data are being processed externally instead of in-network. Basic research problems that arise in the in-network setting include how to adjust in a timely and efficient manner to changing conditions and network topology. In this paper, we present a methodology, based on declarative query processing to alleviate the aforementioned problems, by making the deployment and optimization of a data analysis application as automatic as possible, which also helps execution in mobile environments. Our proposed solution focuses on extending a state-of-the-art sensor network platform, SNEE, with built-in data analysis capabilities.

    @InProceedings{Valkanas:SNEE-A:MobiDE2011,
    abstract = {Sensor networks have become ubiquitous and their proliferation in day-to-day life provides new research challenges. Sensors deployed at forest sites, high performance facilities, or areas stricken by environmental, or other, phenomena, are only a few representative examples. More recently, mobile sensor networks have made their presence and are rapidly growing in numbers, such as the successful ZebraNet project or PDAs and smartphones. Nevertheless, such networks have mainly been used for data acquisition and data are being processed externally instead of in-network. Basic research problems that arise in the in-network setting include how to adjust in a timely and efficient manner to changing conditions and network topology. In this paper, we present a methodology, based on declarative query processing to alleviate the aforementioned problems, by making the deployment and optimization of a data analysis application as automatic as possible, which also helps execution in mobile environments. Our proposed solution focuses on extending a state-of-the-art sensor network platform, SNEE, with built-in data analysis capabilities.},
    author = {G. Valkanas and D. Gunopulos and I. Galpin and A.J.G. Gray and A.A.A. Fernandes},
    title = {Extending query languages for in-network query processing},
    booktitle = {10th ACM International Workshop on Data Engineering for Wireless and Mobile Access (MobiDE 2011)},
    year = {2011},
    pages = {34-41},
    month = jun,
    address = {Athens, Greece},
    publisher = {ACM},
    DOI = {10.1145/1999309.1999318}
    }

  • Alvaro A. A. Fernandes, Alasdair J. G. Gray, and Khalid Belhajjame, editors. Advances in Databases, 28th British National Conference on Databases, volume 7051 of Lecture Notes in Computer Science, Manchester, UK, 2011. Springer. doi:10.1007/978-3-642-24577-0
    [BibTeX]
    @proceedings{bncod2011,
    Address = {Manchester, UK},
    Booktitle = {BNCOD},
    Editor = {Alvaro A. A. Fernandes and Alasdair J. G. Gray and Khalid Belhajjame},
    Month = jul,
    Publisher = {Springer},
    Series = {Lecture Notes in Computer Science},
    Title = {Advances in Databases, 28th British National Conference on Databases},
    Volume = {7051},
    Year = {2011},
    DOI = {10.1007/978-3-642-24577-0},
    ISBN = {978-3-642-24576-3},
    }

2010

  • Alasdair J. G. Gray, Norman Gray, Christopher W. Hall, and Iadh Ounis. Finding the right term: Retrieving and exploring semantic concepts in astronomical vocabularies. Information Processing and Management, 46(4):470–478, 2010. (Alphabetical authorship) doi:10.1016/j.ipm.2009.09.004
    [BibTeX] [Abstract] [Download PDF]

    Astronomy, like many domains, already has several sets of terminology in general use, referred to as controlled vocabularies. For example, the keywords for tagging journal articles, or the taxonomy of terms used to label image files. These existing vocabularies can be encoded into skos, a W3C proposed recommendation for representing vocabularies on the Semantic Web, so that computer systems can help users to search for and discover resources tagged with vocabulary concepts. However, this requires a search mechanism to go from a user-supplied string to a vocabulary concept. In this paper, we present our experiences in implementing the Vocabulary Explorer, a vocabulary search service based on the Terrier Information Retrieval Platform. We investigate the capabilities of existing document weighting models for identifying the correct vocabulary concept for a query. Due to the highly structured nature of a skos encoded vocabulary, we investigate the effects of term weighting (boosting the score of concepts that match on particular fields of a vocabulary concept), and query expansion. We found that the existing document weighting models provided very high quality results, but these could be improved further with the use of term weighting that makes use of the semantic evidence.

    @article{DBLP:journals/ipm/GrayGHO10,
    abstract = {Astronomy, like many domains, already has several sets of terminology in general use, referred to as controlled vocabularies. For example, the keywords for tagging journal articles, or the taxonomy of terms used to label image files. These existing vocabularies can be encoded into skos, a W3C proposed recommendation for representing vocabularies on the Semantic Web, so that computer systems can help users to search for and discover resources tagged with vocabulary concepts. However, this requires a search mechanism to go from a user-supplied string to a vocabulary concept.
    In this paper, we present our experiences in implementing the Vocabulary Explorer, a vocabulary search service based on the Terrier Information Retrieval Platform. We investigate the capabilities of existing document weighting models for identifying the correct vocabulary concept for a query. Due to the highly structured nature of a skos encoded vocabulary, we investigate the effects of term weighting (boosting the score of concepts that match on particular fields of a vocabulary concept), and query expansion. We found that the existing document weighting models provided very high quality results, but these could be improved further with the use of term weighting that makes use of the semantic evidence.},
    author = {Alasdair J. G. Gray and
    Norman Gray and
    Christopher W. Hall and
    Iadh Ounis},
    title = {Finding the right term: Retrieving and exploring semantic concepts
    in astronomical vocabularies},
    journal = {Information Processing and Management},
    volume = {46},
    number = {4},
    pages = {470--478},
    year = {2010},
    Note = {(Alphabetical authorship)},
    url = {https://doi.org/10.1016/j.ipm.2009.09.004},
    doi = {10.1016/j.ipm.2009.09.004}
    }

  • Alasdair J. G. Gray. Whither BNCOD? The Future of Database and Information Systems Research. In Data Security and Security Data – 27th British National Conference on Databases, BNCOD 27, Dundee, UK, June 29 – July 1, 2010. Revised Selected Papers, volume 6121 of Lecture Notes in Computer Science, pages 3–6, Dundee, UK, July 2010. Springer. (Panel paper) doi:10.1007/978-3-642-25704-9_2
    [BibTeX] [Download PDF]
    @inproceedings{Gray2010Whither-BNCOD-T,
    address = {Dundee, UK},
    author = {Alasdair J. G. Gray},
    booktitle = {Data Security and Security Data - 27th British National Conference on Databases, {BNCOD} 27, Dundee, UK, June 29 - July 1, 2010. Revised Selected Papers},
    doi = {10.1007/978-3-642-25704-9\_2},
    month = jul,
    note = {(Panel paper)},
    pages = {3--6},
    publisher = {Springer},
    series = {Lecture Notes in Computer Science},
    title = {Whither {BNCOD}? The Future of Database and Information Systems Research},
    url = {https://doi.org/10.1007/978-3-642-25704-9\_2},
    volume = {6121},
    year = {2010},
    }

  • Jean-Paul Calbimonte, Óscar Corcho, and Alasdair J. G. Gray. Enabling Ontology-Based Access to Streaming Data Sources. In The Semantic Web – ISWC 2010 – 9th International Semantic Web Conference, ISWC 2010, Shanghai, China, November 7-11, 2010, Revised Selected Papers, Part I, volume 6496 of Lecture Notes in Computer Science, pages 96–111. Springer, 2010. (Alphabetical authorship, equal responsibility) doi:10.1007/978-3-642-17746-0_7
    [BibTeX] [Download PDF]
    @inproceedings{DBLP:conf/semweb/CalbimonteCG10,
    author = {Jean-Paul Calbimonte and
    {\'{O}}scar Corcho and
    Alasdair J. G. Gray},
    title = {Enabling Ontology-Based Access to Streaming Data Sources},
    booktitle = {The Semantic Web - {ISWC} 2010 - 9th International Semantic Web Conference,
    {ISWC} 2010, Shanghai, China, November 7-11, 2010, Revised Selected
    Papers, Part {I}},
    series = {Lecture Notes in Computer Science},
    volume = {6496},
    pages = {96--111},
    publisher = {Springer},
    month = nov,
    note = {(Alphabetical authorship, equal responsibility)},
    year = {2010},
    url = {https://doi.org/10.1007/978-3-642-17746-0\_7},
    doi = {10.1007/978-3-642-17746-0\_7}
    }

  • Stuart Chalmers, Norman Gray, Iadh Ounis, and Alasdair J. G. Gray. Collaborative Editing and Linking of Astronomy Vocabularies Using Semantic Mediawiki. In Proceedings of the 5th Workshop on Semantic Wikis, Hersonissos, Heraklion, Crete, Greece, May 31st, 2010, volume 632 of CEUR Workshop Proceedings. CEUR-WS.org, May 2010.
    [BibTeX] [Download PDF]
    @inproceedings{DBLP:conf/semwiki/ChalmersGOG10,
    author = {Stuart Chalmers and
    Norman Gray and
    Iadh Ounis and
    Alasdair J. G. Gray},
    title = {Collaborative Editing and Linking of Astronomy Vocabularies Using
    Semantic Mediawiki},
    booktitle = {Proceedings of the 5th Workshop on Semantic Wikis, Hersonissos, Heraklion,
    Crete, Greece, May 31st, 2010},
    series = {{CEUR} Workshop Proceedings},
    volume = {632},
    publisher = {CEUR-WS.org},
    month = may,
    year = {2010},
    url = {http://ceur-ws.org/Vol-632/paper06.pdf},
    }

  • A. J. G. Gray. Distributed Query Processing over Streaming and Stored Data. In Workshop on Semantic Challenges in Sensor Networks, volume 10042, Dagstuhl, Germany, January 2010.
    [BibTeX] [Download PDF]
    @InProceedings{Gray2010DQPDagstuhl,
    author = {A. J. G. Gray},
    title = {Distributed Query Processing over Streaming and Stored Data},
    booktitle = {Workshop on Semantic Challenges in Sensor Networks},
    year = {2010},
    volume = {10042},
    address = {Dagstuhl, Germany},
    month = jan,
    url = {https://www.dagstuhl.de/10042}
    }

  • A. J. G. Gray. SNEE In-WSN Query Processing Demonstration. In Workshop on Semantic Challenges in Sensor Networks, volume 10042, Dagstuhl, Germany, January 2010.
    [BibTeX] [Download PDF]
    @InProceedings{Gray2010SNEEDemo,
    author = {A. J. G. Gray},
    title = {{SNEE} {In-WSN} Query Processing Demonstration},
    booktitle = {Workshop on Semantic Challenges in Sensor Networks},
    year = {2010},
    volume = {10042},
    address = {Dagstuhl, Germany},
    month = jan,
    url = {https://www.dagstuhl.de/10042}
    }

2009

  • Alasdair J. G. Gray, Norman Gray, and Iadh Ounis. Can RDB2RDF Tools Feasibly Expose Large Science Archives for Data Integration? In The Semantic Web: Research and Applications, 6th European Semantic Web Conference, ESWC 2009, Heraklion, Crete, Greece, May 31-June 4, 2009, Proceedings, volume 5554 of Lecture Notes in Computer Science, pages 491–505. Springer, June 2009. doi:10.1007/978-3-642-02121-3_37
    [BibTeX] [Download PDF]
    @inproceedings{DBLP:conf/esws/GrayGO09,
    author = {Alasdair J. G. Gray and
    Norman Gray and
    Iadh Ounis},
    title = {Can {RDB2RDF} Tools Feasibly Expose Large Science Archives for Data
    Integration?},
    booktitle = {The Semantic Web: Research and Applications, 6th European Semantic
    Web Conference, {ESWC} 2009, Heraklion, Crete, Greece, May 31-June
    4, 2009, Proceedings},
    series = {Lecture Notes in Computer Science},
    volume = {5554},
    pages = {491--505},
    publisher = {Springer},
    month = jun,
    year = {2009},
    url = {https://doi.org/10.1007/978-3-642-02121-3\_37},
    doi = {10.1007/978-3-642-02121-3\_37}
    }

  • A. J. G. Gray, N. Gray, and I. Ounis. Searching and exploring controlled vocabularies. In Exploiting Semantic Annotations in Information Retrieval (ESAIR 2009), pages 1-5, Barcelona, Spain, 2009.
    [BibTeX]
    @inproceedings{Gray2009Searching-and-e,
    Address = {Barcelona, Spain},
    Author = {A.J.G. Gray and N. Gray and I. Ounis},
    Booktitle = {Exploiting Semantic Annotations in Information Retrieval (ESAIR 2009)},
    Pages = {1-5},
    Title = {Searching and exploring controlled vocabularies},
    Year = {2009}}

  • A. J. G. Gray, N. Gray, F. V. Hessman, and A. Preite Martinez (Eds). Vocabularies in the Virtual Observatory. Recommendation v1.19, IVOA, 2009. http://www.ivoa.net/Documents/latest/Vocabularies.html
    [BibTeX] [Download PDF]
    @techreport{gray-etal:vocab-VO:2009,
    Author = {A.J.G. Gray and N. Gray and F.V. Hessman and A. {Preite Martinez (Eds)}},
    Institution = {IVOA},
    Note = {\url{http://www.ivoa.net/Documents/latest/Vocabularies.html}},
    Number = {v1.19},
    Title = {Vocabularies in the Virtual Observatory},
    Type = {Recommendation},
    Year = {2009},
    url = {http://www.ivoa.net/Documents/latest/Vocabularies.html}
    }

2008

  • A. J. G. Gray, N. Gray, and I. Ounis. Vocabularies in the VO. In Astronomical Data Analysis Software and Systems (ADASS XVIII), volume 411, page 179, Québec City, Canada, November 2008. Astronomical Society of the Pacific. (Extended abstract)
    [BibTeX]
    @inproceedings{Gray2008Vocabularies-in,
    Address = {Qu{\'e}bec City, Canada},
    Author = {A.J.G. Gray and N. Gray and I. Ounis},
    Booktitle = {Astronomical Data Analysis Software and Systems (ADASS XVIII)},
    Pages = {179},
    Publisher = {Astronomical Society of the Pacific},
    Title = {Vocabularies in the {VO}},
    Volume = {411},
    month = nov,
    Year = {2008},
    note = {(Extended abstract)}
    }

  • Alasdair J. G. Gray, Norman Gray, and Iadh Ounis. Finding Data Resources in a Virtual Observatory Using SKOS Vocabularies. In Sharing Data, Information and Knowledge, 25th British National Conference on Databases, BNCOD 25, Cardiff, UK, July 7-10, 2008. Proceedings, volume 5071 of Lecture Notes in Computer Science, pages 189–192, Cardiff, UK, July 2008. Springer. (Poster paper) doi:10.1007/978-3-540-70504-8_19
    [BibTeX] [Download PDF]
    @inproceedings{Gray2008Finding-Data-Re,
    address = {Cardiff, UK},
    author = {Alasdair J. G. Gray and Norman Gray and Iadh Ounis},
    booktitle = {Sharing Data, Information and Knowledge, 25th British National Conference on Databases, {BNCOD} 25, Cardiff, UK, July 7-10, 2008. Proceedings},
    doi = {10.1007/978-3-540-70504-8\_19},
    month = jul,
    note = {(Poster paper)},
    pages = {189--192},
    publisher = {Springer},
    series = {Lecture Notes in Computer Science},
    title = {Finding Data Resources in a Virtual Observatory Using {SKOS} Vocabularies},
    url = {https://doi.org/10.1007/978-3-540-70504-8\_19},
    volume = {5071},
    year = {2008},
    }

  • A. J. G. Gray, N. Gray, and I. Ounis. Accessing existing distributed science archives as RDF models. In UK e-Science All Hands Meeting, Edinburgh, UK, 2008.
    [BibTeX]
    @inproceedings{Gray2008Accessing-exist,
    Address = {Edinburgh, UK},
    Author = {A.J.G. Gray and N. Gray and I. Ounis},
    Booktitle = {UK e-Science All Hands Meeting},
    Title = {Accessing existing distributed science archives as {RDF} models},
    Year = {2008}}

2007

  • Alasdair J. G. Gray, Werner Nutt, and M. Howard Williams. Answering queries over incomplete data stream histories. International Journal of Web Information Systems (IJWIS), 3(1/2):41–60, 2007. doi:10.1108/17440080710829216
    [BibTeX] [Abstract] [Download PDF]

    Purpose: Distributed data streams are an important topic of current research. In such a setting, data values will be missed, e.g. due to network errors. This paper aims to allow this incompleteness to be detected and overcome with either the user not being affected or the effects of the incompleteness being reported to the user. Design/methodology/approach: A model for representing the incomplete information has been developed that captures the information that is known about the missing data. Techniques for query answering involving certain and possible answer sets have been extended so that queries over incomplete data stream histories can be answered. Findings: It is possible to detect when a distributed data stream is missing one or more values. When such data values are missing there will be some information that is known about the data and this is stored in an appropriate format. Even when the available data are incomplete, it is possible in some circumstances to answer a query completely. When this is not possible, additional meta-data can be returned to inform the user of the effects of the incompleteness. Research limitations/implications: The techniques and models proposed in this paper have only been partially implemented. Practical implications: The proposed system is general and can be applied wherever there is a need to query the history of distributed data streams. The work in this paper enables the system to answer queries when there are missing values in the data. Originality/value: This paper presents a general model of how to detect, represent, and answer historical queries over incomplete distributed data streams.

    @article{Gray:AnsQIncompleteStream:IJWIS2007,
    abstract = {Purpose
    Distributed data streams are an important topic of current research. In such a setting, data values will be missed, e.g. due to network errors. This paper aims to allow this incompleteness to be detected and overcome with either the user not being affected or the effects of the incompleteness being reported to the user.
    Design/methodology/approach
    A model for representing the incomplete information has been developed that captures the information that is known about the missing data. Techniques for query answering involving certain and possible answer sets have been extended so that queries over incomplete data stream histories can be answered.
    Findings
    It is possible to detect when a distributed data stream is missing one or more values. When such data values are missing there will be some information that is known about the data and this is stored in an appropriate format. Even when the available data are incomplete, it is possible in some circumstances to answer a query completely. When this is not possible, additional meta-data can be returned to inform the user of the effects of the incompleteness.
    Research limitations/implications
    The techniques and models proposed in this paper have only been partially implemented.
    Practical implications
    The proposed system is general and can be applied wherever there is a need to query the history of distributed data streams. The work in this paper enables the system to answer queries when there are missing values in the data.
    Originality/value
    This paper presents a general model of how to detect, represent, and answer historical queries over incomplete distributed data streams.},
    author = {Alasdair J. G. Gray and
    Werner Nutt and
    M. Howard Williams},
    title = {Answering queries over incomplete data stream histories},
    journal = {International Journal of Web Information Systems ({IJWIS})},
    volume = {3},
    number = {1/2},
    pages = {41--60},
    year = {2007},
    url = {https://doi.org/10.1108/17440080710829216},
    doi = {10.1108/17440080710829216}
    }

  • Alasdair J. G. Gray. Integrating Distributed Data Streams. PhD thesis, Heriot-Watt University, Edinburgh, UK, 2007.
    [BibTeX] [Abstract] [Download PDF]

    There is an increasing amount of information being made available as data streams, e.g. stock tickers, data from sensor networks, smart homes, monitoring data, etc. In many cases, this data is generated by distributed sources under the control of many different organisations. Users would like to seamlessly query such data without prior knowledge of where it is located or how it is published. This is similar to the problem of integrating data residing in multiple heterogeneous stored data sources. However, the techniques developed for stored data are not applicable due to the continuous and long-lived nature of queries over data streams. This thesis proposes an architecture for a stream integration system. A key feature of the architecture is a republisher component that collects together distributed streams and makes the merged stream available for querying. A formal model for the system has been developed and is used to generate plans for executing continuous queries which exploit the redundancy introduced by the republishers. Additionally, due to the long-lived nature of continuous queries, mechanisms for maintaining the plans whenever there is a change in the set of data sources have been developed. A prototype of the system has been implemented and performance measures made. The work of this thesis has been motivated by the problem of retrieving monitoring information about Grid resources. However, the techniques developed are general and can be applied wherever there is a need to publish and query distributed data involving data streams.

    @phdthesis{Gray2007Integrating-Dis,
    Abstract = {  There is an increasing amount of information being made available as data streams, e.g. stock tickers, data from sensor networks, smart homes, monitoring data, etc.
    In many cases, this data is generated by distributed sources under the control of many different organisations.
    Users would like to seamlessly query such data without prior knowledge of where it is located or how it is published.
    This is similar to the problem of integrating data residing in multiple heterogeneous stored data sources.
    However, the techniques developed for stored data are not applicable due to the continuous and long-lived nature of queries over data streams.
    This thesis proposes an architecture for a stream integration system.
    A key feature of the architecture is a republisher component that collects together distributed streams and makes the merged stream available for querying.
    A formal model for the system has been developed and is used to generate plans for executing continuous queries which exploit the redundancy introduced by the republishers.
    Additionally, due to the long-lived nature of continuous queries, mechanisms for maintaining the plans whenever there is a change in the set of data sources have been developed.
    A prototype of the system has been implemented and performance measures made.
    The work of this thesis has been motivated by the problem of retrieving monitoring information about Grid resources.
    However, the techniques developed are general and can be applied wherever there is a need to publish and query distributed data involving data streams.
    },
    Address = {Edinburgh, UK},
    Author = {Alasdair J.G. Gray},
    School = {Heriot-Watt University},
    Title = {Integrating Distributed Data Streams},
    Url = {http://AlasdairGray.github.io/publications/thesis-final_web-copy.pdf},
    Year = {2007},
    }

2006

  • Alasdair J. G. Gray, Werner Nutt, and M. Howard Williams. Sources of Incompleteness in Grid Publishing. In Flexible and Efficient Information Handling, 23rd British National Conference on Databases, BNCOD 23, Belfast, Northern Ireland, UK, July 18-20, 2006, Proceedings, volume 4042 of Lecture Notes in Computer Science, pages 94–101. Springer, 2006. doi:10.1007/11788911_8
    [BibTeX] [Download PDF]
    @inproceedings{DBLP:conf/bncod/GrayNW06,
    author = {Alasdair J. G. Gray and
    Werner Nutt and
    M. Howard Williams},
    title = {Sources of Incompleteness in Grid Publishing},
    booktitle = {Flexible and Efficient Information Handling, 23rd British National
    Conference on Databases, {BNCOD} 23, Belfast, Northern Ireland, UK,
    July 18-20, 2006, Proceedings},
    series = {Lecture Notes in Computer Science},
    volume = {4042},
    pages = {94--101},
    publisher = {Springer},
    month = jul,
    year = {2006},
    url = {https://doi.org/10.1007/11788911\_8},
    doi = {10.1007/11788911\_8}
    }

  • Alasdair J. G. Gray, M. Howard Williams, and Werner Nutt. Answering Arbitrary Conjunctive Queries over Incomplete Data Stream Histories. In iiWAS’2006 – The Eighth International Conference on Information Integration and Web-based Applications Services, 4-6 December 2006, Yogyakarta, Indonesia, volume 214 of books@ocg.at, pages 259–268. Austrian Computer Society, 2006.
    [BibTeX]
    @inproceedings{DBLP:conf/iiwas/GrayWN06,
    author = {Alasdair J. G. Gray and
    M. Howard Williams and
    Werner Nutt},
    title = {Answering Arbitrary Conjunctive Queries over Incomplete Data Stream
    Histories},
    booktitle = {iiWAS'2006 - The Eighth International Conference on Information Integration
    and Web-based Applications Services, 4-6 December 2006, Yogyakarta,
    Indonesia},
    series = {books@ocg.at},
    volume = {214},
    pages = {259--268},
    publisher = {Austrian Computer Society},
    month = dec,
    year = {2006}
    }

2005

  • Andrew W. Cooke, Alasdair J. G. Gray, and Werner Nutt. Stream Integration Techniques for Grid Monitoring. Journal on Data Semantics, 2:136–175, 2005. (Alphabetical authorship, equal responsibility) doi:10.1007/978-3-540-30567-5_6
    [BibTeX] [Download PDF]
    @article{Cooke:StreamIntegration:JoDS2005,
    author = {Andrew W. Cooke and
    Alasdair J. G. Gray and
    Werner Nutt},
    title = {Stream Integration Techniques for Grid Monitoring},
    journal = {Journal on Data Semantics},
    volume = {2},
    pages = {136--175},
    year = {2005},
    Note = {(Alphabetical authorship, equal responsibility)},
    url = {https://doi.org/10.1007/978-3-540-30567-5\_6},
    doi = {10.1007/978-3-540-30567-5\_6}
    }

  • Alasdair J. G. Gray and Werner Nutt. Republishers in a Publish/Subscribe Architecture for Data Streams. In Database: Enterprise, Skills and Innovation, 22nd British National Conference on Databases, BNCOD 22, Sunderland, UK, July 5-7, 2005, Proceedings, volume 3567 of Lecture Notes in Computer Science, pages 179–184. Springer, July 2005. doi:10.1007/11511854_17
    [BibTeX] [Download PDF]
    @inproceedings{DBLP:conf/bncod/GrayN05,
    author = {Alasdair J. G. Gray and
    Werner Nutt},
    title = {Republishers in a Publish/Subscribe Architecture for Data Streams},
    booktitle = {Database: Enterprise, Skills and Innovation, 22nd British National
    Conference on Databases, {BNCOD} 22, Sunderland, UK, July 5-7, 2005,
    Proceedings},
    series = {Lecture Notes in Computer Science},
    volume = {3567},
    pages = {179--184},
    publisher = {Springer},
    month = jul,
    year = {2005},
    url = {https://doi.org/10.1007/11511854\_17},
    doi = {10.1007/11511854\_17}
    }

  • Rob Byrom, Brian A. Coghlan, Andrew W. Cooke, Roney Cordenonsi, Linda Cornwall, Martin Craig, Abdeslem Djaoui, Alastair Duncan, Steve Fisher, Alasdair J. G. Gray, Steve Hicks, Stuart Kenny, Jason Leake, Oliver Lyttleton, James Magowan, Robin Middleton, Werner Nutt, David O’Callaghan, Norbert Podhorszki, Paul Taylor, John Walk, and Antony J. Wilson. Fault Tolerance in the R-GMA Information and Monitoring System. In Advances in Grid Computing – EGC 2005, European Grid Conference, Amsterdam, The Netherlands, February 14-16, 2005, Revised Selected Papers, volume 3470 of Lecture Notes in Computer Science, pages 751–760. Springer, 2005. (Alphabetical authorship) doi:10.1007/11508380_76
    [BibTeX] [Download PDF]
    @inproceedings{DBLP:conf/egc/ByromCCCCCDDFGHKLLMMNOPTWW05,
    author = {Rob Byrom and
    Brian A. Coghlan and
    Andrew W. Cooke and
    Roney Cordenonsi and
    Linda Cornwall and
    Martin Craig and
    Abdeslem Djaoui and
    Alastair Duncan and
    Steve Fisher and
    Alasdair J. G. Gray and
    Steve Hicks and
    Stuart Kenny and
    Jason Leake and
    Oliver Lyttleton and
    James Magowan and
    Robin Middleton and
    Werner Nutt and
    David O'Callaghan and
    Norbert Podhorszki and
    Paul Taylor and
    John Walk and
    Antony J. Wilson},
    title = {Fault Tolerance in the {R-GMA} Information and Monitoring System},
    booktitle = {Advances in Grid Computing - {EGC} 2005, European Grid Conference,
    Amsterdam, The Netherlands, February 14-16, 2005, Revised Selected
    Papers},
    series = {Lecture Notes in Computer Science},
    volume = {3470},
    pages = {751--760},
    publisher = {Springer},
    month = feb,
    year = {2005},
    Note = {(Alphabetical authorship)},
    url = {https://doi.org/10.1007/11508380\_76},
    doi = {10.1007/11508380\_76}
    }

  • Alasdair J. G. Gray and Werner Nutt. A Data Stream Publish/Subscribe Architecture with Self-adapting Queries. In On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE, OTM Confederated International Conferences CoopIS, DOA, and ODBASE 2005, Agia Napa, Cyprus, October 31 – November 4, 2005, Proceedings, Part I, volume 3760 of Lecture Notes in Computer Science, pages 420–438. Springer, October 2005. doi:10.1007/11575771_27
    [BibTeX] [Download PDF]
    @inproceedings{DBLP:conf/otm/GrayN05,
    author = {Alasdair J. G. Gray and
    Werner Nutt},
    title = {A Data Stream Publish/Subscribe Architecture with Self-adapting Queries},
    booktitle = {On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and
    ODBASE, {OTM} Confederated International Conferences CoopIS, DOA,
    and {ODBASE} 2005, Agia Napa, Cyprus, October 31 - November 4, 2005,
    Proceedings, Part {I}},
    series = {Lecture Notes in Computer Science},
    volume = {3760},
    pages = {420--438},
    publisher = {Springer},
    month = oct,
    year = {2005},
    url = {https://doi.org/10.1007/11575771\_27},
    doi = {10.1007/11575771\_27}
    }

2004

  • Andrew W. Cooke, Alasdair J. G. Gray, Werner Nutt, James Magowan, Manfred Oevers, Paul Taylor, Roney Cordenonsi, Rob Byrom, Linda Cornwall, Abdeslem Djaoui, Laurence Field, Steve Fisher, Steve Hicks, Jason Leake, Robin Middleton, Antony J. Wilson, Xiaomei Zhu, Norbert Podhorszki, Brian A. Coghlan, Stuart Kenny, David O’Callaghan, and John Ryan. The Relational Grid Monitoring Architecture: Mediating Information about the Grid. Journal of Grid Computing, 2(4):323–339, 2004. (Alphabetical authorship by site, Heriot-Watt authored paper) doi:10.1007/s10723-005-0151-6
    [BibTeX] [Abstract] [Download PDF]

    We have developed and implemented the Relational Grid Monitoring Architecture (R-GMA) as part of the DataGrid project, to provide a flexible information and monitoring service for use by other middleware components and applications. R-GMA presents users with a virtual database and mediates queries posed at this database: users pose queries against a global schema and R-GMA takes responsibility for locating relevant sources and returning an answer. R-GMA’s architecture and mechanisms are general and can be used wherever there is a need for publishing and querying information in a distributed environment. We discuss the requirements, design and implementation of R-GMA as deployed on the DataGrid testbed. We also describe some of the ways in which R-GMA is being used.

    @article{Cooke:RGMA:JoGC2004,
    abstract = {We have developed and implemented the Relational Grid Monitoring Architecture (R-GMA) as part of the DataGrid project, to provide a flexible information and monitoring service for use by other middleware components and applications.
    R-GMA presents users with a virtual database and mediates queries posed at this database: users pose queries against a global schema and R-GMA takes responsibility for locating relevant sources and returning an answer. R-GMA’s architecture and mechanisms are general and can be used wherever there is a need for publishing and querying information in a distributed environment.
    We discuss the requirements, design and implementation of R-GMA as deployed on the DataGrid testbed. We also describe some of the ways in which R-GMA is being used.},
    author = {Andrew W. Cooke and
    Alasdair J. G. Gray and
    Werner Nutt and
    James Magowan and
    Manfred Oevers and
    Paul Taylor and
    Roney Cordenonsi and
    Rob Byrom and
    Linda Cornwall and
    Abdeslem Djaoui and
    Laurence Field and
    Steve Fisher and
    Steve Hicks and
    Jason Leake and
    Robin Middleton and
    Antony J. Wilson and
    Xiaomei Zhu and
    Norbert Podhorszki and
    Brian A. Coghlan and
    Stuart Kenny and
    David O'Callaghan and
    John Ryan},
    title = {The Relational Grid Monitoring Architecture: Mediating Information
    about the Grid},
    journal = {Journal of Grid Computing},
    volume = {2},
    number = {4},
    pages = {323--339},
    year = {2004},
    Note = {(Alphabetical authorship by site, Heriot-Watt authored paper)},
    url = {https://doi.org/10.1007/s10723-005-0151-6},
    doi = {10.1007/s10723-005-0151-6}
    }

  • A. J. G. Gray and W. Nutt. Answering continuous queries using views. In 5th Postgraduate Research Conference in Electronics, Photonics, Communications and Networks, and Computing Science (PREP 2004), Hatfield, UK, 2004. (Poster paper)
    [BibTeX]
    @inproceedings{Gray2004ACQUV,
    address = {Hatfield, UK},
    author = {A.J.G. Gray and W. Nutt},
    booktitle = {5th Postgraduate Research Conference in Electronics, Photonics, Communications and Networks, and Computing Science (PREP 2004)},
    month = apr,
    note = {(Poster paper)},
    title = {Answering continuous queries using views},
    year = {2004}
    }

  • A. J. G. Gray, A. Cooke, and W. Nutt. Planning continuous selection queries using views. In Workshop on Logic Based Information Agents, volume 04171, Dagstuhl, Germany, April 2004.
    [BibTeX] [Download PDF]
    @InProceedings{Gray2004Planning-cont,
    author = {A. J. G. Gray and A. Cooke and W. Nutt},
    title = {Planning continuous selection queries using views},
    booktitle = {Workshop on Logic Based Information Agents},
    year = {2004},
    volume = {04171},
    address = {Dagstuhl, Germany},
    month = apr,
    url = {https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=04171}
    }

  • Rob Byrom, Brian Coghlan, Andy Cooke, Roney Cordenonsi, Linda Cornwall, Martin Craig, Abdeslem Djaoui, Steve Fisher, Alasdair Gray, Steve Hicks, and others. Production services for information and monitoring in the grid. In UK e-Science All Hands Meeting (AHM 2004), Nottingham, UK, 2004.
    [BibTeX]
    @InProceedings{byrom2004production,
    title={Production services for information and monitoring in the grid},
    author={Byrom, Rob and Coghlan, Brian and Cooke, Andy and Cordenonsi, Roney and Cornwall, Linda and Craig, Martin and Djaoui, Abdeslem and Fisher, Steve and Gray, Alasdair and Hicks, Steve and others},
    Booktitle={UK e-Science All Hands Meeting (AHM 2004)},
    Address={Nottingham, UK},
    year={2004}
    }

2003

  • Andrew W. Cooke, Alasdair J. G. Gray, Lisha Ma, Werner Nutt, James Magowan, Manfred Oevers, Paul Taylor, Rob Byrom, Laurence Field, Steve Hicks, Jason Leake, Manish Soni, Antony J. Wilson, Roney Cordenonsi, Linda Cornwall, Abdeslem Djaoui, Steve Fisher, Norbert Podhorszki, Brian A. Coghlan, Stuart Kenny, and David O’Callaghan. R-GMA: An Information Integration System for Grid Monitoring. In On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE – OTM Confederated International Conferences, CoopIS, DOA, and ODBASE 2003, Catania, Sicily, Italy, November 3-7, 2003, volume 2888 of Lecture Notes in Computer Science, pages 462–481. Springer, November 2003. doi:10.1007/978-3-540-39964-3_29
    [BibTeX] [Download PDF]
    @inproceedings{Cooke:RGMA:CoopIS2003,
    author = {Andrew W. Cooke and
    Alasdair J. G. Gray and
    Lisha Ma and
    Werner Nutt and
    James Magowan and
    Manfred Oevers and
    Paul Taylor and
    Rob Byrom and
    Laurence Field and
    Steve Hicks and
    Jason Leake and
    Manish Soni and
    Antony J. Wilson and
    Roney Cordenonsi and
    Linda Cornwall and
    Abdeslem Djaoui and
    Steve Fisher and
    Norbert Podhorszki and
    Brian A. Coghlan and
    Stuart Kenny and
    David O'Callaghan},
    title = {{R-GMA:} An Information Integration System for Grid Monitoring},
    booktitle = {On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and
    {ODBASE} - {OTM} Confederated International Conferences, CoopIS, DOA,
    and {ODBASE} 2003, Catania, Sicily, Italy, November 3-7, 2003},
    series = {Lecture Notes in Computer Science},
    volume = {2888},
    pages = {462--481},
    publisher = {Springer},
    month = nov,
    year = {2003},
    url = {https://doi.org/10.1007/978-3-540-39964-3\_29},
    doi = {10.1007/978-3-540-39964-3\_29}
    }