Journal Articles

Here is a list of my journal articles.

2023

  • Danielle Welter, Nick Juty, Philippe Rocca-Serra, Fuqi Xu, David Henderson, Wei Gu, Jolanda Strubel, Robert Giessmann, Ibrahim Emam, Yojana Gadiya, Tooba Abbassi-Daloii, Ebtisam Alharbi, Alasdair Gray, Melanie Courtot, Philip Gribbon, Vassilios Ioannidis, Dorothy Reilly, Nick Lynch, Jan-Willem Boiten, Venkata Satagopam, Carole Goble, Susanna-Assunta Sansone, and Tony Burdett. FAIR in action – a flexible framework to guide FAIRification. Scientific Data, 2023. To appear doi:10.5281/zenodo.7702124
    [BibTeX] [Abstract] [Download PDF]

    The COVID-19 pandemic has highlighted the need for FAIR (Findable, Accessible, Interoperable, and Reusable) data more than any other scientific challenge to date. We developed a flexible, multi-level, domain-agnostic FAIRification framework, providing practical guidance to improve the FAIRness for both existing and future clinical and molecular datasets. We validated the framework in collaboration with a wide range of public-private partnership projects, demonstrating and implementing improvements across all aspects of FAIR, using a variety of datasets, to demonstrate the reproducibility and wide-ranging applicability of this framework for intra-project FAIRification.

    @article{Welter:FAIR-in-action:SDATA2023,
    abstract={The COVID-19 pandemic has highlighted the need for FAIR (Findable, Accessible, Interoperable, and Reusable) data more than any other scientific challenge to date. We developed a flexible, multi-level, domain-agnostic FAIRification framework, providing practical guidance to improve the FAIRness for both existing and future clinical and molecular datasets. We validated the framework in collaboration with a wide range of public-private partnership projects, demonstrating and implementing improvements across all aspects of FAIR, using a variety of datasets, to demonstrate the reproducibility and wide-ranging applicability of this framework for intra-project FAIRification.},
    title={{FAIR} in action - a flexible framework to guide {FAIRification}},
    author={Danielle Welter and Nick Juty and Philippe Rocca-Serra and Fuqi Xu and David Henderson and Wei Gu and Jolanda Strubel and Robert Giessmann and Ibrahim Emam and Yojana Gadiya and Tooba Abbassi-Daloii and Ebtisam Alharbi and Alasdair Gray and Melanie Courtot and Philip Gribbon and Vassilios Ioannidis and Dorothy Reilly and Nick Lynch and Jan-Willem Boiten and Venkata Satagopam and Carole Goble and Susanna-Assunta Sansone and Tony Burdett},
    journal={Scientific Data},
    year={2023},
    doi={10.5281/zenodo.7702124},
    url={https://doi.org/10.5281/zenodo.7702124},
    publisher={Nature},
    note={To appear}
    }

2022

  • Nicolas Matentzoglu, James P. Balhoff, Susan M. Bello, Chris Bizon, Matthew Brush, Tiffany J. Callahan, Christopher G. Chute, William D. Duncan, Chris T. Evelo, Davera Gabriel, and others. A Simple Standard for Sharing Ontological Mappings (SSSOM). Database, 2022, 2022. doi:10.1093/database/baac035
    [BibTeX] [Abstract] [Download PDF]

    Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. Database URL: http://w3id.org/sssom/spec

    @article{nico:SSSOM:Database2022,
    abstract={Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec.
    Database URL: http://w3id.org/sssom/spec},
    title={A Simple Standard for Sharing Ontological Mappings (SSSOM)},
    author={Matentzoglu, Nicolas and Balhoff, James P and Bello, Susan M and Bizon, Chris and Brush, Matthew and Callahan, Tiffany J and Chute, Christopher G and Duncan, William D and Evelo, Chris T and Gabriel, Davera and others},
    journal={Database},
    month=may,
    volume={2022},
    year={2022},
    doi={10.1093/database/baac035},
    url={https://doi.org/10.1093/database/baac035},
    publisher={Oxford Academic}
    }
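
    A rough illustration of the table-based format described in the SSSOM abstract above: the short Python sketch below writes a two-row mapping set as a tab-separated file using only the standard library. The column names follow the specification at http://w3id.org/sssom/spec, but the identifiers, labels and justification values are invented for illustration; consult the specification for the full metadata model.

    # Minimal sketch of an SSSOM-style mapping table (illustrative values only).
    import csv

    mappings = [
        {
            "subject_id": "EXMPL:0001",
            "subject_label": "heart attack",
            "predicate_id": "skos:exactMatch",
            "object_id": "EXMPL:9001",
            "object_label": "myocardial infarction",
            "mapping_justification": "semapv:ManualMappingCuration",
        },
        {
            "subject_id": "EXMPL:0002",
            "subject_label": "stroke",
            "predicate_id": "skos:broadMatch",
            "object_id": "EXMPL:9002",
            "object_label": "cerebrovascular disease",
            "mapping_justification": "semapv:LexicalMatching",
        },
    ]

    # Write the mappings as a tab-separated file, one column per SSSOM slot.
    with open("mappings.sssom.tsv", "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(mappings[0]), delimiter="\t")
        writer.writeheader()
        writer.writerows(mappings)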

  • Isuru Liyanage, Tony Burdett, Bert Droesbeke, Karoly Erdos, Rolando Fernandez, Alasdair Gray, Muhammad Haseeb, Simon Jupp, Flavia Penim, Cyril Pommier, Philippe Rocca-Serra, Mélanie Courtot, and Frederik Coppens. ELIXIR biovalidator for semantic validation of life science metadata. Bioinformatics, 38(11):3141–3142, 2022. btac195 doi:10.1093/bioinformatics/btac195
    [BibTeX] [Abstract] [Download PDF]

    To advance biomedical research, increasingly large amounts of complex data need to be discovered and integrated. This requires syntactic and semantic validation to ensure shared understanding of relevant entities. This article describes the ELIXIR biovalidator, which extends the syntactic validation of the widely used AJV library with ontology-based validation of JSON documents. Source code: https://github.com/elixir-europe/biovalidator, Release: v1.9.1, License: Apache License 2.0, Deployed at: https://www.ebi.ac.uk/biosamples/schema/validator/validate

    @article{courtot:elixir-validator:bioinformatics2022,
    author = {Liyanage, Isuru and Burdett, Tony and Droesbeke, Bert and Erdos, Karoly and Fernandez, Rolando and Gray, Alasdair and Haseeb, Muhammad and Jupp, Simon and Penim, Flavia and Pommier, Cyril and Rocca-Serra, Philippe and Courtot, Mélanie and Coppens, Frederik},
    title = "{ELIXIR biovalidator for semantic validation of life science metadata}",
    journal = {Bioinformatics},
    year = {2022},
    month = apr,
    volume = {38},
    number = {11},
    pages = {3141--3142},
    publisher = {Oxford University Press},
    abstract = "{To advance biomedical research, increasingly large amounts of complex data need to be discovered and integrated. This requires syntactic and semantic validation to ensure shared understanding of relevant entities. This article describes the ELIXIR biovalidator, which extends the syntactic validation of the widely used AJV library with ontology-based validation of JSON documents. Source code: https://github.com/elixir-europe/biovalidator, Release: v1.9.1, License: Apache License 2.0, Deployed at: https://www.ebi.ac.uk/biosamples/schema/validator/validate}",
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btac195},
    url = {https://doi.org/10.1093/bioinformatics/btac195},
    note = {btac195},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btac195/43287005/btac195.pdf},
    }

2020

  • Abiodun G. Akinyemi, Ming Sun, and Alasdair J. G. Gray. Data integration for offshore decommissioning waste management. Automation in Construction, 109:103010, 2020. doi:10.1016/j.autcon.2019.103010
    [BibTeX] [Abstract] [Download PDF]

    Offshore decommissioning represents significant business opportunities for oil and gas service companies. However, for owners of offshore assets and regulators, it is a liability because of the associated costs. One way of mitigating decommissioning costs is through the sales and reuse of decommissioned items. To achieve this effectively, reliability assessment of decommissioned items is required. Such an assessment relies on data collected on the various items over the lifecycle of an engineering asset. Considering that offshore platforms have a design life of about 25 years and data management techniques and tools are constantly evolving, data captured about items to be decommissioned will be in varying forms. In addition, considering the many stakeholders involved with a facility over its lifecycle, information representation of the items will have variations. These challenges make data integration difficult. As a result, this research developed a data integration framework that makes use of Semantic Web technologies and ISO 15926 – a standard for process plant data integration – for rapid assessment of decommissioned items. The proposed solution helps in determining the reuse potential of decommissioned items, which can save on cost and benefit the environment.

    @article{Akinyemi:offshore-data-integration:2020,
    abstract = {Offshore decommissioning represents significant business opportunities for oil and gas service companies. However, for owners of offshore assets and regulators, it is a liability because of the associated costs. One way of mitigating decommissioning costs is through the sales and reuse of decommissioned items. To achieve this effectively, reliability assessment of decommissioned items is required. Such an assessment relies on data collected on the various items over the lifecycle of an engineering asset. Considering that offshore platforms have a design life of about 25 years and data management techniques and tools are constantly evolving, data captured about items to be decommissioned will be in varying forms. In addition, considering the many stakeholders involved with a facility over its lifecycle, information representation of the items will have variations. These challenges make data integration difficult. As a result, this research developed a data integration framework that makes use of Semantic Web technologies and ISO 15926 - a standard for process plant data integration - for rapid assessment of decommissioned items. The proposed solution helps in determining the reuse potential of decommissioned items, which can save on cost and benefit the environment.},
    doi = {10.1016/j.autcon.2019.103010},
    url = {https://www.sciencedirect.com/science/article/pii/S0926580518304059},
    year = 2020,
    month = nov,
    publisher = {Elsevier},
    volume = {109},
    pages = {103010},
    author = {Abiodun G. Akinyemi and Ming Sun and Alasdair J. G. Gray},
    title = {Data integration for offshore decommissioning waste management},
    journal = {Automation in Construction}
    }

2018

  • Abiodun Akinyemi, Ming Sun, and Alasdair J. G. Gray. An ontology-based data integration framework for construction information management. Proceedings of the Institution of Civil Engineers – Management, Procurement and Law, 171(3):111–125, 2018. doi:10.1680/jmapl.17.00052
    [BibTeX] [Abstract] [Download PDF]

    Information management during the construction phase of a built asset involves multiple stakeholders using multiple software applications to generate and store data. This is problematic as data come in different forms and are labour intensive to piece together. Existing solutions to this problem are predominantly in proprietary applications, which are sometimes cost prohibitive for small engineering firms, or conceptual studies with use cases that cannot be easily adapted. In view of these limitations, this research presents an ontology-based data integration framework that makes use of open-source tools that support Semantic Web technologies. The proposed framework enables rapid answering of queries over construction data integrated from heterogeneous sources, data quality checks and reuse of project software resources. The attributes and functionalities of the proposed solution align with the requirements common to small firms with limited information technology skill and budget. Consequently, this solution can be of great benefit for their data projects.

    @article{Akinyemi_2018,
    abstract = {Information management during the construction phase of a built asset involves multiple stakeholders using multiple software applications to generate and store data. This is problematic as data come in different forms and are labour intensive to piece together. Existing solutions to this problem are predominantly in proprietary applications, which are sometimes cost prohibitive for small engineering firms, or conceptual studies with use cases that cannot be easily adapted. In view of these limitations, this research presents an ontology-based data integration framework that makes use of open-source tools that support Semantic Web technologies. The proposed framework enables rapid answering of queries over construction data integrated from heterogeneous sources, data quality checks and reuse of project software resources. The attributes and functionalities of the proposed solution align with the requirements common to small firms with limited information technology skill and budget. Consequently, this solution can be of great benefit for their data projects.},
    doi = {10.1680/jmapl.17.00052},
    url = {https://doi.org/10.1680%2Fjmapl.17.00052},
    year = 2018,
    month = jun,
    publisher = {Thomas Telford Ltd.},
    volume = {171},
    number = {3},
    pages = {111--125},
    author = {Abiodun Akinyemi and Ming Sun and Alasdair J G Gray},
    title = {An ontology-based data integration framework for construction information management},
    journal = {Proceedings of the Institution of Civil Engineers - Management, Procurement and Law}
    }

  • Simon D. Harding, Joanna L. Sharman, Elena Faccenda, Christopher Southan, Adam J. Pawson, Sam Ireland, Alasdair J. G. Gray, Liam Bruce, Stephen P. H. Alexander, Stephen Anderton, Clare Bryant, Anthony P. Davenport, Christian Doerig, Doriano Fabbro, Francesca Levi-Schaffer, Michael Spedding, Jamie A. Davies, and NC-IUPHAR. The IUPHAR/BPS Guide to PHARMACOLOGY in 2018: updates and expansion to encompass the new guide to IMMUNOPHARMACOLOGY. Nucleic Acids Research, 46(Database-Issue):D1091–D1106, 2018. doi:10.1093/nar/gkx1121
    [BibTeX] [Abstract] [Download PDF]

    The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb, www.guidetopharmacology.org) and its precursor IUPHAR-DB, have captured expert-curated interactions between targets and ligands from selected papers in pharmacology and drug discovery since 2003. This resource continues to be developed in conjunction with the International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS). As previously described, our unique model of content selection and quality control is based on 96 target-class subcommittees comprising 512 scientists collaborating with in-house curators. This update describes content expansion, new features and interoperability improvements introduced in the 10 releases since August 2015. Our relationship matrix now describes ∼9000 ligands, ∼15 000 binding constants, ∼6000 papers and ∼1700 human proteins. As an important addition, we also introduce our newly funded project for the Guide to IMMUNOPHARMACOLOGY (GtoImmuPdb, www.guidetoimmunopharmacology.org). This has been ‘forked’ from the well-established GtoPdb data model and expanded into new types of data related to the immune system and inflammatory processes. This includes new ligands, targets, pathways, cell types and diseases for which we are recruiting new IUPHAR expert committees. Designed as an immunopharmacological gateway, it also has an emphasis on potential therapeutic interventions.

    @article{Harding:GtoPdb:NAR2018,
    abstract = {The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb, www.guidetopharmacology.org) and its precursor IUPHAR-DB, have captured expert-curated interactions between targets and ligands from selected papers in pharmacology and drug discovery since 2003. This resource continues to be developed in conjunction with the International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS). As previously described, our unique model of content selection and quality control is based on 96 target-class subcommittees comprising 512 scientists collaborating with in-house curators. This update describes content expansion, new features and interoperability improvements introduced in the 10 releases since August 2015. Our relationship matrix now describes ∼9000 ligands, ∼15 000 binding constants, ∼6000 papers and ∼1700 human proteins. As an important addition, we also introduce our newly funded project for the Guide to IMMUNOPHARMACOLOGY (GtoImmuPdb, www.guidetoimmunopharmacology.org). This has been ‘forked’ from the well-established GtoPdb data model and expanded into new types of data related to the immune system and inflammatory processes. This includes new ligands, targets, pathways, cell types and diseases for which we are recruiting new IUPHAR expert committees. Designed as an immunopharmacological gateway, it also has an emphasis on potential therapeutic interventions.},
    author = {Simon D. Harding and
    Joanna L. Sharman and
    Elena Faccenda and
    Christopher Southan and
    Adam J. Pawson and
    Sam Ireland and
    Alasdair J. G. Gray and
    Liam Bruce and
    Stephen P. H. Alexander and
    Stephen Anderton and
    Clare Bryant and
    Anthony P. Davenport and
    Christian Doerig and
    Doriano Fabbro and
    Francesca Levi{-}Schaffer and
    Michael Spedding and
    Jamie A. Davies and
    Nc{-}Iuphar},
    title = {The {IUPHAR/BPS} Guide to {PHARMACOLOGY} in 2018: updates and expansion
    to encompass the new guide to {IMMUNOPHARMACOLOGY}},
    journal = {Nucleic Acids Research},
    volume = {46},
    number = {Database-Issue},
    pages = {D1091--D1106},
    year = {2018},
    url = {https://doi.org/10.1093/nar/gkx1121},
    doi = {10.1093/nar/gkx1121}
    }

2017

  • Mark D. Wilkinson, Ruben Verborgh, Luiz Olavo Bonino da Silva Santos, Tim Clark, Morris A. Swertz, Fleur D. L. Kelpin, Alasdair J. G. Gray, Erik A. Schultes, Erik M. van Mulligen, Paolo Ciccarese, Arnold Kuzniar, Anand Gavai, Mark Thompson, Rajaram Kaliyaperumal, Jerven T. Bolleman, and Michel Dumontier. Interoperability and FAIRness through a novel combination of Web technologies. PeerJ Computer Science, 3:e110, 2017. doi:10.7717/peerj-cs.110
    [BibTeX] [Abstract] [Download PDF]

    Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.

    @article{Wilkinson:InteropFAIRWebTech:PeerJ2017,
    author = {Mark D. Wilkinson and
    Ruben Verborgh and
    Luiz Olavo Bonino da Silva Santos and
    Tim Clark and
    Morris A. Swertz and
    Fleur D. L. Kelpin and
    Alasdair J. G. Gray and
    Erik A. Schultes and
    Erik M. van Mulligen and
    Paolo Ciccarese and
    Arnold Kuzniar and
    Anand Gavai and
    Mark Thompson and
    Rajaram Kaliyaperumal and
    Jerven T. Bolleman and
    Michel Dumontier},
    title = {Interoperability and FAIRness through a novel combination of Web technologies},
    journal = {PeerJ Computer Science},
    volume = {3},
    pages = {e110},
    year = {2017},
    url = {https://doi.org/10.7717/peerj-cs.110},
    doi = {10.7717/peerj-cs.110},
    abstract = {Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.}
    }

2016

  • Michel Dumontier, Alasdair JG Gray, M. Scott Marshall, Vladimir Alexiev, Peter Ansell, Gary Bader, Joachim Baran, Jerven T. Bolleman, Alison Callahan, José Cruz-Toledo, and others. The health care and life sciences community profile for dataset descriptions. PeerJ, 4:e2331, 2016. doi:10.7717/peerj.2331
    [BibTeX] [Abstract] [Download PDF]

    Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.

    @article{Dumontier:HCLS-datadesc:PeerJ2016,
    abstract={Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.},
    title={The health care and life sciences community profile for dataset descriptions},
    author={Dumontier, Michel and Gray, Alasdair JG and Marshall, M Scott and Alexiev, Vladimir and Ansell, Peter and Bader, Gary and Baran, Joachim and Bolleman, Jerven T and Callahan, Alison and Cruz-Toledo, Jos{\'e} and others},
    journal={PeerJ},
    volume={4},
    pages={e2331},
    year={2016},
    month=aug,
    url={https://doi.org/10.7717/peerj.2331},
    doi={10.7717/peerj.2331},
    publisher={PeerJ Inc.}
    }
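
    To give a flavour of the kind of metadata the HCLS community profile described above calls for, the sketch below builds a minimal dataset description with the rdflib Python library. The property selection (dct:title, dct:description, dct:publisher, dct:license, pav:version) reflects my reading of the profile rather than its full requirements, and all IRIs and values are placeholders.

    # Minimal sketch of a dataset description in the spirit of the HCLS profile.
    # Property selection is indicative only; IRIs and values are placeholders.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    DCAT = Namespace("http://www.w3.org/ns/dcat#")
    PAV = Namespace("http://purl.org/pav/")

    g = Graph()
    g.bind("dct", DCTERMS)
    g.bind("dcat", DCAT)
    g.bind("pav", PAV)

    dataset = URIRef("http://example.org/dataset/demo/version/1.0")
    g.add((dataset, RDF.type, DCAT.Dataset))  # type chosen for illustration
    g.add((dataset, DCTERMS.title, Literal("Demo dataset", lang="en")))
    g.add((dataset, DCTERMS.description, Literal("A placeholder description of the dataset.", lang="en")))
    g.add((dataset, DCTERMS.publisher, URIRef("http://example.org/org/demo-lab")))
    g.add((dataset, DCTERMS.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))
    g.add((dataset, PAV.version, Literal("1.0")))

    print(g.serialize(format="turtle"))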

  • Christopher J. Playford, Vernon Gayle, Roxanne Connelly, and Alasdair JG Gray. Administrative social science data: The challenge of reproducible research. Big Data and Society, 3(2):2053951716684143, 2016. doi:10.1177/2053951716684143
    [BibTeX] [Abstract] [Download PDF]

    Powerful new social science data resources are emerging. One particularly important source is administrative data, which were originally collected for organisational purposes but often contain information that is suitable for social science research. In this paper we outline the concept of reproducible research in relation to micro-level administrative social science data. Our central claim is that a planned and organised workflow is essential for high quality research using micro-level administrative social science data. We argue that it is essential for researchers to share research code, because code sharing enables the elements of reproducible research. First, it enables results to be duplicated and therefore allows the accuracy and validity of analyses to be evaluated. Second, it facilitates further tests of the robustness of the original piece of research. Drawing on insights from computer science and other disciplines that have been engaged in e-Research we discuss and advocate the use of Git repositories to provide a useable and effective solution to research code sharing and rendering social science research using micro-level administrative data reproducible.

    @article{Playford:AdminSocSciRep:BDS2016,
    author = {Christopher J Playford and Vernon Gayle and Roxanne Connelly and Alasdair JG Gray},
    title ={Administrative social science data: The challenge of reproducible research},
    journal = {Big Data and Society},
    volume = {3},
    number = {2},
    pages = {2053951716684143},
    year = {2016},
    doi = {10.1177/2053951716684143},
    URL = {https://doi.org/10.1177/2053951716684143},
    eprint = {https://doi.org/10.1177/2053951716684143},
    abstract = {Powerful new social science data resources are emerging. One particularly important source is administrative data, which were originally collected for organisational purposes but often contain information that is suitable for social science research. In this paper we outline the concept of reproducible research in relation to micro-level administrative social science data. Our central claim is that a planned and organised workflow is essential for high quality research using micro-level administrative social science data. We argue that it is essential for researchers to share research code, because code sharing enables the elements of reproducible research. First, it enables results to be duplicated and therefore allows the accuracy and validity of analyses to be evaluated. Second, it facilitates further tests of the robustness of the original piece of research. Drawing on insights from computer science and other disciplines that have been engaged in e-Research we discuss and advocate the use of Git repositories to provide a useable and effective solution to research code sharing and rendering social science research using micro-level administrative data reproducible.}
    }

  • Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, and others. The FAIR Guiding Principles for scientific data management and stewardship. Scientific data, 3(1):1–9, 2016. doi:10.1038/sdata.2016.18
    [BibTeX] [Abstract]

    There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measureable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.

    @article{Wilkinson:FAIRPrinciples:SciData2016,
    title={The FAIR Guiding Principles for scientific data management and stewardship},
    author={Wilkinson, Mark D and Dumontier, Michel and Aalbersberg, IJsbrand Jan and Appleton, Gabrielle and Axton, Myles and Baak, Arie and Blomberg, Niklas and Boiten, Jan-Willem and da Silva Santos, Luiz Bonino and Bourne, Philip E and others},
    journal={Scientific data},
    volume={3},
    number={1},
    pages={1--9},
    year={2016},
    publisher={Nature Publishing Group},
    doi={10.1038/sdata.2016.18},
    abstract={There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measureable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.}
    }

2014

  • Alasdair J. G. Gray. Dataset Descriptions for Linked Data Systems. IEEE Internet Computing, 18(4):66–69, 2014. doi:10.1109/MIC.2014.66
    [BibTeX] [Download PDF]
    @article{DBLP:journals/internet/Gray14,
    author = {Alasdair J. G. Gray},
    title = {Dataset Descriptions for Linked Data Systems},
    journal = {{IEEE} Internet Computing},
    volume = {18},
    number = {4},
    pages = {66--69},
    year = {2014},
    url = {https://doi.org/10.1109/MIC.2014.66},
    doi = {10.1109/MIC.2014.66}
    }

  • Alasdair J. G. Gray, Paul T. Groth, Antonis Loizou, Sune Askjaer, Christian Y. A. Brenninkmeijer, Kees Burger, Christine Chichester, Chris T. A. Evelo, Carole A. Goble, Lee Harland, Steve Pettifer, Mark Thompson, Andra Waagmeester, and Antony J. Williams. Applying linked data approaches to pharmacology: Architectural decisions and implementation. Semantic Web Journal, 5(2):101–113, 2014. doi:10.3233/SW-2012-0088
    [BibTeX] [Abstract] [Download PDF]

    The discovery of new medicines requires pharmacologists to interact with a number of information sources ranging from tabular data to scientific papers, and other specialized formats. In this application report, we describe a linked data platform for integrating multiple pharmacology datasets that form the basis for several drug discovery applications. The functionality offered by the platform has been drawn from a collection of prioritised drug discovery business questions created as part of the Open PHACTS project, a collaboration of research institutions and major pharmaceutical companies. We describe the architecture of the platform focusing on seven design decisions that drove its development with the aim of informing others developing similar software in this or other domains. The utility of the platform is demonstrated by the variety of drug discovery applications being built to access the integrated data.

    @article{Gray:OPS:SWJ,
    abstract = {The discovery of new medicines requires pharmacologists to interact with a number of information sources ranging from tabular data to scientific papers, and other specialized formats. In this application report, we describe a linked data platform for integrating multiple pharmacology datasets that form the basis for several drug discovery applications. The functionality offered by the platform has been drawn from a collection of prioritised drug discovery business questions created as part of the Open PHACTS project, a collaboration of research institutions and major pharmaceutical companies. We describe the architecture of the platform focusing on seven design decisions that drove its development with the aim of informing others developing similar software in this or other domains. The utility of the platform is demonstrated by the variety of drug discovery applications being built to access the integrated data.},
    author = {Alasdair J. G. Gray and
    Paul T. Groth and
    Antonis Loizou and
    Sune Askjaer and
    Christian Y. A. Brenninkmeijer and
    Kees Burger and
    Christine Chichester and
    Chris T. A. Evelo and
    Carole A. Goble and
    Lee Harland and
    Steve Pettifer and
    Mark Thompson and
    Andra Waagmeester and
    Antony J. Williams},
    title = {Applying linked data approaches to pharmacology: Architectural decisions
    and implementation},
    journal = {Semantic Web Journal},
    volume = {5},
    number = {2},
    pages = {101--113},
    year = {2014},
    url = {https://doi.org/10.3233/SW-2012-0088},
    doi = {10.3233/SW-2012-0088}
    }

  • Paul T. Groth, Antonis Loizou, Alasdair J. G. Gray, Carole A. Goble, Lee Harland, and Steve Pettifer. API-centric Linked Data integration: The Open PHACTS Discovery Platform case study. Journal of Web Semantics, 29:12–18, 2014. doi:10.1016/j.websem.2014.03.003
    [BibTeX] [Download PDF]
    @article{DBLP:journals/ws/GrothLGGHP14,
    author = {Paul T. Groth and
    Antonis Loizou and
    Alasdair J. G. Gray and
    Carole A. Goble and
    Lee Harland and
    Steve Pettifer},
    title = {API-centric Linked Data integration: The Open {PHACTS} Discovery Platform
    case study},
    journal = {Journal of Web Semantics},
    volume = {29},
    pages = {12--18},
    year = {2014},
    url = {https://doi.org/10.1016/j.websem.2014.03.003},
    doi = {10.1016/j.websem.2014.03.003}
    }

2013

  • Paolo Ciccarese, Stian Soiland-Reyes, Khalid Belhajjame, Alasdair J. G. Gray, Carole A. Goble, and Tim Clark. PAV ontology: provenance, authoring and versioning. Journal of Biomedical Semantics, 4:37, 2013. doi:10.1186/2041-1480-4-37
    [BibTeX] [Abstract] [Download PDF]

    Background: Provenance is a critical ingredient for establishing trust of published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim with supportive evidence. Existing vocabularies such as Dublin Core Terms (DC Terms) and the W3C Provenance Ontology (PROV-O) are domain-independent and general-purpose and they allow and encourage for extensions to cover more specific needs. In particular, to track authoring and versioning information of web resources, PROV-O provides a basic methodology but not any specific classes and properties for identifying or distinguishing between the various roles assumed by agents manipulating digital artifacts, such as author, contributor and curator. Results: We present the Provenance, Authoring and Versioning ontology (PAV, namespace http://purl.org/pav/): a lightweight ontology for capturing “just enough” descriptions essential for tracking the provenance, authoring and versioning of web resources. We argue that such descriptions are essential for digital scientific content. PAV distinguishes between contributors, authors and curators of content and creators of representations in addition to the provenance of originating resources that have been accessed, transformed and consumed. We explore five projects (and communities) that have adopted PAV illustrating their usage through concrete examples. Moreover, we present mappings that show how PAV extends the W3C PROV-O ontology to support broader interoperability. Method: The initial design of the PAV ontology was driven by requirements from the AlzSWAN project with further requirements incorporated later from other projects detailed in this paper. The authors strived to keep PAV lightweight and compact by including only those terms that have demonstrated to be pragmatically useful in existing applications, and by recommending terms from existing ontologies when plausible. Discussion: We analyze and compare PAV with related approaches, namely Provenance Vocabulary (PRV), DC Terms and BIBFRAME. We identify similarities and analyze differences between those vocabularies and PAV, outlining strengths and weaknesses of our proposed model. We specify SKOS mappings that align PAV with DC Terms. We conclude the paper with general remarks on the applicability of PAV.

    @article{Ciccarese:PAV:JoBS2013,
    abstract = {Background
    Provenance is a critical ingredient for establishing trust of published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim with supportive evidence. Existing vocabularies such as Dublin Core Terms (DC Terms) and the W3C Provenance Ontology (PROV-O) are domain-independent and general-purpose and they allow and encourage for extensions to cover more specific needs. In particular, to track authoring and versioning information of web resources, PROV-O provides a basic methodology but not any specific classes and properties for identifying or distinguishing between the various roles assumed by agents manipulating digital artifacts, such as author, contributor and curator.
    Results
    We present the Provenance, Authoring and Versioning ontology (PAV, namespace http://purl.org/pav/): a lightweight ontology for capturing “just enough” descriptions essential for tracking the provenance, authoring and versioning of web resources. We argue that such descriptions are essential for digital scientific content. PAV distinguishes between contributors, authors and curators of content and creators of representations in addition to the provenance of originating resources that have been accessed, transformed and consumed. We explore five projects (and communities) that have adopted PAV illustrating their usage through concrete examples. Moreover, we present mappings that show how PAV extends the W3C PROV-O ontology to support broader interoperability.
    Method
    The initial design of the PAV ontology was driven by requirements from the AlzSWAN project with further requirements incorporated later from other projects detailed in this paper. The authors strived to keep PAV lightweight and compact by including only those terms that have demonstrated to be pragmatically useful in existing applications, and by recommending terms from existing ontologies when plausible.
    Discussion
    We analyze and compare PAV with related approaches, namely Provenance Vocabulary (PRV), DC Terms and BIBFRAME. We identify similarities and analyze differences between those vocabularies and PAV, outlining strengths and weaknesses of our proposed model. We specify SKOS mappings that align PAV with DC Terms. We conclude the paper with general remarks on the applicability of PAV.},
    author = {Paolo Ciccarese and
    Stian Soiland{-}Reyes and
    Khalid Belhajjame and
    Alasdair J. G. Gray and
    Carole A. Goble and
    Tim Clark},
    title = {{PAV} ontology: provenance, authoring and versioning},
    journal = {Journal of Biomedical Semantics},
    volume = {4},
    pages = {37},
    year = {2013},
    url = {https://doi.org/10.1186/2041-1480-4-37},
    doi = {10.1186/2041-1480-4-37}
    }

  • Patrick Jackman, Alasdair J. G. Gray, Andrew Brass, Robert Stevens, Ming Shi, Derek Scuffell, Simon Hammersley, and Bruce Grieve. Processing online crop disease warning information via sensor networks using ISA ontologies. Agricultural Engineering International: CIGR Journal, 15(3):243–251, 2013.
    [BibTeX] [Abstract]

    Growing demand for food is driving the need for higher crop yields globally. Correctly anticipating the onset of damaging crop diseases is essential to achieve this goal. Considerable efforts have been made recently to develop early warning systems. However, these methods lack a direct and online measurement of the spores that attack crops. A novel disease information network has been implemented and deployed. Spore sensors have been developed and deployed. The measurements from these sensors are combined with similar measurements of important local weather readings to generate estimates of crop disease risk. It is combined with other crop disease information allowing overall local disease risk assessments and forecasts to be made. The resulting data is published through a SPARQL endpoint to support reuse and connection into the linked data cloud.

    @article{Jackman:2013Processing-Online-Crop-Disease,
    title = "Processing online crop disease warning information via sensor networks using ISA ontologies",
    abstract = "Growing demand for food is driving the need for higher crop yields globally. Correctly anticipating the onset of damaging crop diseases is essential to achieve this goal. Considerable efforts have been made recently to develop early warning systems. However, these methods lack a direct and online measurement of the spores that attack crops. A novel disease information network has been implemented and deployed. Spore sensors have been developed and deployed. The measurements from these sensors are combined with similar measurements of important local weather readings to generate estimates of crop disease risk. It is combined with other crop disease information allowing overall local disease risk assessments and forecasts to be made. The resulting data is published through a SPARQL endpoint to support reuse and connection into the linked data cloud.",
    keywords = "Crop disease assessment, Data queries, Investigation study assay, Online sensors, Sensor network, Web semantics",
    author = "Patrick Jackman and Alasdair J G Gray and Andrew Brass and Robert Stevens and Ming Shi and Derek Scuffell and Simon Hammersley and Bruce Grieve",
    year = "2013",
    language = "English",
    volume = "15",
    pages = "243--251",
    journal = "Agricultural Engineering International: CIGR Journal",
    issn = "1682-1130",
    publisher = "International Commission of Agricultural and Biosystems Engineering",
    number = "3",
    }

2011

  • Ixent Galpin, Christian Y. A. Brenninkmeijer, Alasdair J. G. Gray, Farhana Jabeen, Alvaro A. A. Fernandes, and Norman W. Paton. SNEE: a query processor for wireless sensor networks. Distributed and Parallel Databases, 29(1-2):31–85, 2011. doi:10.1007/s10619-010-7074-3
    [BibTeX] [Abstract] [Download PDF]

    A wireless sensor network (WSN) can be construed as an intelligent, large-scale device for observing and measuring properties of the physical world. In recent years, the database research community has championed the view that if we construe a WSN as a database (i.e., if a significant aspect of its intelligent behavior is that it can execute declaratively-expressed queries), then one can achieve a significant reduction in the cost of engineering the software that implements a data collection program for the WSN while still achieving, through query optimization, very favorable cost:benefit ratios. This paper describes a query processing framework for WSNs that meets many desiderata associated with the view of WSN as databases. The framework is presented in the form of compiler/optimizer, called SNEE, for a continuous declarative query language over sensed data streams, called SNEEql. SNEEql can be shown to meet the expressiveness requirements of a large class of applications. SNEE can be shown to generate effective and efficient query evaluation plans. More specifically, the paper describes the following contributions: (1) a user-level syntax and physical algebra for SNEEql, an expressive continuous query language over WSNs; (2) example concrete algorithms for physical algebraic operators defined in such a way that the task of deriving memory, time and energy analytical cost-estimation models (CEMs) for them becomes straightforward by reduction to a structural traversal of the pseudocode; (3) CEMs for the concrete algorithms alluded to; (4) an architecture for the optimization of SNEEql queries, called SNEE, building on well-established distributed query processing components where possible, but making enhancements or refinements where necessary to accommodate the WSN context; (5) algorithms that instantiate the components in the SNEE architecture, thereby supporting integrated query planning that includes routing, placement and timing; and (6) an empirical performance evaluation of the resulting framework.

    @article{DBLP:journals/dpd/GalpinBGJFP11,
    abstract = {A wireless sensor network (WSN) can be construed as an intelligent, large-scale device for observing and measuring properties of the physical world. In recent years, the database research community has championed the view that if we construe a WSN as a database (i.e., if a significant aspect of its intelligent behavior is that it can execute declaratively-expressed queries), then one can achieve a significant reduction in the cost of engineering the software that implements a data collection program for the WSN while still achieving, through query optimization, very favorable cost:benefit ratios. This paper describes a query processing framework for WSNs that meets many desiderata associated with the view of WSN as databases. The framework is presented in the form of compiler/optimizer, called SNEE, for a continuous declarative query language over sensed data streams, called SNEEql. SNEEql can be shown to meet the expressiveness requirements of a large class of applications. SNEE can be shown to generate effective and efficient query evaluation plans. More specifically, the paper describes the following contributions: (1) a user-level syntax and physical algebra for SNEEql, an expressive continuous query language over WSNs; (2) example concrete algorithms for physical algebraic operators defined in such a way that the task of deriving memory, time and energy analytical cost-estimation models (CEMs) for them becomes straightforward by reduction to a structural traversal of the pseudocode; (3) CEMs for the concrete algorithms alluded to; (4) an architecture for the optimization of SNEEql queries, called SNEE, building on well-established distributed query processing components where possible, but making enhancements or refinements where necessary to accommodate the WSN context; (5) algorithms that instantiate the components in the SNEE architecture, thereby supporting integrated query planning that includes routing, placement and timing; and (6) an empirical performance evaluation of the resulting framework.},
    author = {Ixent Galpin and
    Christian Y. A. Brenninkmeijer and
    Alasdair J. G. Gray and
    Farhana Jabeen and
    Alvaro A. A. Fernandes and
    Norman W. Paton},
    title = {{SNEE:} a query processor for wireless sensor networks},
    journal = {Distributed and Parallel Databases},
    volume = {29},
    number = {1-2},
    pages = {31--85},
    year = {2011},
    url = {https://doi.org/10.1007/s10619-010-7074-3},
    doi = {10.1007/s10619-010-7074-3}
    }

  • Alasdair J. G. Gray, Jason Sadler, Oles Kit, Kostis Kyzirakos, Manos Karpathiotakis, Jean-Paul Calbimonte, Kevin R. Page, Raúl García-Castro, Alex Frazer, Ixent Galpin, Alvaro A. A. Fernandes, Norman W. Paton, Óscar Corcho, Manolis Koubarakis, David De Roure, Kirk Martinez, and Asunción Gómez-Pérez. A Semantic Sensor Web for Environmental Decision Support Applications. Sensors, 11(9):8855–8887, 2011. doi:10.3390/s110908855
    [BibTeX] [Download PDF]
    @article{DBLP:journals/sensors/GraySKKKCPGFGFP11,
    author = {Alasdair J. G. Gray and
    Jason Sadler and
    Oles Kit and
    Kostis Kyzirakos and
    Manos Karpathiotakis and
    Jean-Paul Calbimonte and
    Kevin R. Page and
    Ra{\'{u}}l Garc{\'{\i}}a{-}Castro and
    Alex Frazer and
    Ixent Galpin and
    Alvaro A. A. Fernandes and
    Norman W. Paton and
    {\'{O}}scar Corcho and
    Manolis Koubarakis and
    David De Roure and
    Kirk Martinez and
    Asunci{\'{o}}n G{\'{o}}mez{-}P{\'{e}}rez},
    title = {A Semantic Sensor Web for Environmental Decision Support Applications},
    journal = {Sensors},
    volume = {11},
    number = {9},
    pages = {8855--8887},
    year = {2011},
    url = {https://doi.org/10.3390/s110908855},
    doi = {10.3390/s110908855}
    }

2010

  • Alasdair J. G. Gray, Norman Gray, Christopher W. Hall, and Iadh Ounis. Finding the right term: Retrieving and exploring semantic concepts in astronomical vocabularies. Information Processing and Management, 46(4):470–478, 2010. (Alphabetic authorship) doi:10.1016/j.ipm.2009.09.004
    [BibTeX] [Abstract] [Download PDF]

    Astronomy, like many domains, already has several sets of terminology in general use, referred to as controlled vocabularies. For example, the keywords for tagging journal articles, or the taxonomy of terms used to label image files. These existing vocabularies can be encoded into skos, a W3C proposed recommendation for representing vocabularies on the Semantic Web, so that computer systems can help users to search for and discover resources tagged with vocabulary concepts. However, this requires a search mechanism to go from a user-supplied string to a vocabulary concept. In this paper, we present our experiences in implementing the Vocabulary Explorer, a vocabulary search service based on the Terrier Information Retrieval Platform. We investigate the capabilities of existing document weighting models for identifying the correct vocabulary concept for a query. Due to the highly structured nature of a skos encoded vocabulary, we investigate the effects of term weighting (boosting the score of concepts that match on particular fields of a vocabulary concept), and query expansion. We found that the existing document weighting models provided very high quality results, but these could be improved further with the use of term weighting that makes use of the semantic evidence.

    @article{DBLP:journals/ipm/GrayGHO10,
    abstract = {Astronomy, like many domains, already has several sets of terminology in general use, referred to as controlled vocabularies. For example, the keywords for tagging journal articles, or the taxonomy of terms used to label image files. These existing vocabularies can be encoded into skos, a W3C proposed recommendation for representing vocabularies on the Semantic Web, so that computer systems can help users to search for and discover resources tagged with vocabulary concepts. However, this requires a search mechanism to go from a user-supplied string to a vocabulary concept.
    In this paper, we present our experiences in implementing the Vocabulary Explorer, a vocabulary search service based on the Terrier Information Retrieval Platform. We investigate the capabilities of existing document weighting models for identifying the correct vocabulary concept for a query. Due to the highly structured nature of a skos encoded vocabulary, we investigate the effects of term weighting (boosting the score of concepts that match on particular fields of a vocabulary concept), and query expansion. We found that the existing document weighting models provided very high quality results, but these could be improved further with the use of term weighting that makes use of the semantic evidence.},
    author = {Alasdair J. G. Gray and
    Norman Gray and
    Christopher W. Hall and
    Iadh Ounis},
    title = {Finding the right term: Retrieving and exploring semantic concepts
    in astronomical vocabularies},
    journal = {Information Processing and Management},
    volume = {46},
    number = {4},
    pages = {470--478},
    year = {2010},
    Note = {(Alphabetic authorship)},
    url = {https://doi.org/10.1016/j.ipm.2009.09.004},
    doi = {10.1016/j.ipm.2009.09.004}
    }

2007

  • Alasdair J. G. Gray, Werner Nutt, and M. Howard Williams. Answering queries over incomplete data stream histories. International Journal of Web Information Systems (IJWIS), 3(1/2):41–60, 2007. doi:10.1108/17440080710829216
    [BibTeX] [Abstract] [Download PDF]

    Purpose: Distributed data streams are an important topic of current research. In such a setting, data values will be missed, e.g. due to network errors. This paper aims to allow this incompleteness to be detected and overcome with either the user not being affected or the effects of the incompleteness being reported to the user. Design/methodology/approach: A model for representing the incomplete information has been developed that captures the information that is known about the missing data. Techniques for query answering involving certain and possible answer sets have been extended so that queries over incomplete data stream histories can be answered. Findings: It is possible to detect when a distributed data stream is missing one or more values. When such data values are missing there will be some information that is known about the data and this is stored in an appropriate format. Even when the available data are incomplete, it is possible in some circumstances to answer a query completely. When this is not possible, additional meta-data can be returned to inform the user of the effects of the incompleteness. Research limitations/implications: The techniques and models proposed in this paper have only been partially implemented. Practical implications: The proposed system is general and can be applied wherever there is a need to query the history of distributed data streams. The work in this paper enables the system to answer queries when there are missing values in the data. Originality/value: This paper presents a general model of how to detect, represent, and answer historical queries over incomplete distributed data streams.

    @article{Gray:AnsQIncompleteStream:IJWIS2007,
    abstract = {Purpose
    Distributed data streams are an important topic of current research. In such a setting, data values will be missed, e.g. due to network errors. This paper aims to allow this incompleteness to be detected and overcome with either the user not being affected or the effects of the incompleteness being reported to the user.
    Design/methodology/approach
    A model for representing the incomplete information has been developed that captures the information that is known about the missing data. Techniques for query answering involving certain and possible answer sets have been extended so that queries over incomplete data stream histories can be answered.
    Findings
    It is possible to detect when a distributed data stream is missing one or more values. When such data values are missing there will be some information that is known about the data and this is stored in an appropriate format. Even when the available data are incomplete, it is possible in some circumstances to answer a query completely. When this is not possible, additional meta‐data can be returned to inform the user of the effects of the incompleteness.
    Research limitations/implications
    The techniques and models proposed in this paper have only been partially implemented.
    Practical implications
    The proposed system is general and can be applied wherever there is a need to query the history of distributed data streams. The work in this paper enables the system to answer queries when there are missing values in the data.
    Originality/value
    This paper presents a general model of how to detect, represent, and answer historical queries over incomplete distributed data streams.},
    author = {Alasdair J. G. Gray and
    Werner Nutt and
    M. Howard Williams},
    title = {Answering queries over incomplete data stream histories},
    journal = {International Journal of Web Information Systems ({IJWIS})},
    volume = {3},
    number = {1/2},
    pages = {41--60},
    year = {2007},
    url = {https://doi.org/10.1108/17440080710829216},
    doi = {10.1108/17440080710829216}
    }
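
    The certain/possible answer distinction referred to in the abstract can be sketched as follows. The representation of a known-missing reading and the threshold query below are illustrative assumptions, not the paper's formal model.

    # Sketch of certain vs. possible answers over an incomplete stream history.
    # A load of None marks a reading that is known to exist but whose value was
    # lost (e.g. dropped on the network); the data and query are invented here.
    from typing import NamedTuple, Optional

    class Reading(NamedTuple):
        node: str
        timestamp: int
        load: Optional[float]   # None = value known to be missing

    history = [
        Reading("node1", 1, 0.9),
        Reading("node2", 1, 0.4),
        Reading("node2", 2, None),   # missed value for node2
    ]

    def nodes_over_threshold(history, threshold):
        """Which nodes exceeded the load threshold at some point in the history?"""
        certain, possible = set(), set()
        for r in history:
            if r.load is None:
                possible.add(r.node)      # the missing value might have exceeded it
            elif r.load > threshold:
                certain.add(r.node)
                possible.add(r.node)
        return certain, possible

    certain, possible = nodes_over_threshold(history, 0.8)
    print(certain)    # node1 only: this answer holds regardless of the missing value
    print(possible)   # node1 and node2: node2's missed reading might have exceeded 0.8

    # When the two sets coincide the user is unaffected by the incompleteness;
    # when they differ, the difference can be reported back as meta-data.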

2005

  • Andrew W. Cooke, Alasdair J. G. Gray, and Werner Nutt. Stream Integration Techniques for Grid Monitoring. Journal on Data Semantics, 2:136–175, 2005. (Alphabetical authorship, equal responsibility) doi:10.1007/978-3-540-30567-5_6
    [BibTeX] [Download PDF]
    @article{Cooke:StreamIntegration:JoDS2005,
    author = {Andrew W. Cooke and
    Alasdair J. G. Gray and
    Werner Nutt},
    title = {Stream Integration Techniques for Grid Monitoring},
    journal = {Journal on Data Semantics},
    volume = {2},
    pages = {136--175},
    year = {2005},
    Note = {(Alphabetical authorship, equal responsibility)},
    url = {https://doi.org/10.1007/978-3-540-30567-5\_6},
    doi = {10.1007/978-3-540-30567-5\_6}
    }

2004

  • Andrew W. Cooke, Alasdair J. G. Gray, Werner Nutt, James Magowan, Manfred Oevers, Paul Taylor, Roney Cordenonsi, Rob Byrom, Linda Cornwall, Abdeslem Djaoui, Laurence Field, Steve Fisher, Steve Hicks, Jason Leake, Robin Middleton, Antony J. Wilson, Xiaomei Zhu, Norbert Podhorszki, Brian A. Coghlan, Stuart Kenny, David O’Callaghan, and John Ryan. The Relational Grid Monitoring Architecture: Mediating Information about the Grid. Journal of Grid Computing, 2(4):323–339, 2004. (Alphabetical authorship by site, Heriot-Watt authored paper) doi:10.1007/s10723-005-0151-6
    [BibTeX] [Abstract] [Download PDF]

    We have developed and implemented the Relational Grid Monitoring Architecture (R-GMA) as part of the DataGrid project, to provide a flexible information and monitoring service for use by other middleware components and applications. R-GMA presents users with a virtual database and mediates queries posed at this database: users pose queries against a global schema and R-GMA takes responsibility for locating relevant sources and returning an answer. R-GMA’s architecture and mechanisms are general and can be used wherever there is a need for publishing and querying information in a distributed environment. We discuss the requirements, design and implementation of R-GMA as deployed on the DataGrid testbed. We also describe some of the ways in which R-GMA is being used.

    @article{Cooke:RGMA:JoGC2004,
    abstract = {We have developed and implemented the Relational Grid Monitoring Architecture (R-GMA) as part of the DataGrid project, to provide a flexible information and monitoring service for use by other middleware components and applications.
    R-GMA presents users with a virtual database and mediates queries posed at this database: users pose queries against a global schema and R-GMA takes responsibility for locating relevant sources and returning an answer. R-GMA’s architecture and mechanisms are general and can be used wherever there is a need for publishing and querying information in a distributed environment.
    We discuss the requirements, design and implementation of R-GMA as deployed on the DataGrid testbed. We also describe some of the ways in which R-GMA is being used.},
    author = {Andrew W. Cooke and
    Alasdair J. G. Gray and
    Werner Nutt and
    James Magowan and
    Manfred Oevers and
    Paul Taylor and
    Roney Cordenonsi and
    Rob Byrom and
    Linda Cornwall and
    Abdeslem Djaoui and
    Laurence Field and
    Steve Fisher and
    Steve Hicks and
    Jason Leake and
    Robin Middleton and
    Antony J. Wilson and
    Xiaomei Zhu and
    Norbert Podhorszki and
    Brian A. Coghlan and
    Stuart Kenny and
    David O'Callaghan and
    John Ryan},
    title = {The Relational Grid Monitoring Architecture: Mediating Information
    about the Grid},
    journal = {Journal of Grid Computing},
    volume = {2},
    number = {4},
    pages = {323--339},
    year = {2004},
    Note = {(Alphabetical authorship by site, Heriot-Watt authored paper)},
    url = {https://doi.org/10.1007/s10723-005-0151-6},
    doi = {10.1007/s10723-005-0151-6}
    }
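
    The mediation idea summarised in the abstract, a single query posed against a global schema and answered by whichever producers currently publish the relevant table, can be illustrated with a small sketch. The Registry and Producer classes and their methods below are invented for this illustration; they are not the R-GMA interfaces.

    # Illustrative sketch of query mediation over a virtual database.
    # Class and method names are invented for this example and are not the
    # R-GMA API.

    class Producer:
        """A source that publishes rows for one table of the global schema."""
        def __init__(self, name, table, rows):
            self.name, self.table, self.rows = name, table, rows

        def answer(self, predicate):
            return [row for row in self.rows if predicate(row)]

    class Registry:
        """Maps tables of the global schema to the producers publishing them."""
        def __init__(self):
            self._producers = {}

        def register(self, producer):
            self._producers.setdefault(producer.table, []).append(producer)

        def mediate(self, table, predicate):
            """Locate the producers relevant to a table and merge their answers."""
            answers = []
            for producer in self._producers.get(table, []):
                answers.extend(producer.answer(predicate))
            return answers

    registry = Registry()
    registry.register(Producer("siteA", "cpu_load", [{"node": "a1", "load": 0.7}]))
    registry.register(Producer("siteB", "cpu_load", [{"node": "b1", "load": 0.2}]))

    # The consumer poses one query against the virtual cpu_load table; the
    # mediator locates the sources and merges their answers.
    print(registry.mediate("cpu_load", lambda row: row["load"] > 0.5))
    # -> [{'node': 'a1', 'load': 0.7}]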