NAR Database Paper

The new year started with a new publication, an article in the 2018 NAR Database issue about the IUPHAR Guide to Pharmacology Database.

  • Simon D. Harding, Joanna L. Sharman, Elena Faccenda, Chris Southan, Adam J. Pawson, Sam Ireland, Alasdair J. G. Gray, Liam Bruce, Stephen P. H. Alexander, Stephen Anderton, Clare Bryant, Anthony P. Davenport, Christian Doerig, Doriano Fabbro, Francesca Levi-Schaffer, Michael Spedding, Jamie A. Davies, and NC-IUPHAR. The IUPHAR/BPS Guide to PHARMACOLOGY in 2018: updates and expansion to encompass the new guide to IMMUNOPHARMACOLOGY. Nucleic Acids Research, 46(D1):D1091-D1106, 2018. doi:10.1093/nar/gkx1121

    The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb, www.guidetopharmacology.org) and its precursor IUPHAR-DB, have captured expert-curated interactions between targets and ligands from selected papers in pharmacology and drug discovery since 2003. This resource continues to be developed in conjunction with the International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS). As previously described, our unique model of content selection and quality control is based on 96 target-class subcommittees comprising 512 scientists collaborating with in-house curators. This update describes content expansion, new features and interoperability improvements introduced in the 10 releases since August 2015. Our relationship matrix now describes ∼9000 ligands, ∼15 000 binding constants, ∼6000 papers and ∼1700 human proteins. As an important addition, we also introduce our newly funded project for the Guide to IMMUNOPHARMACOLOGY (GtoImmuPdb, www.guidetoimmunopharmacology.org). This has been ‘forked’ from the well-established GtoPdb data model and expanded into new types of data related to the immune system and inflammatory processes. This includes new ligands, targets, pathways, cell types and diseases for which we are recruiting new IUPHAR expert committees. Designed as an immunopharmacological gateway, it also has an emphasis on potential therapeutic interventions.

    @Article{Harding2018-GTP,
    abstract = {The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb, www.guidetopharmacology.org) and its precursor IUPHAR-DB, have captured expert-curated interactions between targets and ligands from selected papers in pharmacology and drug discovery since 2003. This resource continues to be developed in conjunction with the International Union of Basic and Clinical Pharmacology (IUPHAR) and the British Pharmacological Society (BPS). As previously described, our unique model of content selection and quality control is based on 96 target-class subcommittees comprising 512 scientists collaborating with in-house curators. This update describes content expansion, new features and interoperability improvements introduced in the 10 releases since August 2015. Our relationship matrix now describes ∼9000 ligands, ∼15 000 binding constants, ∼6000 papers and ∼1700 human proteins. As an important addition, we also introduce our newly funded project for the Guide to IMMUNOPHARMACOLOGY (GtoImmuPdb, www.guidetoimmunopharmacology.org). This has been ‘forked’ from the well-established GtoPdb data model and expanded into new types of data related to the immune system and inflammatory processes. This includes new ligands, targets, pathways, cell types and diseases for which we are recruiting new IUPHAR expert committees. Designed as an immunopharmacological gateway, it also has an emphasis on potential therapeutic interventions.},
    author = {Harding, Simon D and Sharman, Joanna L and Faccenda, Elena and Southan, Chris and Pawson, Adam J and Ireland, Sam and Gray, Alasdair J G and Bruce, Liam and Alexander, Stephen P H and Anderton, Stephen and Bryant, Clare and Davenport, Anthony P and Doerig, Christian and Fabbro, Doriano and Levi-Schaffer, Francesca and Spedding, Michael and Davies, Jamie A and {NC-IUPHAR}},
    title = {The IUPHAR/BPS Guide to PHARMACOLOGY in 2018: updates and expansion to encompass the new guide to IMMUNOPHARMACOLOGY},
    journal = {Nucleic Acids Research},
    year = {2018},
    volume = {46},
    number = {D1},
    pages = {D1091-D1106},
    month = jan,
    OPTnote = {},
    OPTannote = {},
    url = {http://dx.doi.org/10.1093/nar/gkx1121},
    doi = {10.1093/nar/gkx1121}
    }

My involvement came from Liam Bruce’s honours project. Liam developed the RDB2RDF mappings that convert the existing relational content into an RDF representation. The mappings are executed using the Morph-RDB R2RML engine.
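As a rough illustration of what an R2RML mapping declares (rows of a relational table becoming RDF triples), here is a small Python sketch. The real mappings are declarative Turtle documents executed by the Morph-RDB engine, not code like this, and every table, column, and URI name below is hypothetical.

```python
# Sketch of the row-to-triples transformation that an R2RML mapping
# declares. All URIs, table names, and column names are made up for
# illustration; they are not the actual GtoPdb schema.

BASE = "http://example.org/gtopdb/"

def row_to_triples(table, row, subject_col):
    """Convert one relational row (a dict of column -> value) into
    N-Triples strings, using the subject column to mint the URI."""
    subject = f"<{BASE}{table}/{row[subject_col]}>"
    triples = []
    for col, value in row.items():
        if col == subject_col:
            continue
        predicate = f"<{BASE}vocab#{col}>"
        triples.append(f'{subject} {predicate} "{value}" .')
    return triples

# A hypothetical ligand row and its RDF rendering:
ligand = {"ligand_id": 1234, "name": "aspirin", "type": "synthetic organic"}
for triple in row_to_triples("ligand", ligand, "ligand_id"):
    print(triple)
```

In a real R2RML mapping the same correspondence is written once, declaratively, and the engine applies it to every row of the table.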

To ensure that we abide by the FAIR data principles, we also generate machine-processable metadata descriptions of the data that conform to the HCLS Community Profile.
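To give a flavour of such a description, the sketch below emits a minimal Turtle snippet using a few of the properties the HCLS Community Profile reuses (dct:title, pav:version, dct:license). All URIs and values here are illustrative only; the actual profile specifies many more elements covering provenance, distributions, and content summarisation.

```python
# Hand-rolled sketch of a dataset description in the spirit of the
# HCLS Community Profile. Values and dataset URIs are hypothetical.

PREFIXES = (
    "@prefix dct: <http://purl.org/dc/terms/> .\n"
    "@prefix pav: <http://purl.org/pav/> .\n\n"
)

def describe_dataset(uri, title, version, license_uri):
    """Return a minimal Turtle description of one dataset version."""
    return (
        PREFIXES
        + f'<{uri}> dct:title "{title}" ;\n'
        + f'    pav:version "{version}" ;\n'
        + f'    dct:license <{license_uri}> .\n'
    )

print(describe_dataset(
    "http://example.org/dataset/gtopdb/2018.1",
    "Example dataset description",
    "2018.1",
    "http://creativecommons.org/licenses/by-sa/4.0/",
))
```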

ISWC2017 Papers

I have had two papers accepted within the events that make up ISWC2017.

My PhD student Qianru Zhou has been working on using RDF stream processing to detect anomalous events through telecommunication network messages. The particular scenario in our paper that will be presented at the Web Stream Processing workshop focuses on detecting a disaster such as the capsizing of the Eastern Star on the Yangtze River [1].
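The core idea can be caricatured without any RDF machinery: watch a windowed stream of observations, say the number of handsets reporting in per time slot, and flag a sudden drop relative to the recent window. The sketch below is only a toy stand-in for the RDF stream processing algorithm in the paper; the window size and threshold are arbitrary.

```python
from collections import deque

def detect_drop(stream, window=5, threshold=0.5):
    """Return the index at which a value falls below threshold times
    the mean of the preceding `window` values, or None if no such
    drop occurs. A crude stand-in for windowed abnormality detection
    over RDF-annotated telecom streams."""
    buf = deque(maxlen=window)
    for i, value in enumerate(stream):
        if len(buf) == window and value < threshold * (sum(buf) / window):
            return i
        buf.append(value)
    return None

# A stable signal followed by a sharp drop, as when a vessel capsizes
# and many handsets go silent at once:
print(detect_drop([100, 98, 101, 99, 100, 40]))  # -> 5
```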

The second paper is a poster in the main conference that provides an overview of the Bioschemas project where we are identifying the Schema.org markup that is of primary importance for life science resources. Hopefully the paper title will pull the punters in for the session [2].
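For readers unfamiliar with Schema.org markup, the snippet below shows the general shape of a JSON-LD description that a data provider might embed in a resource page. The property selection is illustrative only, not the official Bioschemas profile, and the URLs are hypothetical.

```python
import json

# Hypothetical Schema.org markup of the kind Bioschemas encourages
# life-science resource providers to embed in their pages.
record = {
    "@context": "http://schema.org",
    "@type": "Dataset",
    "name": "Example protein annotation dataset",
    "url": "http://example.org/dataset/1",
    "keywords": "protein, annotation, example",
}

# Serialised as JSON-LD, this could sit inside a <script> tag so that
# search engines can index the dataset's key properties.
print(json.dumps(record, indent=2))
```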

[1] Qianru Zhou, Stephen McLaughlin, Alasdair J. G. Gray, Shangbin Wu, and Chengxiang Wang. Lost Silence: An emergency response early detection service through continuous processing of telecommunication data streams. In Web Stream Processing 2017, 2017.
@InProceedings{ZhouEtal2017:LostSilence:WSP2017,
abstract = {Early detection of significant traumatic events, e.g. terrorist events, ship capsizes, is important to ensure that a prompt emergency response can occur. In the modern world telecommunication systems can and do play a key role in ensuring a successful emergency response by detecting such incidents through significant changes in calls and access to the networks. In this paper a methodology is illustrated to detect such incidents immediately (with the delay in the order of milliseconds), by processing semantically annotated streams of data in cellular telecommunication systems. In our methodology, live information of phones' positions and status are encoded as RDF streams. We propose an algorithm that processes streams of RDF annotated telecommunication data to detect abnormality. Our approach is exemplified in the context of capsize of a passenger cruise ship but is readily translatable to other incidents. Our evaluation results show that with properly chosen window size, such incidents can be detected effectively.},
author = {Qianru Zhou and Stephen McLaughlin and Alasdair J G Gray and Shangbin Wu and Chengxiang Wang},
title = {Lost Silence: An emergency response early detection service through continuous processing of telecommunication data streams},
OPTcrossref = {},
OPTkey = {},
booktitle = {Web Stream Processing 2017},
year = {2017},
OPTeditor = {},
OPTvolume = {},
OPTnumber = {},
OPTseries = {},
OPTpages = {},
OPTmonth = {},
OPTaddress = {},
OPTorganization = {},
OPTpublisher = {},
OPTnote = {},
OPTannote = {}
}
[2] Alasdair J. G. Gray, Carole Goble, Rafael C. Jimenez, and The Bioschemas Community. Bioschemas: From Potato Salad to Protein Annotation. In ISWC 2017 Poster Proceedings, Vienna, Austria, 2017. Poster
@InProceedings{grayetal2017:bioschemas:iswc2017,
abstract = {The life sciences have a wealth of data resources with a wide range of overlapping content. Key repositories, such as UniProt for protein data or Entrez Gene for gene data, are well known and their content easily discovered through search engines. However, there is a long-tail of bespoke datasets with important content that are not so prominent in search results. Building on the success of Schema.org for making a wide range of structured web content more discoverable and interpretable, e.g. food recipes, the Bioschemas community (http://bioschemas.org) aim to make life sciences datasets more findable by encouraging data providers to embed Schema.org markup in their resources.},
author = {Alasdair J G Gray and Carole Goble and Rafael C Jimenez and {The Bioschemas Community}},
title = {Bioschemas: From Potato Salad to Protein Annotation},
OPTcrossref = {},
OPTkey = {},
booktitle = {ISWC 2017 Poster Proceedings},
year = {2017},
OPTeditor = {},
OPTvolume = {},
OPTnumber = {},
OPTseries = {},
OPTpages = {},
OPTmonth = {},
address = {Vienna, Austria},
OPTorganization = {},
OPTpublisher = {},
note = {Poster},
OPTannote = {}
}

Interoperability and FAIRness through a novel combination of Web technologies

New paper [1] on using Semantic Web technologies to publish existing data according to the FAIR data principles [2].

Abstract: Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.

[1] Mark D. Wilkinson, Ruben Verborgh, Luiz Olavo Bonino da Silva Santos, Tim Clark, Morris A. Swertz, Fleur D. L. Kelpin, Alasdair J. G. Gray, Erik A. Schultes, Erik M. van Mulligen, Paolo Ciccarese, Arnold Kuzniar, Anand Gavai, Mark Thompson, Rajaram Kaliyaperumal, Jerven T. Bolleman, and Michel Dumontier. Interoperability and FAIRness through a novel combination of Web technologies. PeerJ Computer Science, 3:e110, April 2017.
@article{Wilkinson2017-FAIRness,
abstract = {Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task with no scalability. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.},
author = {Wilkinson, Mark D. and Verborgh, Ruben and {Bonino da Silva Santos}, Luiz Olavo and Clark, Tim and Swertz, Morris A. and Kelpin, Fleur D.L. and Gray, Alasdair J.G. and Schultes, Erik A. and van Mulligen, Erik M. and Ciccarese, Paolo and Kuzniar, Arnold and Gavai, Anand and Thompson, Mark and Kaliyaperumal, Rajaram and Bolleman, Jerven T. and Dumontier, Michel},
doi = {10.7717/peerj-cs.110},
issn = {2376-5992},
journal = {PeerJ Computer Science},
month = {apr},
pages = {e110},
publisher = {PeerJ Inc.},
title = {{Interoperability and FAIRness through a novel combination of Web technologies}},
url = {https://peerj.com/articles/cs-110},
volume = {3},
year = {2017}
}
[2] Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, Jildau Bouwman, Anthony J. Brookes, Tim Clark, Mercè Crosas, Ingrid Dillo, Olivier Dumon, Scott Edmunds, Chris T. Evelo, Richard Finkers, Alejandra Gonzalez-Beltran, Alasdair J. G. Gray, Paul Groth, Carole Goble, Jeffrey S. Grethe, Jaap Heringa, Peter A. C. ‘t Hoen, Rob Hooft, Tobias Kuhn, Ruben Kok, Joost Kok, Scott J. Lusher, Maryann E. Martone, Albert Mons, Abel L. Packer, Bengt Persson, Philippe Rocca-Serra, Marco Roos, Rene van Schaik, Susanna-Assunta Sansone, Erik Schultes, Thierry Sengstag, Ted Slater, George Strawn, Morris A. Swertz, Mark Thompson, Johan van der Lei, Erik van Mulligen, Jan Velterop, Andra Waagmeester, Peter Wittenburg, Katherine Wolstencroft, Jun Zhao, and Barend Mons. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3:160018, 2016.
@article{Wilkinson2016,
abstract = {There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders-representing academia, industry, funding agencies, and scholarly publishers-have come together to design and jointly endorse a concise and measureable set of principles that we refer to as the {FAIR} Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the {FAIR} Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the {FAIR} Principles, and includes the rationale behind them, and some exemplar implementations in the community.},
author = {Wilkinson, Mark D and Dumontier, Michel and Aalbersberg, IJsbrand Jan and Appleton, Gabrielle and Axton, Myles and Baak, Arie and Blomberg, Niklas and Boiten, Jan-Willem and {da Silva Santos}, Luiz Bonino and Bourne, Philip E and Bouwman, Jildau and Brookes, Anthony J and Clark, Tim and Crosas, Merc{\`{e}} and Dillo, Ingrid and Dumon, Olivier and Edmunds, Scott and Evelo, Chris T and Finkers, Richard and Gonzalez-Beltran, Alejandra and Gray, Alasdair J.G. and Groth, Paul and Goble, Carole and Grethe, Jeffrey S and Heringa, Jaap and {'t Hoen}, Peter A.C and Hooft, Rob and Kuhn, Tobias and Kok, Ruben and Kok, Joost and Lusher, Scott J and Martone, Maryann E and Mons, Albert and Packer, Abel L and Persson, Bengt and Rocca-Serra, Philippe and Roos, Marco and van Schaik, Rene and Sansone, Susanna-Assunta and Schultes, Erik and Sengstag, Thierry and Slater, Ted and Strawn, George and Swertz, Morris A and Thompson, Mark and van der Lei, Johan and van Mulligen, Erik and Velterop, Jan and Waagmeester, Andra and Wittenburg, Peter and Wolstencroft, Katherine and Zhao, Jun and Mons, Barend},
doi = {10.1038/sdata.2016.18},
issn = {2052-4463},
journal = {Scientific Data},
month = mar,
pages = {160018},
publisher = {Macmillan Publishers Limited},
title = {{The FAIR Guiding Principles for scientific data management and stewardship}},
url = {http://www.nature.com/articles/sdata201618},
volume = {3},
year = {2016}
}

New Paper: Reproducibility with Administrative Data

Our journal article [1] looks at encouraging good practice to enable reproducible analysis of data analysis workflows. This is a result of a collaboration between social scientists and a computer scientist with the ADRC-Scotland.

Abstract: Powerful new social science data resources are emerging. One particularly important source is administrative data, which were originally collected for organisational purposes but often contain information that is suitable for social science research. In this paper we outline the concept of reproducible research in relation to micro-level administrative social science data. Our central claim is that a planned and organised workflow is essential for high quality research using micro-level administrative social science data. We argue that it is essential for researchers to share research code, because code sharing enables the elements of reproducible research. First, it enables results to be duplicated and therefore allows the accuracy and validity of analyses to be evaluated. Second, it facilitates further tests of the robustness of the original piece of research. Drawing on insights from computer science and other disciplines that have been engaged in e-Research we discuss and advocate the use of Git repositories to provide a useable and effective solution to research code sharing and rendering social science research using micro-level administrative data reproducible.

[1] C. J. Playford, V. Gayle, R. Connelly, and A. J. Gray, “Administrative social science data: The challenge of reproducible research,” Big Data & Society, vol. 3, iss. 2, 2016.
@Article{Playford2016BDS,
abstract = {Powerful new social science data resources are emerging. One particularly important source is administrative data, which were originally collected for organisational purposes but often contain information that is suitable for social science research. In this paper we outline the concept of reproducible research in relation to micro-level administrative social science data. Our central claim is that a planned and organised workflow is essential for high quality research using micro-level administrative social science data. We argue that it is essential for researchers to share research code, because code sharing enables the elements of reproducible research. First, it enables results to be duplicated and therefore allows the accuracy and validity of analyses to be evaluated. Second, it facilitates further tests of the robustness of the original piece of research. Drawing on insights from computer science and other disciplines that have been engaged in e-Research we discuss and advocate the use of Git repositories to provide a useable and effective solution to research code sharing and rendering social science research using micro-level administrative data reproducible.},
author = {Christopher J Playford and Vernon Gayle and Roxanne Connelly and Alasdair JG Gray},
title = {Administrative social science data: The challenge of reproducible research},
journal = {Big Data \& Society},
year = {2016},
OPTkey = {},
volume = {3},
number = {2},
OPTpages = {},
month = dec,
url = {http://journals.sagepub.com/doi/full/10.1177/2053951716684143},
doi = {10.1177/2053951716684143},
OPTnote = {},
OPTannote = {}
}

HCLS Tutorial at SWAT4LS 2016

On 5 December 2016 I presented a tutorial [1] on the Health Care and Life Sciences Community Profile (HCLS Datasets) at the 9th International Semantic Web Applications and Tools for the Life Sciences Conference (SWAT4LS 2016).

The 61 metadata properties from 18 vocabularies reused in the HCLS Community Profile are available in this spreadsheet (.ods).

[1] M. Dumontier, A. J. G. Gray, and S. M. Marshall, “Describing Datasets with the Health Care and Life Sciences Community Profile,” in Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2016), Amsterdam, The Netherlands, 2016.
@InProceedings{Gray2016SWAT4LSTutorial,
abstract = {Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting HCLS community profile covers elements of description, identification, attribution, versioning, provenance, and content summarization. The HCLS community profile reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets. The goal of this tutorial is to explain elements of the HCLS community profile and to enable users to craft and validate descriptions for datasets of interest.},
author = {Michel Dumontier and Alasdair J. G. Gray and M. Scott Marshall},
title = {Describing Datasets with the Health Care and Life Sciences Community Profile},
OPTcrossref = {},
OPTkey = {},
booktitle = {Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2016)},
year = {2016},
OPTeditor = {},
OPTvolume = {},
OPTnumber = {},
OPTseries = {},
OPTpages = {},
month = dec,
address = {Amsterdam, The Netherlands},
OPTorganization = {},
OPTpublisher = {},
note = {(Tutorial)},
url = {http://www.swat4ls.org/workshops/amsterdam2016/tutorials/t2/},
OPTannote = {}
}

HCLS Tutorial at SWAT4LS 2016

On 5 December 2016 I presented a tutorial [1] on the Health Care and Life Sciences Community Profile (HCLS Datasets) at the 9th International Semantic Web Applications and Tools for the Life Sciences Conference (SWAT4LS 2016). Below you can find the slides I presented. The 61 metadata properties from 18 vocabularies reused in the HCLS Community […]

On 5 December 2016 I presented a tutorial [1] on the Health Care and Life Sciences Community Profile (HCLS Datasets) at the 9th International Semantic Web Applications and Tools for the Life Sciences Conference (SWAT4LS 2016). Below you can find the slides I presented.

The 61 metadata properties from 18 vocabularies reused in the HCLS Community Profile are available in this spreadsheet (.ods).

[1] M. Dumontier, A. J. G. Gray, and M. S. Marshall, “Describing Datasets with the Health Care and Life Sciences Community Profile,” in Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2016), Amsterdam, The Netherlands, 2016.
[Bibtex]
@InProceedings{Gray2016SWAT4LSTutorial,
abstract = {Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting HCLS community profile covers elements of description, identification, attribution, versioning, provenance, and content summarization. The HCLS community profile reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets. The goal of this tutorial is to explain elements of the HCLS community profile and to enable users to craft and validate descriptions for datasets of interest.},
author = {Michel Dumontier and Alasdair J. G. Gray and M. Scott Marshall},
title = {Describing Datasets with the Health Care and Life Sciences Community Profile},
booktitle = {Semantic Web Applications and Tools for Life Sciences (SWAT4LS 2016)},
year = {2016},
month = dec,
address = {Amsterdam, The Netherlands},
note = {(Tutorial)},
url = {http://www.swat4ls.org/workshops/amsterdam2016/tutorials/t2/}
}

HCLS Community Profile for Dataset Descriptions

My latest publication [1] describes the process followed in developing the W3C Health Care and Life Sciences Interest Group (HCLSIG) community profile for dataset descriptions which was published last year. The diagram below provides a summary of the data model for describing datasets which covers 61 metadata terms drawn from 18 vocabularies. [1] M. Dumontier, A. […]

My latest publication [1] describes the process followed in developing the W3C Health Care and Life Sciences Interest Group (HCLSIG) community profile for dataset descriptions, which was published last year. The diagram below summarises the data model for describing datasets, which covers 61 metadata terms drawn from 18 vocabularies.

[Figure: Overview of the HCLS Community Profile for Dataset Descriptions]

[1] M. Dumontier, A. J. G. Gray, M. S. Marshall, V. Alexiev, P. Ansell, G. Bader, J. Baran, J. T. Bolleman, A. Callahan, J. Cruz-Toledo, P. Gaudet, E. A. Gombocz, A. N. Gonzalez-Beltran, P. Groth, M. Haendel, M. Ito, S. Jupp, N. Juty, T. Katayama, N. Kobayashi, K. Krishnaswami, C. Laibe, N. {Le Novère}, S. Lin, J. Malone, M. Miller, C. J. Mungall, L. Rietveld, S. M. Wimalaratne, and A. Yamaguchi, “The health care and life sciences community profile for dataset descriptions,” PeerJ, vol. 4, p. e2331, 2016. doi:10.7717/peerj.2331
[Bibtex]
@article{Dumontier2016HCLS,
abstract = {Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the {W3C} Semantic Web for Health Care and the Life Sciences Interest Group ({HCLSIG}) identified Resource Description Framework ({RDF}) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of {FAIR} data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.},
author = {Dumontier, Michel and Gray, Alasdair J. G. and Marshall, M. Scott and Alexiev, Vladimir and Ansell, Peter and Bader, Gary and Baran, Joachim and Bolleman, Jerven T. and Callahan, Alison and Cruz-Toledo, Jos{\'{e}} and Gaudet, Pascale and Gombocz, Erich A. and Gonzalez-Beltran, Alejandra N. and Groth, Paul and Haendel, Melissa and Ito, Maori and Jupp, Simon and Juty, Nick and Katayama, Toshiaki and Kobayashi, Norio and Krishnaswami, Kalpana and Laibe, Camille and {Le Nov{\`{e}}re}, Nicolas and Lin, Simon and Malone, James and Miller, Michael and Mungall, Christopher J. and Rietveld, Laurens and Wimalaratne, Sarala M. and Yamaguchi, Atsuko},
doi = {10.7717/peerj.2331},
issn = {2167-8359},
journal = {PeerJ},
month = aug,
title = {The health care and life sciences community profile for dataset descriptions},
volume = {4},
pages = {e2331},
year = {2016},
url = {https://peerj.com/articles/2331/}
}