big-data – Heriot-Watt Semantic Web Lab

The BioHackathon Europe Subsetting Project will present the achievements and challenges of Wikidata subsetting at the Wikidata Reuse Days. I will be part of this presentation, talking about the role of Wikidata references in subsetting and about WDumper, a third-party Wikidata tool, as a subsetting tool. Join us if you want to know more about subsetting applications, theory, tools, and challenges.

This link gives the details of the event, the speakers, and the topics covered. If you want to know more about the Subsetting Project in BioHackathon Europe, see this post.

Abstract

Often Wikidata is too big to handle. Through a sequence of BioHackathons, we have been reviewing methods to extract subsets from Wikidata to facilitate downstream reuse. We have identified a set of tools and would like to report back on our intermediate results. We will address the different applicable file formats. Natively, Wikidata data is stored in JSON, but it is also available as RDF, for example through the Wikidata Query Service. Different subsetting methods use either or both of those formats as input and output. We will also address how to define the subset. The definition can be a JSON file or a Shape Expression.
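To make the JSON route concrete, here is a minimal sketch of subsetting by class. It assumes the layout of the Wikidata JSON dump (a JSON array with one entity per line) and keeps only entities whose "instance of" (P31) statements point at a wanted class. The sample entities, their IDs, and the helper names `instance_of_ids` and `subset_dump` are invented for illustration. This is not how WDumper or any of the other tools discussed in the talk is implemented.

```python
import json

# Q7187 is the Wikidata item for "gene", Q5 for "human".
# The entity IDs Q900001..Q900003 are invented sample data.
def _entity(entity_id, class_id):
    return {
        "id": entity_id,
        "claims": {
            "P31": [
                {"mainsnak": {"snaktype": "value",
                              "datavalue": {"value": {"id": class_id}}}}
            ]
        },
    }

# Stand-in for the lines of a dump file: an opening bracket, one entity
# per line (each followed by a comma except the last), a closing bracket.
SAMPLE_DUMP_LINES = [
    "[",
    json.dumps(_entity("Q900001", "Q7187")) + ",",
    json.dumps(_entity("Q900002", "Q5")) + ",",
    json.dumps(_entity("Q900003", "Q7187")),
    "]",
]


def instance_of_ids(entity):
    """Collect the item IDs used as values of P31 (instance of)."""
    ids = set()
    for statement in entity.get("claims", {}).get("P31", []):
        snak = statement.get("mainsnak", {})
        # "novalue" and "somevalue" snaks carry no datavalue.
        if snak.get("snaktype") == "value":
            ids.add(snak["datavalue"]["value"]["id"])
    return ids


def subset_dump(lines, wanted_classes):
    """Yield entities that are an instance of any wanted class."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        entity = json.loads(line)
        if instance_of_ids(entity) & wanted_classes:
            yield entity


subset_ids = [e["id"] for e in subset_dump(SAMPLE_DUMP_LINES, {"Q7187"})]
print(subset_ids)
```

Reading line by line matters for the real dump, which is far too large to load as a single JSON document. The same loop works on a file opened with `gzip.open(path, "rt")`.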

I was away from September 22 to 28, 2019, attending the 15th eScience Conference in San Diego, California, USA. It was my first time attending an international conference at which I presented my own paper. The objective of the eScience Conference is to promote innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle. The conference was co-located with the Gateways 2019 conference. The two conferences offered a shared keynote, presentations on topics of mutual interest, and a shared evening reception, as well as opportunities for mingling during the breaks. Attendees could register for one or both conferences, in full or in part, since the two audiences share interests, content, and culture.

This was a great trip for me because I learnt a lot by attending the relevant workshops, presentations and keynotes. My presentation was in the Research Object workshop 2019. There were six workshops in total.

First Day (24th, September 2019)

My workshop was on the first day, so I was very nervous about my presentation. The RO workshop had 15 participants, including professors with a lot of research experience.

The Research Object workshop opened with Carole Goble introducing Research Objects, an emerging approach to the publication and exchange of scholarly information on the Web. Research Objects aim to improve reuse and reproducibility by:

Supporting the publication of more than just PDFs, making data, code, and other resources first class citizens of scholarship
Recognizing that there is often a need to publish collections of these resources together as one shareable, cite-able resource.
Enriching these resources and collections with any and all additional information required to make research reusable, and reproducible!

Research objects are not just data, not just collections, but any digital resource that aims to go beyond the PDF for scholarly publishing!

Welcome Research Objects 2019 By Carole Goble

The keynote speaker was Bertram Ludäscher, who presented "From Research Objects to Reproducible Science Tales". In his presentation, he talked about what we mean by reproducibility, identified gaps in tools and thinking, and discussed how to bridge them.

Keynote at Workshop for #ResearchObjects @ludaesch presents: From Research Objects to Reproducible Science Tales pic.twitter.com/TpOBTf1eSn
— Stian Soiland-Reyes #FBPE 🇪🇺🇬🇧🇳🇴🇲🇽 (@soilandreyes) September 24, 2019

Another presentation, by Stian Soiland-Reyes, covered RO-Crate, a recent advancement in Research Objects. This is increasingly important as researchers now rely heavily on computational analysis, yet they are facing a reproducibility crisis because key components are often not sufficiently tracked, archived or reported. The community is developing Research Object Crate (RO-Crate for short), a lightweight approach to packaging research data with its structured metadata. It is based on schema.org annotations in a formalized JSON-LD format that can be used independently of infrastructure to encourage FAIR sharing of reproducible datasets and analytical methods.
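To give a feel for what such a package description looks like, here is a minimal sketch that builds an RO-Crate style metadata document in Python. It follows the layout of the later RO-Crate 1.1 specification (a `ro-crate-metadata.json` descriptor plus a root dataset `./`), which is an assumption on my part and not necessarily the draft shown at the workshop. The helper name `build_ro_crate_metadata`, the file paths, and the titles are invented, and the sketch leaves out properties a complete crate needs, such as `datePublished` and `license`.

```python
import json


def build_ro_crate_metadata(name, description, files):
    """Build a JSON-LD dict describing a dataset and the files it contains.

    `files` maps a relative path inside the crate to a human-readable title.
    """
    graph = [
        {
            # The metadata file describes itself and points at the root dataset.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # The root dataset lists its parts by reference.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "hasPart": [{"@id": path} for path in files],
        },
    ]
    # One schema.org-typed entity per packaged file.
    for path, title in files.items():
        graph.append({"@id": path, "@type": "File", "name": title})
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": graph,
    }


# Invented example content.
metadata = build_ro_crate_metadata(
    name="Example analysis",
    description="Input data and the script that processes it.",
    files={"data/input.csv": "Input table", "scripts/analyse.py": "Analysis script"},
)
print(json.dumps(metadata, indent=2))
```

Because the result is plain JSON-LD in a single file, the crate can be zipped, put in a repository, or served from any web server without special infrastructure.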

After five more presentations, we went for lunch, which came with an amazing sea view. The first presentation after lunch was mine. I presented our work on Data Quality Issues in Current Nanopublications, based on an analysis of existing datasets (DisGeNET, neXtProt, LIDDI, OpenBEL, WikiPathways).

In my presentation, I discussed the data quality issues that arise when nanopublications are generated. The main issues are:

General lack of provenance and publication information
Misuse of authoring/publishing ontology terms
Lack of domain expertise and database content

So there is a need for domain best-practice guidelines for generating good-quality nanopublications. Our analysis is also available on GitHub.

Data Quality Issues in Current Nanopublications from imranasifquaidian


Second Day (25th, September 2019)

The second day started with keynote speaker Randy Olson, who presented the ABT (And, But, Therefore) framework. The ABT Narrative Template is a tool for organizing the narrative structure of any amount of content. It is at the core of storytelling, logic, reason, argument and the scientific method. The idea is to shrink a narrative thread down to a single sentence using three connector words: and, but, therefore. He also showed how to recast long statements in the ABT format. Many other presentations that day covered eScience, science gateways and the challenges in both areas.

Third Day (26th, September 2019)

The third day started with keynote speaker Manish Parashar, who spoke on "Exploring the Future of Facilities-based, Data-Driven Science". Many other presentations that day covered eScience, science gateways and the challenges in both areas.

3rd day in #eScience2019 conference with Keynote: Manish Parashar on "Exploring the Future of Facilities-based, Driven-Driven Science" pic.twitter.com/jfyhJOqTYx
— Imran Asif (@imranasif87) September 26, 2019

After lunch on the same day, another keynote speaker, Maryann E. Martone, spoke on "Neuroscience as an open, FAIR and citable discipline".

Today’s final session with Keynote: Maryann Martone on "Neuroscience as an open, FAIR and citable discipline" pic.twitter.com/RCMeK4FLiI
— Imran Asif (@imranasif87) September 27, 2019

Fourth Day (27th, September 2019)

The fourth day (the last day of the conference) started with keynote speaker Dieter Kranzlmüller, who spoke on "Environmental Computing on SuperMUC-NG – A Partnership between Computer and Domain Sciences".

Today is the final session in #eScience2019 conference with Keynote: Dieter Kranzlmüller on "Environmental Computing on SuperMUC-NG – A Partnership between Computer and Domain Sciences" pic.twitter.com/f1xaiLuu4k
— Imran Asif (@imranasif87) September 27, 2019

Overall, I really enjoyed the conference. I got a chance to spend some time with a number of members of the community, and it was exciting to see the continued enthusiasm and the number of new research questions.

Category: big-data

Subsetting KGs: Talk at Wikidata Reuse Days

Trip Report: 15th eScience International Conference 2019
