Webinar: SPaR.txt – a cheap Shallow Parsing approach for Regulatory texts

I’ll be presenting this work at the NLLP (legal NLP) workshop during EMNLP 2021. We’ve created a text-processing tool that can serve as a building block for Automated Compliance Checking (ACC) using semantic parsing. This work was done in collaboration with Ioannis Konstas, Alasdair Gray, Farhad Sadeghineko, Richard Watson and Bimal Kumar, and is part of the Intelligent Regulatory Compliance (i-ReC) project, a collaboration between Northumbria University and HWU. You can find our data and code at: https://github.com/rubenkruiper/SPaR.txt

Title: SPaR.txt – a cheap Shallow Parsing approach for Regulatory texts

Summary:
Understanding written instructions is a notoriously hard task for computers. One can imagine that the task becomes harder as more and more instructions are given at once, especially when these instructions can be ambiguous or even conflicting. These reasons, amongst others, are why (semi-)automated regulatory compliance checking is so hard to achieve; it has been researched in the fields of Architecture, Engineering, and Construction (AEC) since the early 1980s.

One necessary part of the puzzle is that a computer must recognise entities in a text, e.g., that a “party wall” is not someone dressed up for Halloween but a wall shared by multiple buildings. We’d like a computer to understand such domain-specific terms, but we don’t want to spend a lot of time defining exactly which words and word combinations belong to our domain lexicon. Therefore, we developed a tool that helps us identify groups of words (in the context of a sentence) that are likely to be a term of interest.

(Example of the annotation scheme: see Figure 2 in our preprint.)

We show that our tool is easy to train: the annotation task is simple, and few training examples are needed to achieve reasonable results. Using just 200 annotated sentences, the tool achieves 70.3% exact matches and 24.2% partial matches for entities. The output of this tool (currently focused on the AEC domain) can be used to improve Information Retrieval results and to help assemble a lexicon of terminology in support of semantic parsing.
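As a rough illustration of how exact and partial match rates can be computed, here is a hypothetical sketch; the token-span representation and function name are invented for illustration and are not the actual SPaR.txt evaluation code:

```python
def span_match_rates(gold, predicted):
    """Classify predicted entity spans against gold spans.

    Spans are (start, end) token offsets. A predicted span counts as an
    exact match when it equals a gold span, and as a partial match when
    it merely overlaps one. Rates are relative to the number of gold spans.
    """
    exact = sum(1 for p in predicted if p in gold)
    partial = sum(
        1 for p in predicted
        if p not in gold
        and any(p[0] < g[1] and g[0] < p[1] for g in gold)
    )
    n = len(gold)
    return exact / n, partial / n

# Toy example: gold terms at tokens 2-4 ("party wall") and 6-8 ("fire door");
# the tool finds the first exactly and over-extends the second.
gold = [(2, 4), (6, 8)]
predicted = [(2, 4), (6, 9)]
print(span_match_rates(gold, predicted))  # (0.5, 0.5)
```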

Seminar: Data Quality Issues in Current Nanopublications

Speaker: Imran Asif
Date: Wednesday 18 September 2019
Time: 11:15 – 12:15
Venue: CM T.01 EM1.58

Imran will give a practice version of his workshop paper that will be given at Research Objects 2019 (RO2019).

Abstract: Nanopublications are a granular way of publishing scientific claims together with their associated provenance and publication information. More than 10 million nanopublications have been published by a handful of researchers covering a wide range of topics within the life sciences. We were motivated to replicate an existing analysis of these nanopublications, but then went deeper into the structure of the existing nanopublications. In this paper, we analyse the usage of nanopublications by investigating the distribution of triples in each part and discuss the data quality issues raised by this analysis. From this analysis we argue that there is a need for the community to develop a set of community guidelines for the modelling of nanopublications.
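To illustrate the kind of analysis involved, the sketch below models a nanopublication as named parts holding triples and computes the distribution of triples per part. The URIs, prefixes and triple representation are invented for illustration; this is not the paper's analysis code:

```python
# A nanopublication groups triples into named graphs: the assertion
# itself, its provenance, and publication info. This toy model stores
# each part as a list of (subject, predicate, object) triples.
nanopub = {
    "assertion": [
        ("ex:malaria", "ex:isTreatedBy", "ex:artemisinin"),
    ],
    "provenance": [
        ("ex:assertion", "prov:wasDerivedFrom", "ex:somePaper"),
        ("ex:assertion", "prov:wasAttributedTo", "ex:someLab"),
    ],
    "pubinfo": [
        ("ex:nanopub", "dct:created", "2019-09-18"),
    ],
}

# The analysis looks at how triples are distributed over these parts;
# skewed or empty parts can signal modelling and data quality problems.
distribution = {part: len(triples) for part, triples in nanopub.items()}
print(distribution)  # {'assertion': 1, 'provenance': 2, 'pubinfo': 1}
```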

Seminar: Utilising Linked Data in the Public Sector

Title: Utilising Linked Data in the Public Sector

Speaker: Angus Addlesee, PhD Student, Heriot-Watt University

Date: 11:15 on 25 March 2019

Location: CM F.17, Heriot-Watt University

Abstract: In this presentation I will explain how Wallscope (a small tech company in Edinburgh) is using linked data in public sector projects.

Bio: Angus has worked at Wallscope for two years in various roles and is now studying for his PhD at Heriot-Watt, which is part-funded by Wallscope.

Wallscope uses Machine Learning and Semantic Technologies to build Knowledge Graphs and Linked Data applications. We are motivated to lower the barriers for accessing knowledge to improve the health, wealth and sustainability of the world we share.

Seminar: Building intelligent systems (that can explain)

Title: Building intelligent systems (that can explain)

Speaker: Ilaria Tiddi, Research Associate in the Knowledge Representation and Reasoning group of the Vrije Universiteit of Amsterdam (NL)

Date: 10:00 on 6 March 2019

Location: EM3.03, Heriot-Watt University

Abstract: Explanations have been the subject of study in a variety of fields (e.g. philosophy, psychology and social science), and are experiencing a new wave of popularity in Artificial Intelligence thanks to the success of machine learning (see DARPA’s eXplainable AI). Yet recent events have shown that the effectiveness of intelligent systems is still limited by their inability to explain their decisions to human users, which costs them understandability and trustworthiness. In this talk, I will give an overview of my research, which aims at developing systems able to automatically generate explanations using external background knowledge. In particular, I will show how such systems can build on existing research on explanations, combined with AI techniques and the large-scale knowledge sources available nowadays.

Bio: I am a Research Associate in the Knowledge Representation and Reasoning group of the Vrije Universiteit of Amsterdam (NL). My research focuses on creating transparent AI systems that generate explanations through a combination of machine learning, semantic technologies, and knowledge from large, heterogeneous knowledge graphs. As part of my research activities, I am a member of the CEUR-WS Editorial Board and the Knowledge Capture conference (K-CAP) Steering Committee, and I have organised workshop series (Recoding Black Mirror, Application of Semantic Web Technologies in Robotics, Linked Data 4 Knowledge Discovery) and Summer Schools (the 2015 and 2016 Semantic Web Summer Schools).

Twitter: @IlaTiddi

Website : https://kmitd.github.io/ilaria/

Seminar: Scan-vs-BIM for monitoring in construction

Title: Scan-vs-BIM for monitoring in construction

Speaker: Frédéric Bosché, Associate Professor in Construction Informatics,
Director of the Institute for Sustainable Building Design (ISBD), and Leader of the CyberBuild Lab. Heriot-Watt University

Date: 11:15 on 4 March 2019

Location: CM F.17, Heriot-Watt University

Abstract: When Laser Scanning and Building Information Modelling (BIM) technologies were emerging, the construction industry showed significant interest in what would eventually be called “Scan-to-BIM”: the process of using laser-scanned point clouds to develop BIM models of existing assets. However, with the use of BIM for design, another important use of these technologies is what some have called “Scan-vs-BIM”: the comparison of reality-capture 3D point clouds (capturing the as-is state of constructions) to BIM models (representing the as-designed state). Scan-vs-BIM offers significant opportunities for further automation in construction project delivery, for example in progress or quality control.

This talk will present the Scan-vs-BIM concept and illustrate its process and benefits. It will then expand on using the output of Scan-vs-BIM processing to enhance dimensional quality control, with a view to evolving it from a traditionally point-based measurement process to a surface-based one.
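As a toy illustration of the surface-based idea, the sketch below measures signed deviations of scanned points from a planar as-designed surface and flags points outside a tolerance. It is a hypothetical example with invented numbers, not CyberBuild Lab code, and real BIM surfaces are of course more complex than a single plane:

```python
import math

def point_to_plane_distances(points, plane_point, plane_normal):
    """Signed distance of each scanned 3D point to a planar as-designed
    surface, given by a point on the plane and its normal vector."""
    nx, ny, nz = plane_normal
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    nx, ny, nz = nx / norm, ny / norm, nz / norm  # normalise the normal
    px, py, pz = plane_point
    return [
        (x - px) * nx + (y - py) * ny + (z - pz) * nz
        for x, y, z in points
    ]

# As-designed wall face: the plane z = 0 (metres). As-built scan points
# deviate slightly; flag any point outside a 5 mm tolerance.
scan = [(0.1, 0.2, 0.002), (0.5, 0.1, -0.001), (0.9, 0.4, 0.012)]
deviations = point_to_plane_distances(scan, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
out_of_tolerance = [d for d in deviations if abs(d) > 0.005]
print(out_of_tolerance)  # [0.012]
```

Surface-based control aggregates such per-point deviations over a whole surface, rather than checking a handful of manually chosen points.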

Bio: Frédéric holds a PhD in Civil Engineering and worked as a PostDoc in the Computer Vision group of ETH Zurich, Switzerland for 2.5 years. He is currently Associate Professor in the School of Energy, Geoscience, Infrastructure and Society (EGIS). Frédéric leads the CyberBuild Lab (http://cyberbuild.hw.ac.uk/), and his research covers two main areas:

  1. Processing of reality capture data to enhance asset construction and life cycle management.
  2. Development and use of virtual and mixed reality technology, to support collaborative and engaging design, construction and engineering works, as well as training.

Frédéric has published over 70 peer-reviewed papers in internationally recognised journals and conferences, and his research has received several international research and innovation awards, including two CIOB International Research & Innovation Awards in 2016 and the IAARC Tucker-Hasegawa Award in 2018 for “distinguished contributions to the field of automation and robotics in construction”. Frédéric is a member of the Executive Committee of the International Association for Automation and Robotics in Construction (IAARC), and he is Associate Editor of Automation in Construction (Elsevier).

Seminar: The Challenges of Automated Ontology Debugging: Experiences and Ideas

Title: The Challenges of Automated Ontology Debugging: Experiences and Ideas

Speaker: Juan Casanova, University of Edinburgh

Date: 11:15 on 18 February 2019

Location: CM F.17, Heriot-Watt University

Abstract: Some of the principal attractions of semantic automated reasoning methods (logic) are, at the same time, what prevents them from becoming widespread and easily usable for managing large amounts of data coming from multiple sources (ontologies). Ontology debugging is a fundamental subfield to master if automated ontology-based technologies are to be in charge of large data and knowledge management systems.

I am still a PhD student, and relatively new to the field, but through my work on ontology debugging techniques I have come to identify a few fundamental challenges that we need to be aware of: the need for additional information, how big the issue of (local) inconsistency in ontologies can be, and the problem of efficiently finding relevant justifications for inferences.

In this talk, I’ll briefly explain my work on automated fault detection using meta-ontologies, in the context of which I have identified and battled with these challenges, and I’ll present my opinions on where these challenges come from and what could be done to tackle them. Some of you will likely disagree with some of my claims, or find them obvious, and that is precisely why I think this talk should spark some useful and interesting discussion.
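To make the justification-finding challenge concrete, here is a minimal sketch of a classic deletion-based approach to shrinking a faulty axiom set to a minimal justification. It is illustrative only: the `is_faulty` callback stands in for a real reasoner's consistency or entailment check, and this is not the speaker's meta-ontology method:

```python
def minimal_justification(axioms, is_faulty):
    """Deletion-based shrinking: given axioms for which is_faulty(axioms)
    is True (e.g. the set is inconsistent), return a subset that is still
    faulty but minimal, i.e. removing any single axiom repairs it.
    Each step costs one reasoner call, which is why doing this
    efficiently over large ontologies is hard."""
    just = list(axioms)
    i = 0
    while i < len(just):
        candidate = just[:i] + just[i + 1:]
        if is_faulty(candidate):
            just = candidate  # axiom i is not needed for the fault
        else:
            i += 1            # axiom i is part of the justification
    return just

# Toy propositional example: the "fault" is that both p and not-p are
# present, i.e. a trivially inconsistent pair hidden among other axioms.
axioms = ["p", "q", "not-p", "r"]
faulty = lambda axs: "p" in axs and "not-p" in axs
print(minimal_justification(axioms, faulty))  # ['p', 'not-p']
```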

Seminar: Environmental Health Research in the Era of the ‘Exposome’

Title: Environmental Health Research in the Era of the ‘Exposome’

Speaker: Miranda Loh, Sc.D., Senior Scientist at the Institute of Occupational Medicine

Date: 11:15, February 2019

Location: CM F.17, Heriot-Watt University

Abstract: In 2015, an estimated 9 million premature deaths were caused by pollution, with air pollution as the leading environmental risk factor. The potential environmental burden of disease could be even larger, as there are still many unknown causes of disease. Much of this uncertainty around the cause of diseases comes from poor description of environmental and occupational exposures in epidemiological studies. Current research into characterising the exposome, the sum total of all exposures through an individual’s lifetime, aims at improving exposure science and our understanding of the relationships between environment and health. There has been great interest in the exposome community in using sensors and smart technologies to further the assessment of environmental, behavioural, and health information for individuals. This seminar will explore current interests in the use of technology in exposome research.

Seminar: Using Interactive Visualisations to Analyse the Structure and Treatment of Topics in Learning Materials

Title: Using Interactive Visualisations to Analyse the Structure and Treatment of Topics in Learning Materials

Speaker: Tanya Howden, Heriot-Watt University

Date: 11:30 on 14 May 2018

Location: CM F.17, Heriot-Watt University

Abstract: With the amount of information available online growing, it is becoming more and more difficult to find what you are looking for, particularly when you’re in an area that you have very little background in. For example, if you were learning about neural networks for the first time, the number of responses you get from a simple Google search can be overwhelming – how do you know where to start?! This is only one of the many challenges faced when searching for appropriate learning materials.

In this talk, I will discuss the motivations behind my research interests before introducing and demonstrating a prototype that has been created with the aim of giving learners a more engaging environment, with unified organisation of and access to different materials on one subject.

Seminar: PhD Progression Talks

A double bill of PhD progression talks (abstracts below):

Venue: 3.07 Earl Mountbatten Building, Heriot-Watt University, Edinburgh

Time and Date: 11:15, 8 May 2017

Evaluating Record Linkage Techniques

Ahmad Alsadeeqi

Many computer algorithms have been developed to automatically link historical records based on a variety of string matching techniques. These generate an assessment of how likely two records are to be the same. However, it remains unclear how to assess the quality of the linkages computed due to the absence of absolute knowledge of the correct linkage of real historical records – the ground truth. The creation of synthetically generated datasets for which the ground truth linkage is known helps with the assessment of linkage algorithms but the data generated is too clean to be representative of historical records.

We are interested in assessing data linkage algorithms under different data quality scenarios, e.g. with errors typically introduced by a transcription process, or where books have been nibbled by mice. We are developing a data corrupting model that injects corruptions into datasets based on given corruption methods and probabilities. We have classified the different forms of corruption found in historical records into four types, based on the scope of the corruption’s effect: character level (e.g. an f represented as an s, as in OCR errors), attribute level (e.g. a gender swap, with male changed to female due to a false entry), record level (e.g. records missing for various reasons, such as loss of a certificate), and group-of-records level (e.g. coffee spilt over a page, or parish records lost in a fire). This will give us the ability to evaluate record linkage algorithms over synthetically generated datasets with known ground truth and with data corruptions matching a given profile.
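A minimal sketch of such a corrupting model might look as follows. The record format and the two corruption functions are invented for illustration (a character-level OCR confusion and an attribute-level gender swap); this is not the actual model:

```python
import random

def corrupt_dataset(records, corruptions, probabilities, seed=42):
    """Apply each corruption function to each record with its given
    probability, mimicking errors introduced during transcription."""
    rng = random.Random(seed)  # seeded so corrupted datasets are reproducible
    corrupted = []
    for record in records:
        for corrupt, p in zip(corruptions, probabilities):
            if rng.random() < p:
                record = corrupt(record)
        corrupted.append(record)
    return corrupted

def ocr_s_to_f(record):
    # Character-level corruption: every 's' misread as 'f'.
    return {**record, "name": record["name"].replace("s", "f")}

def gender_swap(record):
    # Attribute-level corruption: gender flipped by a false entry.
    flipped = {"male": "female", "female": "male"}
    return {**record, "gender": flipped.get(record["gender"], record["gender"])}

records = [{"name": "sarah", "gender": "female"}]
print(corrupt_dataset(records, [ocr_s_to_f, gender_swap], [1.0, 1.0]))
# [{'name': 'farah', 'gender': 'male'}]
```

A corruption profile is then just the list of methods plus their probabilities, so the same clean dataset can be degraded to different quality scenarios.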

Computer-Aided Biomimetics: Knowledge Extraction

Ruben Kruiper

Biologically inspired design concerns copying ideas from nature to various other domains, e.g. natural computing. Biomimetics is a sub-field of biologically inspired design that focuses specifically on solving technical/engineering problems. Because engineers lack biological knowledge, the process of biomimetics is non-trivial and remains adventitious. Therefore, computational tools have been developed that aim to support engineers during a biomimetics process by integrating large amounts of relevant biological knowledge. Existing tools apply NLP techniques to biological research papers to build dedicated knowledge bases. However, these existing tools impose an engineering view on biological data. I will talk about the support that ‘Computer-Aided Biomimetics’ tools should provide, introducing a theoretical basis for further research on the appropriate computational techniques.

Seminar: Developing a simple RDF graph library

Date: 11:15, 14 November

Venue: F.17. Colin Maclaurin Building, Heriot-Watt University

Title: Developing a simple RDF graph library

Speaker: Rob Stewart, Heriot-Watt University

Abstract: In this talk I shall present the design and implementation details of a simple Haskell library for working with RDF data. The library supports parsing and pretty-printing for the XML, Turtle and NTriples RDF serialisation formats, as well as graph querying. It provides multiple in-memory representations for RDF graphs, exposed as a parameter so that the programmer can meet their application-specific needs.
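The library itself is written in Haskell; as a language-neutral sketch of the triple-pattern querying idea (not the library's actual API), here is a toy in-memory graph where `None` acts as a wildcard, much like an unbound query variable:

```python
class TripleGraph:
    """A toy in-memory RDF graph: a set of (subject, predicate, object)
    triples with pattern-based querying."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def query(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None matches anything."""
        return [
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ]

g = TripleGraph()
g.add("ex:alice", "foaf:knows", "ex:bob")
g.add("ex:alice", "foaf:name", '"Alice"')
print(g.query(p="foaf:knows"))  # [('ex:alice', 'foaf:knows', 'ex:bob')]
```

A set is only one possible backing store; swapping in e.g. predicate-indexed maps changes query performance without changing this interface, which is the kind of trade-off the library exposes to the programmer.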

The presentation will cover: the API, how the various RDF graph representations are implemented internally, the W3C test suite the library uses to ensure conformance with the W3C RDF specifications, and the library’s performance benchmarking suite.