Navigation

AISEC software

Research data sets

Image

Machine Learning Classification of Price Extrema Based on Market Microstructure and Price Action Features. A Case Study of S&P500 E-mini Futures. Reproducibility Package

Publish data: 2021
Type: Computer model/algorithm
The study introduces an automated trading system for S&P500 E-mini futures (ES) based on state-of-the-art machine learning. Concretely: we extract a set of scenarios from the tick market data to train the models and further use the predictions to statistically assess the soundness of the approach. We define the scenarios from the local extrema of the price action. Price extrema is a commonly traded pattern, however, to the best of our knowledge, there is no study presenting a pipeline for automated classification and profitability evaluation. Additionally, we evaluate the approach in the simulated trading environment on the historical data. Our study is filling this gap by presenting a broad evaluation of the approach supported by statistical tools which make it generalisable to unseen data and comparable to other approaches. By Artur Sokolovsky; Luca Arnaboldi
Image

Explainable ML-driven Strategy for Automated Trading Pattern Extraction - Reproducibility Package

Publish data: 2021
Type: Data analysis technique
Financial markets are a source of non-stationary multidimensional time series which has been drawing attention for decades. Each financial instrument has its specific changing over time properties, making their analysis a complex task. Improvement of understanding and development of methods for financial time series analysis is essential for successful operation on financial markets. In this study we propose a volume-based data pre-processing method for making financial time series more suitable for machine learning pipelines. We use a statistical approach for assessing the performance of the method. Namely, we formally state the hypotheses, set up associated classification tasks, compute effect sizes with confidence intervals, and run statistical tests to validate the hypotheses. We additionally assess the trading performance of the proposed method on historical data and compare it to a previously published approach. Our analysis shows that the proposed volume-based method allows successful classification of the financial time series patterns, and also leads to better classification performance than a price action-based method, excelling specifically on more liquid financial instruments. Finally, we propose an approach for obtaining feature interactions directly from tree-based models on example of CatBoost estimator, as well as formally assess the relatedness of the proposed approach and SHAP feature interactions with a positive outcome. In this code repository you will find datasets and a notebook to reproduce experiments from our paper.

Software and Technical Products

Schmitty - an Agda library for SMT integration

Schmitty is an Agda library which gives you bindings to SMT solvers, offering you is a well-typed embedding of some of the SMT-LIB language in Agda. That means as well as shouting “solve” at your problems, you can also write SMT queries yourself! Right now, Schmitty supports three theories—Core, Ints, and Reals—and two backends—Z3, andCVC4. If you’re missing your favourite theory or solver, your contribution is more than welcome!

Amethyst - an Agda library for NN verification

Type: Webtool/Application
Published Year: 2020

Impacts:

Led to creation of the tool Vehicle. Produced by Wen Kokke

Amethyst is an Agda library for neural network verification. Using Agda’s expressive dependent type system, Amethyst is capable of specifying the desired properties of the neural network as a type. If your neural network doesn’t have the right input/output behaviour, it won’t even type-check! Amethyst is superseded by Vehicle in many ways.

Agda - Setup

Type: Webtool/Application
Published Year: 2022

Impacts:

This tool has been used by several high profile Agda libraries for Continuous Integration testing, listed at https://github.com/wenkokke/setup-agda/network/dependents

Set up Agda, the standard library, or any Git-hosted library, for use in GitHub Actions. This action can install Agda from binary distributions or build Agda from source.

Vehicle

Type: Webtool/Application
Published Year: 2022

Impacts:

The software is used as a working tool in collaboration with HUJI and Stanford (researchers developing Marabou)

The application provides an Interface for Neural Network Verifiers with Interactive Theorem Provers. Verification of neural networks is currently a hot topic in automated theorem proving. Progress has been rapid and there are now a wide range of tools available that can verify properties of networks with hundreds of thousands of nodes. In theory this opens the door to the verification of larger control systems that make use of neural network components. However, although work has managed to incorporate the results of these verifiers to prove larger properties of individual systems, there is currently no general methodology for bridging the gap between verifiers and interactive theorem provers (ITPs). In this paper we present Vehicle, our solution to this problem. Vehicle is equipped with an expressive domain specific language for stating neural network specifications which can be compiled to both verifiers and ITPs. It overcomes previous issues with maintainability and scalability in similar ITP formalisations by using a standard ONNX file as the single canonical representation of the network. We demonstrate its utility by using it to connect the neural network verifier Marabou to Agda and then formally verifying that a car steered by a neural network never leaves the road, even in the face of an unpredictable cross wind and imperfect sensors. The network has over 20,000 nodes, and therefore this proof represents an improvement of 3 orders of magnitude over prior proofs about neural network enhanced systems in ITPs.

Neural Network Library in IMANDRA

Type: Webtool/Application
Published Year: 2022

Impacts:

The software is used as part of industrial collaboration, with Industry Imanra (www.imandra.ai)

Neural networks are increasingly relied upon as components of complex safety-critical systems such as autonomous vehicles. There is high demand for tools and methods that embed neural network verification in a larger verification cycle. However, neural network verification is problematic due to a wide range of verification properties of interest, each so far being amenable to verification only in specialised solvers. In this paper we show how Imandra, a functional programming language and a theorem prover originally designed for verification, validation and simulation of FinTech systems can offer a holistic infrastructure for neural network verification. To showcase Imandra’s potential as a single code base, we developed a novel library that formalises convolutional neural networks in Imandra, and covers different important facets of neural network verification.

Research Tools and Methods

BLOOM Large Language Model

Type: Improvements to research infrastructure
Published Year: 2022

Impacts:

First publicly available "foundational model". Widely used and compared in the community. The ambition is to boost academic research and public benefits in competition to privately owned models, e.g. ChatGPT etc,.

Description:

We created BLOOM the first publicly available large language model. This was a year-long collaboration as part of the BigScience workshop with several hundred of international scientists. I co-led one of the working groups. BLOOM stands for BigScience Large Open-science Open-access Multilingual Language Model.

Tutorial on Neural Network Verification with Vehicle

Type: Improvements to research infrastructure
Published Year: 2023

Impacts:

This tutorial has been run live and adapted for different audiences at: FOMLAS'23 the 6th Workshop on Formal Methods for ML-Enabled Autonomous Systems co-allocated with CAV'23 in Paris, France; 18 July 2023; VETSS Summer School at the University of Surrey, UK; 23 August 2023; Youtube recording available; ICFP'23 the International Conference on Functional Programming in Seattle, USA; 08 September 2023.

Description:

Machine learning components, such as neural networks, gradually make their way into software; and, when the software is critically safe, the machine learning components must be verifiably safe. This gives rise to the problem of neural network verification. The community has been making rapid progress in developing methods for incorporating logical specifications into neural networks, both in training and verification. However, to truly unlock the ability to verify real-world neural network-enhanced systems we believe the following is necessary: 1. The specification should be written once and should automatically work with training and verification tools. 2. The specification should be written in a manner independent of any particular neural network training/inference platform. 3. The specification should be able to be written as a high-level property over the problem space, rather than a property over the input space (of the neural network). 4. After verification the specification should be exportable to general interactive theorem provers so that its proof can be incorporated into proofs about the larger systems around the neural network. In this tutorial we presented Vehicle, a tool that allows users to do all of this. We provided an introduction to the Vehicle specification language, and then walked attendees through using it to express a variety of famous and not-so-famous specifications. Subsequently we demonstrate how a specification can be compiled down to i) queries for network verifiers, ii) Tensorflow graphs to be used as loss functions during training and iii) cross-compiled to Agda, a main-stream interactive theorem prover. Finally we discussed some of the technical challenges in the implementation as well as some of the outstanding problems.