High-level Programming for Computational Grids

Dependable Systems Group

Parallel and Distributed Functional Languages Research Group

Project

High Level Programming for Computational Grids

This project is funded by a 2-year British Council/DAAD travel grant (Project No. 1223), with partners at LMU Munich, Philipps-Universitaet Marburg, and St Andrews University. We aim to evaluate a single large program on a computational grid, i.e. on a collection of grid-enabled workstation clusters. This entails developing a sophisticated language implementation that adapts dynamically to such a hierarchical, heterogeneous and high-latency architecture.

(06/2003 - 06/2005)

Abstract

Special-purpose High Performance Computers (HPCs) are expensive and rare, but workstation clusters are cheap and becoming common. Emerging GRID technology offers the opportunity to integrate GRID-enabled clusters into a single HPC. The acceptance of GRIDs, however, is seriously hampered by the lack of security guarantees and by the difficulty of efficiently managing parallelism on such an architecture, whose characteristics are radically different from those of a conventional HPC: it is shared, with components used simultaneously by multiple users; it is heterogeneous, connecting clusters of different sizes and speeds; it is hierarchical, with communication much faster within a cluster than between clusters; it has high communication costs (latency); and the effective speeds of its components may vary during program execution, because nodes and connections are shared with other users.

To program this complex and dynamic architecture effectively, we propose to use a language with high-level constructs supported by a sophisticated runtime environment (RTE) that automatically adapts to the dynamically changing execution environment. To meet basic security concerns in an open GRID environment, we propose to use novel analysis techniques to provide information about resource consumption. Key components of the environment are a monitoring tool that determines static and dynamic properties of the network of HPCs, a distributed infrastructure for attaching resource certificates to executable code, and a virtual machine that dynamically adapts parallel execution, based on the current configuration and the provided resource certificates, to improve performance.
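As a rough illustration of how a resource certificate attached to code might be used, the following Python sketch models a certificate as a pair of upper bounds and lets a receiving node decide whether to accept the work. All names here (ResourceCertificate, NodeBudget, admits) and the choice of heap and time as the certified resources are assumptions made for illustration; the project description does not specify the certificate contents or the checking procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceCertificate:
    """Hypothetical certificate: upper bounds claimed for a piece of code."""
    max_heap_bytes: int   # assumed bound on heap consumption
    max_time_s: float     # assumed bound on execution time, in seconds


@dataclass(frozen=True)
class NodeBudget:
    """Hypothetical description of what a receiving node is willing to offer."""
    free_heap_bytes: int
    time_slice_s: float


def admits(node: NodeBudget, cert: ResourceCertificate) -> bool:
    """Accept the code only if every certified bound fits within the node's budget."""
    return (cert.max_heap_bytes <= node.free_heap_bytes
            and cert.max_time_s <= node.time_slice_s)
```

In this sketch a node with 1 MiB of free heap and a 5-second time slice would accept code certified at 512 KiB and 2 seconds, and reject code certified at 2 MiB. A real system would also have to verify that the certificate is trustworthy, which is not modelled here.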

The effectiveness of the adaptive RTE will be evaluated by measuring the performance of two existing substantial parallel applications on several networks of GRID-enabled clusters. The monitoring tool, certification system and adaptive virtual machine have separate research objectives and will be developed separately before being integrated for evaluation.

Objective

We aim to determine whether a GRID of cheap clusters can be transparently and effectively utilised as a single high performance computing platform using a high-level language supported by a sophisticated runtime environment (RTE). The core objective is determining whether a sophisticated RTE can be constructed to deliver speedups and scaleups for real programs on modest GRIDs of high performance computers (HPCs). The RTE comprises a monitoring tool and an adaptive virtual machine, supported by resource certificates that are attached to executable code when sent to another processor. The main research objectives are as follows.

Monitoring tool. To determine the crucial static and dynamic architectural properties of a GRID-of-HPCs at the correct level of abstraction to facilitate effective management of high performance computing. To construct and evaluate prototype tools using the GRID infrastructure.

Adaptive virtual machine. To port an existing adaptive RTE to the GRID and radically extend it with new strategies to automatically and dynamically manage program execution on a GRID-of-HPCs. The management strategies utilise static and dynamic configuration information produced by the monitoring tool, and guaranteed resource use information. The strategies must effectively manage the hierarchical, heterogeneous, shared and high-latency nature of the architecture.
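To make the idea of a management strategy concrete, here is a minimal Python sketch of one possible placement rule. It combines static information (relative speed) with dynamic information (current load and latency) to estimate where a unit of work would finish soonest. The cost model, the names ClusterInfo, estimated_completion and choose_cluster, and the example figures are all assumptions for illustration, not the strategies developed in the project.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ClusterInfo:
    """Hypothetical snapshot of a cluster as a monitoring tool might report it."""
    name: str
    relative_speed: float  # static: speed relative to a reference node
    load: float            # dynamic: fraction of capacity used by others, 0.0 <= load < 1.0
    latency_s: float       # dynamic: one-way message latency from the sender, in seconds


def estimated_completion(work_s: float, cluster: ClusterInfo) -> float:
    """Assumed cost model: a round trip of messages plus compute time at the effective speed.

    work_s is the running time of the task on an unloaded reference node.
    """
    effective_speed = cluster.relative_speed * (1.0 - cluster.load)
    return 2.0 * cluster.latency_s + work_s / effective_speed


def choose_cluster(work_s: float, clusters: Sequence[ClusterInfo]) -> ClusterInfo:
    """Pick the cluster with the smallest estimated completion time."""
    return min(clusters, key=lambda c: estimated_completion(work_s, c))


# Example: a busy local cluster versus a faster, idle, but distant one.
local = ClusterInfo("local", relative_speed=1.0, load=0.5, latency_s=0.001)
remote = ClusterInfo("remote", relative_speed=2.0, load=0.0, latency_s=0.5)
```

Under this model a small task (0.1 s of work) stays on the local cluster because the remote round trip dominates, while a large task (10 s of work) is sent to the remote cluster because its higher effective speed outweighs the latency. This reflects the hierarchical and high-latency properties described above in the simplest possible way; it ignores data transfer volume and changes in load while the task runs.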

People

Updated: June 2003 Abyd Al Zain