List of Publications (with Abstracts)
[64]
C. Brown, H.-W. Loidl, J. Berthold, and K. Hammond.
Improving your CASH flow: The Computer Algebra SHell
(Extended Abstract).
In IFL'10 - Intl. Workshop on the Implementation of Functional
Languages, Sept. 2010.
Draft Proceedings.
[ bib |
.pdf ]
Some important challenges in the field of symbolic computation -
and functional programming - are the transparent access to
complex, mathematical software, the exchange of data between
independent systems with specialised tasks and the exploitation of
modern parallel hardware. One attempt to address these challenges is
SymGrid-Par, a system for exploiting parallel hardware in the
context of computer algebra. Specifically, SymGrid-Par provides
an easy-to-use platform for parallel computation that connects
several underlying computer algebra systems, communicating through
a standardised protocol for symbolic computation.
In this paper we describe a new component of SymGrid-Par known as
CaSH: the Computer Algebra SHell. CaSH is a system that allows
direct access to SymGrid-Par via GHCi. CaSH thus allows Haskell
programmers to exploit high-performance parallel computations
using a system designed for solving problems in computer
algebra, whilst still maintaining the purity and expressibility
offered by the Haskell environment. We demonstrate access to both
sequential and parallel services of SymGrid-Par. For the latter
we use parallel skeletons, implemented in the Haskell dialect of
Eden; these skeletons are called from CaSH but exploit a
computational algebra system known as GAP to offload the
mathematical complexity.
Keywords: parallel applications, Eden, symbolic computation
[63]
V. Janjic and K. Hammond.
Granularity-Aware Work-Stealing for Computationally-Uniform
Grids.
In CCGRID 2010 - Intl. Conf. on Cluster, Cloud and Grid
Computing, pages 123-134, Melbourne, Australia, May 2010.
[ bib |
DOI ]
Good scheduling is important for ensuring effective use of Grid
resources, while maximising parallel performance. In this paper,
we show how a basic “Random-Stealing” load balancing algorithm
for computational Grids can be improved by using information about
the task granularity of parallel programs. We propose several
strategies (SSL, SLL and LLL) for using granularity information to
improve load balancing, presenting results both from simulations
and from a real implementation (the Grid-GUM Runtime System for
Parallel Haskell). We assume a common model of task creation which
subsumes both master/worker and data-parallel programming
paradigms under a task-stealing work distribution
strategy. Overall, we achieve runtime improvements of up to 19.4
percent for irregular problems in the real implementation, and up
to 40 percent in the simulations (typical improvements of more
than 15 percent for irregular programs, and of 5-10 percent for
regular ones). Our results show that, for computationally-uniform
Grids, advanced load balancing methods that exploit granularity
information generally have the greatest impact on reducing the
runtimes of irregular parallel programs. Moreover, the more
irregular the program is, the better the improvements that can be
achieved.
Keywords: implementation, GpH, non-uniform topology, Grid
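The paper's SSL, SLL and LLL strategies are not reproduced here; as an illustrative sketch only (the `Task`, `stealOldest` and `stealLargest` names are ours), the core contrast is between a baseline steal that takes whatever task is oldest and a granularity-aware steal that takes the victim's largest task, so the transfer cost is amortised over more work:

```haskell
import Data.List (delete)

-- A task, represented only by its estimated granularity (work size).
type Task = Int

-- Baseline random stealing: take the oldest task in the victim's queue.
stealOldest :: [Task] -> Maybe (Task, [Task])
stealOldest []     = Nothing
stealOldest (t:ts) = Just (t, ts)

-- Granularity-aware stealing: take the victim's largest task, so the
-- (high, on a Grid) cost of the steal buys as much work as possible.
stealLargest :: [Task] -> Maybe (Task, [Task])
stealLargest [] = Nothing
stealLargest ts = Just (big, delete big ts)
  where big = maximum ts
```

With a victim queue `[3,40,5]`, the baseline transfers the size-3 task while the granularity-aware variant transfers the size-40 task.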
[62]
S. Marlow, P. Maier, H.-W. Loidl, M. K. Aswad, and P. W. Trinder.
Seq no more: Better Strategies for Parallel Haskell.
In Haskell Symposium 2010, Baltimore, MD, USA, Sept. 2010. ACM
Press.
To appear.
[ bib |
.html ]
We present a complete redesign of evaluation strategies, a key
abstraction for specifying pure, deterministic parallelism in
Haskell. Our new formulation preserves the compositionality and
modularity benefits of the original, while providing significant
new benefits. First, we introduce an evaluation-order monad to
provide clearer, more generic, and more efficient specification of
parallel evaluation. Second, the new formulation resolves a
subtle space management issue with the original strategies,
allowing parallelism (sparks) to be preserved while reclaiming
heap associated with superfluous parallelism. Related to this,
the new formulation provides far better support for speculative
parallelism as the garbage collector now prunes unneeded
speculation. Finally, the new formulation provides improved
compositionality: we can directly express parallelism embedded
within lazy data structures, producing more compositional
strategies, and our basic strategies are parametric in the
coordination combinator, facilitating a richer set of parallelism
combinators.
We give measurements over a range of benchmarks demonstrating that
the runtime overheads of the new formulation relative to the
original are low, and the new strategies even yield slightly
better speedups on average than the original strategies.
Keywords: implementation, GpH, evaluation strategies
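The evaluation-order monad at the heart of the redesign can be sketched in miniature using only `par` and `pseq` from base's `GHC.Conc`; this is an assumption-laden toy, not the published `Control.Parallel.Strategies` API, which adds generic strategies, `rdeepseq`, and spark-pool integration:

```haskell
import GHC.Conc (par, pseq)

-- A miniature of the evaluation-order monad: an identity monad whose
-- primitives pin down evaluation order and sparking.
data Eval a = Done a

runEval :: Eval a -> a
runEval (Done a) = a

instance Functor Eval where
  fmap f (Done a) = Done (f a)

instance Applicative Eval where
  pure = Done
  Done f <*> Done a = Done (f a)

instance Monad Eval where
  Done a >>= k = k a

-- rpar sparks its argument for possible parallel evaluation;
-- rseq evaluates it to weak head normal form before continuing.
rpar, rseq :: a -> Eval a
rpar x = x `par` Done x
rseq x = x `pseq` Done x

-- A strategy is an identity function with coordination effects.
type Strategy a = a -> Eval a

using :: a -> Strategy a -> a
using x s = runEval (s x)

-- Spark every list element, keeping the spine lazy for consumers.
parList :: Strategy a -> Strategy [a]
parList _     []     = Done []
parList strat (x:xs) = do
  y  <- rpar (runEval (strat x))
  ys <- parList strat xs
  return (y : ys)

fib :: Int -> Int
fib n | n < 2     = n
      | otherwise = fib (n - 1) + fib (n - 2)

-- Build with: ghc -threaded; run with: +RTS -N
main :: IO ()
main = print (sum ([fib n | n <- [15 .. 20]] `using` parList rseq))
```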
[61]
M. K. Aswad, P. W. Trinder, A. Al Zain, G. J. Michaelson, and J. Berthold.
Low Pain vs No Pain Multi-core Haskells.
In TFP'09 - Symp. on Trends in Functional Programming,
volume 10 of Trends in Functional Programming, Komarno, Slovakia, June
2009. Intellect.
To appear.
[ bib |
.pdf ]
Multicores are becoming the dominant processor technology and
functional languages are theoretically well suited to exploit
them. In practice, however, implementing effective high level
parallel functional languages is extremely challenging.
This paper is the first programming and performance comparison of
functional multicore technologies and reports some of the first
ever multicore results for two languages. As such it reflects the
growing maturity of the field by systematically evaluating four
parallel Haskell implementations on a common multicore
architecture. The comparison contrasts the programming effort
each language requires with the parallel performance
delivered. The study uses 15 'typical' programs to compare a 'no
pain', i.e. entirely implicit, parallel language with three 'low
pain', i.e. semi-explicit, languages.
The parallel Haskell implementations use different versions of GHC
compiler technology, and hence the comparative performance metric
is speedup which normalises against sequential performance. We
ground the speedup comparisons by reporting both sequential and
parallel runtimes and efficiencies for three of the languages. Our
experiments focus on the number of programs improved, the absolute
speedups delivered, the parallel scalability, and the program
changes required to coordinate parallelism. The results are
encouraging and, on occasion, surprising.
Keywords: models of parallelism, GpH, Eden, GHC-SMP, GUM, multicore
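The comparative metrics mentioned above are the standard ones; as a reminder of how speedup and efficiency relate (the runtimes in the example are hypothetical, not the paper's measurements):

```haskell
-- Speedup normalises parallel runtime against sequential runtime;
-- efficiency divides speedup by the number of cores used.
speedup :: Double -> Double -> Double
speedup tSeq tPar = tSeq / tPar

efficiency :: Int -> Double -> Double -> Double
efficiency cores tSeq tPar = speedup tSeq tPar / fromIntegral cores

main :: IO ()
main = do
  -- Hypothetical numbers: 120s sequentially, 20s on 8 cores.
  print (speedup 120 20)       -- prints 6.0
  print (efficiency 8 120 20)  -- prints 0.75
```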
[60]
J. Berthold, S. Marlow, K. Hammond, and A. Al Zain.
Comparing and Optimising Parallel Haskell Implementations for
Multicore Machines.
In ADPNA 2009 - Intl. Workshop on Advanced Distributed and
Parallel Network Applications, pages 386-393, Vienna, Austria, Sept. 2009.
IEEE Computer Society.
[ bib |
DOI ]
In this paper, we investigate the differences and tradeoffs
imposed by two parallel Haskell dialects running on multicore
machines. GpH and Eden are both constructed using the
highly-optimising sequential GHC compiler, and share thread
scheduling, and other elements, from a common code base. The GpH
implementation investigated here uses a physically-shared heap,
which should be well-suited to multicore architectures. In
contrast, the Eden implementation adopts an approach that has been
designed for use on distributed-memory parallel machines: a system
of multiple, independent heaps (one per core), with inter-core
communication handled by message-passing rather than through
shared heap cells. We report two main results. Firstly, we report
on the effect of a number of optimisations that we applied to the
shared-memory GpH implementation in order to address some
performance issues that were revealed by our testing: for example,
we implemented a work-stealing approach to task allocation. Our
optimisations improved the performance of the shared-heap GpH
implementation by as much as 30 percent on eight cores. Secondly,
the shared heap approach is, rather surprisingly, not superior to
a distributed heap implementation: both give similar performance
results.
Keywords: models of parallelism, implementation, GpH, Eden, multicore
[59]
D. Jones Jr., S. Marlow, and S. Singh.
Parallel Performance Tuning for Haskell.
In Haskell Symposium 2009, pages 81-92, Edinburgh, Scotland,
Sept. 2009. ACM Press.
[ bib |
DOI |
http ]
Parallel Haskell programming has entered the mainstream with
support now included in GHC for multiple parallel programming
models, along with multicore execution support in the
runtime. However, tuning programs for parallelism is still
something of a black art. Without much in the way of feedback
provided by the runtime system, it is a matter of trial and error
combined with experience to achieve good parallel speedups.
This paper describes an early prototype of a parallel profiling
system for multicore programming with GHC. The system comprises
three parts: fast event tracing in the runtime, a Haskell library
for reading the resulting trace files, and a number of tools built
on this library for presenting the information to the
programmer. We focus on one tool in particular, a graphical
timeline browser called ThreadScope.
The paper illustrates the use of ThreadScope through a number of
case studies, and describes some useful methodologies for
parallelizing Haskell programs.
Keywords: tools, GpH, GHC-SMP, multicore, profiling
[58]
S. Marlow, S. L. Peyton-Jones, and S. Singh.
Runtime Support for Multicore Haskell.
In ICFP 2009 - Intl. Conf. on Functional Programming, pages
65-78, Edinburgh, Scotland, Aug. 2009. ACM Press.
[ bib |
DOI |
http ]
Purely functional programs should run well on parallel hardware
because of the absence of side effects, but it has proved hard to
realise this potential in practice. Plenty of papers describe
promising ideas, but vastly fewer describe real implementations
with good wall-clock performance. We describe just such an
implementation, and quantitatively explore some of the complex
design tradeoffs that make such implementations hard to build. Our
measurements are necessarily detailed and specific, but they are
reproducible, and we believe that they offer some general
insights.
Keywords: implementation, GpH, GHC-SMP, multicore
[57]
P. W. Trinder, M. I. Cole, H.-W. Loidl, and G. J. Michaelson.
Characterising Effective Resource Analyses for Parallel and
Distributed Coordination.
In FOPARA 2009 - Intl. Workshop on Foundational and Practical
Aspects of Resource Analysis, Revised Selected Papers, volume 6324 of
Lecture Notes in Computer Science, pages 67-83, Eindhoven, The Netherlands,
Nov. 2009. Springer.
[ bib |
DOI |
.pdf ]
An important application of resource analysis is to improve the
performance of parallel and distributed programs. In this context
key resources are time, space and communication. Given the
spectrum of cost models and associated analysis techniques
available, what combination should be selected for a specific
parallel or distributed context?
We address the question as follows. We outline a continuum of
coordination cost models and a range of analysis techniques. We
consider six representative parallel/distributed applications of
resource analysis techniques, and aim to extract general
principles governing why the combination of techniques is
effective in its context.
Keywords: semantics, models of parallelism, survey, resource analysis
[56]
A. Al Zain, K. Hammond, J. Berthold, P. W. Trinder, G. J. Michaelson, and M. K.
Aswad.
Low-Pain, High-Gain Multicore Programming in Haskell:
Coordinating Irregular Symbolic Computations on Multicore Architectures.
In DAMP'09 - Intl. Workshop on Declarative Aspects of
Multicore Programming, pages 25-36, Savannah, GA, USA, Jan. 2009. ACM
Press.
[ bib |
DOI |
.pdf ]
With the emergence of commodity multicore architectures,
exploiting tightly-coupled parallelism has become increasingly
important. Functional programming languages, such as Haskell, are,
in principle, well placed to take advantage of this trend,
offering the ability to easily identify large amounts of
fine-grained parallelism. Unfortunately, obtaining real
performance benefits has often proved hard to realise in practice.
This paper reports on a new approach using middleware that has
been constructed using the Eden parallel dialect of Haskell. Our
approach is “low pain” in the sense that the programmer
constructs a parallel program by inserting a small number of
higher-order algorithmic skeletons at key points in the
program. It is “high gain” in the sense that we are able to get
good parallel speedups. Our approach is unusual in that we do not
attempt to use shared memory directly, but rather coordinate
parallel computations using a message-passing implementation. This
approach has a number of advantages. Firstly, coordination,
i.e. locking and communication, is both confined to limited shared
memory areas, essentially the communication buffers, and is also
isolated within well-understood libraries. Secondly, the coarse
thread granularity that we obtain reduces coordination overheads,
so locks are normally needed only on (relatively large) messages,
and not on individual data items, as is often the case for simple
shared-memory implementations. Finally, cache coherency
requirements are reduced since individual tasks do not share
caches, and can garbage collect independently.
We report results for two representative computational algebra
problems. Computational algebra is a challenging application area
that has not been widely studied in the general parallelism
community. Computational algebra applications have high
computational demands, and are, in principle, often suitable for
parallel execution, but usually display a high degree of
irregularity in terms of both task and data structure. This makes
it difficult to construct parallel applications that perform well
in practice. Using our system, we are able to obtain both
extremely good processor utilisation (97 percent) and very good
absolute speedups (up to 7.7) on an eight-core machine.
Keywords: parallel applications, Eden, multicore, symbolic computation
[55]
J. Berthold, A. Al Zain, and H.-W. Loidl.
Scheduling Light-Weight Parallelism in ArTCoP.
In PADL 2008 - Intl. Symp. on Practical Aspects of Declarative
Languages, volume 4902 of Lecture Notes in Computer Science, pages
214-229, San Francisco, CA, USA, Jan. 2008. Springer.
[ bib |
DOI |
.pdf ]
We present the design and prototype implementation of the
scheduling component in ArTCoP (architecture transparent control
of parallelism), a novel run-time environment (RTE) for parallel
execution of high-level languages. A key feature of ArTCoP is its
support for deep process and memory hierarchies, shown in the
scheduler by supporting light-weight threads. To realise a system
with easily exchangeable components, the system defines a
micro-kernel, providing basic infrastructure, such as garbage
collection. All complex RTE operations, including the handling of
parallelism, are implemented at a separate system level. By
choosing Concurrent Haskell as high-level system language, we
obtain a prototype in the form of an executable specification that
is easier to maintain and more flexible than conventional RTEs.
We demonstrate the flexibility of this approach by presenting
implementations of a scheduler for light-weight threads in ArTCoP,
based on GHC Version 6.6.
Keywords: implementation, run-time system
[54]
H.-W. Loidl, P. W. Trinder, K. Hammond, A. Al Zain, and C. A. Baker-Finch.
Semi-Explicit Parallel Programming in a Purely Functional Style:
GpH.
In M. Alexander and B. Gardner, editors, Process Algebra for
Parallel and Distributed Processing: Algebraic Languages in
Specification-Based Software Development, pages 47-76. Chapman and Hall,
Dec. 2008.
[ bib |
.pdf ]
Declarative programming languages can play an important role in
the process of designing and implementing parallel systems. They
bridge the gap between a high-level specification, with proven
properties of the overall system, and the execution of the system
on real hardware. Efficiently exploiting parallelism on a wide
range of architectures is a challenging task and should in our
view be handled by a sophisticated runtime environment. Based on
this design philosophy we have developed and formalised Glasgow
parallel Haskell (GpH), and implemented it as a conservative
extension of the Glasgow Haskell Compiler.
The high-level nature of declarative languages eases the task of
mapping an algebraic specification down to executable code. In
fact, the operational components of the specification can already
be considered an implementation, with the associated properties
acting as assertions to the code. Based on a formal model of the
declarative language, the validity of these properties can be
established by manual proof, which works on a level of detail
similar to the specification language itself. Many operational
aspects, usually complicating a proof of an implementation, do not
come into the picture at this level. Most importantly,
unnecessary sequentialisation of the code is avoided.
However, the goal of implicit parallelism has proven an elusive
one. Often the automatically generated threads are too
fine-grained to be efficient. In other cases the
data-dependencies between expressions prohibit the generation of a
sufficient amount of parallelism. Thus, we employ an approach of
semi-explicit parallelism, where only potential parallelism has to
be annotated in a program, and all aspects of coordination are
delegated to the runtime environment. A corresponding formal
model, in the form of a structured operational semantics, handling
pools of realised and of potential parallelism, is used to
establish the correctness of the code employing semi-explicit
parallelism. The runtime environment itself is capable of
synchronising parallel threads, using automatic blocking on data
under evaluation, and by simulating virtual shared memory across
networks of machines. Being embedded into the optimised runtime
environment for a sequential language, we achieve efficient
execution of a high-level language, close to the original
specification language, while minimising the programmer effort in
parallelising the code and being scalable to large-scale
applications that can be executed on heterogeneous networks and
computational Grids.
Keywords: semantics, GpH
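The semi-explicit style described above needs only two combinators. A minimal sketch using `par` and `pseq` from base's `GHC.Conc` (GpH's GUM runtime adds the distributed coordination and virtual shared memory, which plain GHC does not provide):

```haskell
import GHC.Conc (par, pseq)

-- Semi-explicit parallelism, GpH style: `par` only annotates potential
-- parallelism (a spark); `pseq` fixes evaluation order. All placement,
-- scheduling and communication are delegated to the runtime.
pfib :: Int -> Integer
pfib n
  | n < 25    = fib n                        -- threshold avoids tiny sparks
  | otherwise = x `par` (y `pseq` (x + y))
  where
    x = pfib (n - 1)
    y = pfib (n - 2)

fib :: Int -> Integer
fib n | n < 2     = fromIntegral n
      | otherwise = fib (n - 1) + fib (n - 2)

-- Build with: ghc -threaded; run with: +RTS -N
main :: IO ()
main = print (pfib 30)
```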
[53]
S. Marlow, T. Harris, R. P. James, and S. L. Peyton-Jones.
Parallel Generational-Copying Garbage Collection with a
Block-structured Heap.
In ISMM 2008 - Intl. Symp. on Memory Management, pages
11-20, Tucson, AZ, USA, June 2008. ACM Press.
[ bib |
DOI ]
We present a parallel generational-copying garbage collector
implemented for the Glasgow Haskell Compiler. We use a
block-structured memory allocator, which provides a natural
granularity for dividing the work of GC between many threads,
leading to a simple yet effective method for parallelising copying
GC. The results are encouraging: we demonstrate wall-clock
speedups of on average a factor of 2 in GC time on a commodity
4-core machine with no programmer intervention, compared to our
best sequential GC.
Keywords: implementation, GpH, GHC-SMP, multicore
[52]
P. B. Vasconcelos.
Cost Inference and Analysis for Recursive Functional Programs.
PhD thesis, University of St Andrews, Feb. 2008.
[ bib |
.pdf ]
Programming resource-sensitive systems, such as real-time embedded
systems, requires guaranteeing both the functional correctness of
computations and also that time and space usage fit within
constraints imposed by hardware limits or the environment.
Functional programming languages have proved very good at meeting
the former logical kind of guarantees but not the latter resource
guarantees.
This thesis contributes to demonstrating the applicability of
functional programming in resource-sensitive systems with an
automatic program analysis for obtaining guaranteed upper bounds
on dynamic space usage of functional programs.
Our analysis is developed for a core subset of Hume, a
domain-specific functional language targeting resource-sensitive
systems (Hammond et al. 2007), and presented as a type and effect
system that builds on previous sized type systems (Hughes et
al. 1996, Chin and Khoo 2001) and effect systems for costs (Dornic
et al. 1992, Reistad and Gifford 1994, Hughes and Pareto 1999). It
extends previous approaches by using abstract interpretation
techniques to automatically infer linear approximations of the
sizes of recursive data types and the stack and heap costs of
recursive functions.
The correctness of the analysis is formally proved with respect to
an operational semantics for the language, and an inference
algorithm that automatically reconstructs size and cost bounds is
presented.
A prototype implementation of the analysis and operational
semantics has been constructed and used to experimentally assess
the quality of the cost bounds with some examples, including
implementations of textbook functional programming algorithms and
simplified embedded systems.
Keywords: semantics, resource analysis, thesis
[51]
A. Al Zain, P. W. Trinder, G. J. Michaelson, and H.-W. Loidl.
Evaluating a High-Level Parallel Language (GpH) for
Computational GRIDs.
IEEE Transactions on Parallel and Distributed Systems,
19(2):219-233, 2008.
[ bib |
DOI |
.pdf ]
Computational Grids potentially offer low cost, readily available,
and large-scale high-performance platforms. For the parallel
execution of programs, however, computational Grids pose serious
challenges: they are heterogeneous, and have hierarchical and
often shared interconnects, with high and variable latencies
between clusters.
This paper investigates whether a programming language with
high-level parallel coordination and a Distributed Shared Memory
model (DSM) can deliver good, and scalable, performance on a range
of computational Grid configurations. The high-level language,
Glasgow parallel Haskell (GpH), abstracts over the architectural
complexities of the computational Grid, and we have developed
GridGUM2, a sophisticated grid-specific implementation of GpH, to
produce the first high-level DSM parallel language implementation
for computational Grids.
We report a systematic performance evaluation of GridGUM2 on
combinations of high-/low-latency and homo-/heterogeneous computational
Grids. We measure the performance of a small set of kernel
parallel programs representing a variety of application areas, two
parallel paradigms, and ranges of communication degree and
parallel irregularity. We investigate GridGUM2's performance
scalability on medium-scale heterogeneous and high-latency
computational Grids, and analyse the performance with respect to
the program characteristics of communication frequency and degree
of irregular parallelism.
Keywords: models of parallelism, GpH, non-uniform topology, GridGUM
[50]
K. Hammond, A. Al Zain, G. Cooperman, D. Petcu, and P. W. Trinder.
SymGrid: A Framework for Symbolic Computation on the Grid.
In Euro-Par 2007 - Intl. Conf. on Parallel Processing, volume
4641 of Lecture Notes in Computer Science, pages 457-466, Rennes,
France, Aug. 2007. Springer.
[ bib |
DOI |
.pdf ]
This paper introduces the design of SymGrid, a new Grid framework
that will, for the first time, allow multiple invocations of
symbolic computing applications to interact via the Grid. SymGrid
is designed to support the specific needs of symbolic computation,
including computational steering (greater interactivity), complex
data structures, and domain-specific computational patterns (for
irregular parallelism). A key issue is heterogeneity: SymGrid is
designed to orchestrate components from different symbolic systems
into a single coherent (possibly parallel) Grid application,
building on the OpenMath standard for data exchange between
mathematically-oriented applications. The work is being developed
as part of a major EU infrastructure project.
Keywords: parallel applications, symbolic computation
[49]
A. Al Zain, K. Hammond, P. W. Trinder, S. Linton, H.-W. Loidl, and M. Costanti.
SymGrid-Par: Designing a Framework for Executing Computational
Algebra Systems on Computational Grids.
In PAPP 2007 - Intl. Workshop on Practical Aspects of
High-level Parallel Programming, volume 4488 of Lecture Notes in
Computer Science, pages 617-624, Beijing, China, May 2007. Springer.
[ bib |
DOI |
.pdf ]
SymGrid-Par is a new framework for executing large computer
algebra problems on computational Grids. We present the design of
SymGrid-Par, which supports multiple computer algebra packages,
and hence provides the novel possibility of composing a system
using components from different packages. Orchestration of the
components on the Grid is provided by a Grid-enabled parallel
Haskell (GpH). We present a prototype implementation of a core
component of SymGrid-Par, together with promising measurements of
two programs on a modest Grid to demonstrate the feasibility of
our approach.
Keywords: parallel applications, symbolic computation
[48]
A. Al Zain, P. W. Trinder, H.-W. Loidl, and G. J. Michaelson.
Supporting High-Level Grid Parallel Programming: the Design and
Implementation of Grid-GUM2 (Full Paper).
In UK e-Science 2007 All Hands Meeting, Nottingham, UK, Sept.
2007.
[ bib |
.pdf ]
Computational Grids are much more complex than classical
high-performance architectures: they have high, and dynamically
varying communication latencies, and comprise heterogeneous
collections of processing elements. We argue that such a complex
and dynamic architecture is extremely challenging for the
programmer to explicitly manage, and advocate a high-level
parallel programming language, like GpH, where the programmer
controls only a few key aspects of the parallel coordination. Such
a high-level language requires a sophisticated implementation to
manage parallel execution.
This paper presents the design and implementation of Grid-GUM2, a
Grid-specific runtime environment for GpH. The core of the design
is a set of monitoring mechanisms that collect static and partial dynamic
information, and new load management mechanisms that use the
information to better schedule tasks on computational Grids. These
mechanisms are built on top of a virtual shared memory (VSM),
graph-reduction-based computation engine. A systematic evaluation
of the implementation, reported elsewhere, shows acceptable
scaleup and good speedups on a range of computational Grids.
Keywords: implementation, GpH, non-uniform topology, GridGUM
[47]
A. Al Zain, P. W. Trinder, H.-W. Loidl, and G. J. Michaelson.
Managing Heterogeneity in a Grid Parallel Haskell.
Scalable Computing: Practice and Experience, 7(3):9-25, 2006.
Selected papers from PAPP 2005 - Intl. Workshop on Practical
Aspects of High-level Parallel Programming, Atlanta, GA, USA, May 2005.
[ bib |
.pdf ]
Computational Grids potentially offer cheap large-scale
high-performance systems, but are a very challenging architecture,
being heterogeneous, shared and hierarchical. Rather than
requiring a programmer to explicitly manage this complex
environment, we recommend using a high-level parallel functional
language, like GpH, with largely automatic management of parallel
coordination.
We present GRID-GUM, an initial port of the distributed virtual
shared-memory implementation of GpH for computational Grids. We
show that GRID-GUM delivers acceptable speedups on relatively low
latency homogeneous and heterogeneous computational
Grids. Moreover, we find that for heterogeneous computational
Grids, load management limits performance.
We present the initial design of GRID-GUM2, which incorporates new
load management mechanisms that cheaply and effectively combine
static and dynamic information to adapt to heterogeneous
Grids. The mechanisms are evaluated by measuring four non-trivial
programs with different parallel properties. The measurements show
that the new mechanisms improve load distribution over the
original implementation, reducing runtime by between 17 and 57
percent, with the greatest improvement obtained for the
most dynamic program.
Keywords: implementation, GpH, non-uniform topology, GridGUM
[46]
T. Harris, S. Marlow, and S. L. Peyton-Jones.
Haskell on a Shared-Memory Multiprocessor.
In Haskell Workshop 2005, pages 49-61, Tallinn, Estonia, Sept.
2005. ACM Press.
[ bib |
DOI |
http ]
Multi-core processors are coming, and we need ways to program
them. The combination of purely-functional programming and
explicit, monadic threads, communicating using transactional
memory, looks like a particularly promising way to do so. This
paper describes a full-scale implementation of shared-memory
parallel Haskell, based on the Glasgow Haskell Compiler. Our main
technical contribution is a lock-free mechanism for evaluating
shared thunks that eliminates the major performance bottleneck in
parallel evaluation of a lazy language. Our results are
preliminary but promising: we can demonstrate wall-clock speedups
of a serious application (GHC itself), even with only two
processors, compared to the same application compiled for a
uni-processor.
Keywords: implementation, GpH, GHC-SMP, multicore
[45]
M. Lange and H.-W. Loidl.
Parallel and Symbolic Model Checking for Fixpoint Logic with
Chop.
Electr. Notes Theor. Comput. Sci., 128(3):125-138, 2005.
Selected Papers from PDMC'04 - Intl. Workshop on Parallel and
Distributed Techniques in Verification, London, UK, Sep 2004.
[ bib |
DOI |
.ps ]
We consider the model checking problem for FLC, a modal fixpoint
logic capable of defining non-regular properties. This paper
presents a refinement of a symbolic model checker and discusses
how to parallelise this algorithm. It reports on a prototype
implementation of the algorithm in Glasgow Parallel Haskell (GpH)
and its performance on a cluster of workstations.
Keywords: parallel applications, GpH, model checking
[44]
H.-W. Loidl, F. Rubio, N. Scaife, K. Hammond, S. Horiguchi, U. Klusik,
R. Loogen, G. J. Michaelson, R. Peña, S. Priebe, Á. J. Rebón
Portillo, and P. W. Trinder.
Comparing Parallel Functional Languages: Programming and
Performance.
Higher-Order and Symbolic Computation, 16(3):203-251, 2003.
[ bib |
DOI |
.ps ]
This paper presents a practical evaluation and comparison of three
state-of-the-art parallel functional languages. The evaluation is
based on implementations of three typical symbolic computation
programs, with performance measured on a Beowulf-class parallel
architecture.
We assess three mature parallel functional languages: PMLS, a
system for implicitly parallel execution of ML programs; GPH, a
mainly implicit parallel extension of Haskell; and Eden, a more
explicit parallel extension of Haskell designed for both
distributed and parallel execution. While all three languages
employ a completely implicit approach to communication, each
language takes a different approach to specifying and controlling
parallelism, ranging from explicit identification of processes as
language constructs (Eden) through annotation of potential
parallelism (GPH) to automatic detection of parallel skeletons in
sequential code (PMLS).
We present detailed performance measurements of all three systems
on a widely available parallel architecture: a Beowulf cluster of
low-cost commodity workstations. We use three representative
symbolic applications: a matrix multiplication algorithm, an exact
linear system solver, and a simple ray-tracer. Our results show
how moderate speedups can be achieved with little or no changes to
the sequential code, and that parallel performance can be
significantly improved even within our high-level model of
parallel functional programming by controlling key aspects of the
program such as load distribution and thread granularity.
Keywords: models of parallelism, GpH, Eden, PMLS, distributed memory
|
[43]
|
A. R. Du Bois, R. F. Pointon, H.-W. Loidl, and P. W. Trinder.
Implementing Declarative Parallel Bottom-Avoiding Choice.
In SBAC-PAD 2002 - Symp. on Computer Architecture and High
Performance Computing, pages 82-92, Vitoria, Brazil, Oct. 2002. IEEE
Computer Society.
[ bib |
DOI |
.ps ]
Non-deterministic choice supports efficient parallel speculation,
but unrestricted non-determinism destroys the referential
transparency of purely-declarative languages by removing
unfoldability, and it risks wasting resources on
unnecessary computations. While numerous choice mechanisms have
been proposed that preserve unfoldability, and some concurrent
implementations exist, we believe that no compiled parallel
implementation has previously been constructed. This paper
presents the design, semantics, implementation and use of a family
of bottom-avoiding choice operators for Glasgow parallel
Haskell. The subtle semantic properties of our choice operations
are described, including a careful classification using an
existing framework, together with a discussion of operational
semantics issues and the pragmatics of distributed memory
implementation. The expressiveness of our choice operators is
demonstrated by constructing a branch and bound search, a merge
and a speculative conditional. Their effectiveness is demonstrated
by comparing the parallel performance of the speculative search
with naive and 'perfect' implementations. Their efficiency is
assessed by measuring runtime overhead and heap consumption.
Keywords: implementation, semantics, GpH, speculative parallelism
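The bottom-avoiding behaviour described above can be illustrated outside GpH with a small Concurrent Haskell sketch. Note this is not the paper's compiled parallel implementation, and `amb` is a hypothetical name chosen for illustration:

```haskell
import Control.Concurrent (forkIO, killThread)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- A hypothetical bottom-avoiding choice in Concurrent Haskell: evaluate
-- both arguments in separate threads and return whichever reaches weak
-- head normal form first. A non-terminating ("bottom") argument is thus
-- avoided whenever the other argument terminates.
amb :: a -> a -> IO a
amb x y = do
  result <- newEmptyMVar
  t1 <- forkIO (x `seq` putMVar result x)
  t2 <- forkIO (y `seq` putMVar result y)
  r  <- takeMVar result
  mapM_ killThread [t1, t2]  -- discard the losing computation
  return r
```

For example, `amb (length (repeat ())) 42` yields 42, because the first argument never reaches weak head normal form while the second does immediately.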
|
[42]
|
A. R. Du Bois, H.-W. Loidl, and P. W. Trinder.
Thread Migration in a Parallel Graph Reducer.
In IFL'02 - Intl. Workshop on the Implementation of Functional
Languages, volume 2670 of Lecture Notes in Computer Science, pages
199-214, Madrid, Spain, Sept. 2002. Springer.
[ bib |
.ps ]
To support high level coordination, parallel functional languages
need effective and automatic work distribution mechanisms. Many
implementations distribute potential work, i.e. sparks or
closures, but there is good evidence that the performance of
certain classes of program can be improved if current work, or
threads, are also distributed. Migrating a thread incurs
significant execution cost and requires careful scheduling and an
elaborate implementation.
This paper describes the design, implementation and performance of
thread migration in the GUM runtime system underlying Glasgow
parallel Haskell (GpH). Measurements of nontrivial programs on a
high-latency cluster architecture show that thread migration can
improve the performance of data-parallel and divide-and-conquer
programs with low processor utilisation. Thread migration also
reduces the variation in performance results obtained in separate
executions of a program. Moreover, migration does not incur
significant overheads if there are no migratable threads, or on a
single processor. However, for programs that already exhibit good
processor utilisation, migration may increase performance
variability and very occasionally reduce performance.
Keywords: implementation, GpH, GUM
|
[41]
|
S. B. Junaidu and P. W. Trinder.
Parallelising large irregular programs: an experience with
Naira.
Information Sciences, 140(3-4):229-240, 2002.
Title of pre-print differs from published article.
[ bib |
.ps.gz ]
Naira is a compiler for Haskell, written in Glasgow parallel
Haskell. It exhibits modest, but irregular, parallelism that is
determined by properties of the program being compiled, e.g. the
complexity of the types and of the pattern matching. We report
four experiments into Naira's parallel behaviour using a set of
realistic inputs: namely the 18 Haskell modules of Naira
itself. The issues investigated are:
* Does increasing input size improve sequential efficiency and
speedup?
* To what extent do high communications latencies reduce average
parallelism and speedup?
* Does migrating running threads between processors improve
average parallelism and speedup at all latencies?
Keywords: parallel applications, GpH
|
[40]
|
H.-W. Loidl.
The Virtual Shared Memory Performance of a Parallel Graph
Reducer.
In CCGrid 2002 - Intl. Symp. on Cluster Computing and the
Grid, pages 311-318, Berlin, Germany, May 2002. IEEE Computer Society.
[ bib |
.html ]
This paper assesses the costs of maintaining a virtual shared heap
in our parallel graph reducer (GUM), which implements a parallel
functional language. GUM performs automatic and dynamic resource
management for both work and data. We introduce extensions to the
original design of GUM, aiming at a more flexible memory
management and communication mechanism to deal with high-latency
systems. We then present measurements of running GUM on a Beowulf
cluster, evaluating the overhead of dynamic distributed memory
management and the effectiveness of the new memory management and
communication mechanisms.
Keywords: implementation, GpH, GUM
|
[39]
|
Á. J. Rebón Portillo, K. Hammond, H.-W. Loidl, and P. B. Vasconcelos.
Cost Analysis Using Automatic Size and Time Inference.
In IFL'02 - Intl. Workshop on the Implementation of Functional
Languages, volume 2670 of Lecture Notes in Computer Science, pages
232-247, Madrid, Spain, Sept. 2002. Springer.
[ bib |
.ps ]
Cost information can be exploited in a variety of contexts,
including parallelizing compilers, autonomic GRIDs and real-time
systems. In this paper, we introduce a novel type and effect
system: the sized time system that is capable of determining upper
bounds for both time and space costs, and which we initially
intend to apply to determining good granularity for parallel
tasks. The analysis is defined for a simple, strict, higher-order
and polymorphic functional language, L, incorporating
arbitrarily-sized list data structures. The inference algorithm
implementing this analysis constructs cost- and size-terms for
L-expressions, plus constraints over free size and cost variables
in those terms that can be solved to produce information for
higher-order functions. The paper presents both the analysis and
the inference algorithm, providing examples that illustrate the
primary features of the analysis.
Keywords: semantics, resource analysis
|
[38]
|
P. W. Trinder, H.-W. Loidl, and R. F. Pointon.
Parallel and Distributed Haskells.
Journal of Functional Programming, 12(4&5):469-510, 2002.
[ bib |
.html ]
Parallel and distributed languages specify computations on
multiple processors and have a computation language to describe
the algorithm, i.e. what to compute, and a coordination language
to describe how to organise the computations across the
processors. Haskell has been used as the computation language for
a wide variety of parallel and distributed languages, and this
paper is a comprehensive survey of implemented languages. We
outline parallel and distributed language concepts and classify
Haskell extensions using them. Similar example programs are used
to illustrate and contrast the coordination languages, and the
comparison is facilitated by the common computation language. A
lazy language is not an obvious choice for parallel or distributed
computation, and we address the question of why Haskell is a
common functional computation language.
Keywords: models of parallelism, Eden, GpH, GdH, survey
|
[37]
|
H.-W. Loidl.
Load Balancing in a Parallel Graph Reducer.
In SFP'01 - Scottish Functional Programming Workshop,
volume 3 of Trends in Functional Programming, pages 63-74, Stirling,
Scotland, Aug. 2001. Intellect.
[ bib |
.html ]
Parallel graph reducers such as GUM use dynamic techniques to
manage resources during execution. One important aspect of the
dynamic behaviour is the distribution of work. The load balancing
mechanism, which controls this aspect, should be flexible, to
adjust the distribution of work to hardware characteristics as
well as dynamic program characteristics, and scalable, to achieve
high utilisation of all processors even on massively parallel
machines.
In this paper we study the behaviour of GUM's load balancing
mechanism on a high-latency Beowulf multi-processor. We present
modifications to the basic load balancing mechanism and discuss
runtime measurements, which indicate that these modifications can
significantly enhance the scalability of the system.
Keywords: implementation, GpH, GUM
|
[36]
|
H.-W. Loidl, P. W. Trinder, and C. Butz.
Tuning Task Granularity and Data Locality of Data Parallel GpH
Programs.
Parallel Processing Letters, 11(4):471-486, 2001.
Selected papers from HLPP'01 - Intl. Workshop on High-level
Parallel Programming and Applications, Orleans, France, 26-27 March, 2001.
[ bib |
.html ]
The performance of data parallel programs often hinges on two key
coordination aspects: the computational costs of the parallel
tasks relative to their management overhead - task granularity;
and the communication costs induced by the distance between tasks
and their data - data locality. In data parallel programs both
granularity and locality can be improved by clustering,
i.e. arranging for parallel tasks to operate on related
sub-collections of data.
The GpH parallel functional language automatically manages most
coordination aspects, but also allows some high-level control of
coordination using evaluation strategies. We study the
coordination behaviour of two typical data parallel programs, and
find that while they can be improved by introducing clustering
evaluation strategies, further performance improvements can be
achieved by restructuring the program.
We introduce a new generic Cluster class that allows clustering to
be systematically introduced, and improved by program
transformation. In contrast to many other parallel program
transformation approaches, we transform realistic programs and
report performance results on a 32-processor Beowulf cluster. The
cluster class is highly-generic and extensible, amenable to
reasoning, and avoids conflating computation and coordination
aspects of the program.
Keywords: models of parallelism, GpH, evaluation strategies, granularity
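The clustering idea can be sketched in plain Haskell; the function names here are illustrative stand-ins, not the paper's generic Cluster class:

```haskell
-- Illustrative sketch of clustering for data parallelism: split the
-- input into contiguous sub-collections so that each parallel task
-- works on a whole cluster, improving granularity (more work per task)
-- and locality (each task's data is kept together).
cluster :: Int -> [a] -> [[a]]
cluster _ [] = []
cluster n xs = let (c, rest) = splitAt n xs in c : cluster n rest

decluster :: [[a]] -> [a]
decluster = concat

-- A clustered map: in GpH each application of (map f) to one cluster
-- would be sparked as a parallel task; this sketch runs sequentially.
clusterMap :: Int -> (a -> b) -> [a] -> [b]
clusterMap n f = decluster . map (map f) . cluster n
```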
|
[35]
|
R. F. Pointon, S. Priebe, H.-W. Loidl, R. Loogen, and P. W. Trinder.
Functional vs Object-Oriented Distributed Languages.
In EUROCAST'01 - Intl. Conf. on Computer Aided Systems
Theory, volume 2178 of Lecture Notes in Computer Science, pages
642-656, Las Palmas de Gran Canaria, Spain, Feb. 2001. Springer.
[ bib |
DOI |
.ps ]
Conventional distributed programming languages require the
programmer to explicitly specify many aspects of distributed
coordination, including resource location, task placement,
communication and synchronisation. Functional languages aim to
provide higher-level abstraction, and this paper investigates the
effectiveness of this for distributed co-ordination. The
investigation compares and contrasts Java and two
Haskell-based distributed functional languages, Eden and
GdH. Three distributed programs are used as case studies, and the
performance and programming effort are reported.
Keywords: models of parallelism, Eden, GdH, Java, survey
|
[34]
|
C. A. Baker-Finch, D. J. King, and P. W. Trinder.
An Operational Semantics for Parallel Lazy Evaluation.
In ICFP'00 - Intl. Conf. on Functional Programming, pages
162-173, Montreal, Canada, Sept. 2000. ACM Press.
[ bib |
DOI |
.pdf ]
We present an operational semantics for parallel lazy evaluation
that accurately models the parallel behaviour of the non-strict
parallel functional language GpH. Parallelism is modelled
synchronously, that is, single reductions are carried out
separately, then combined before proceeding onto the next set of
reductions. Consequently the semantics has two levels, with
single-step transition rules at one level and combining rules at
the other. Parallel threads are modelled by labelled bindings and
to the best of our knowledge this is the first semantics that
models thread states. A set of labelled bindings corresponds to a
heap and is used to model sharing.
The semantics is set at a higher level of abstraction than an
abstract machine and is therefore more manageable for proofs about
programs rather than implementations. At the same time, it is low
level enough to allow us to reason about programs in terms of
parallelism (i.e., the number of processors used), as well as work
and run-time with different numbers of processors.
The framework used by the semantics is flexible in that the model
can easily be adapted to express other evaluation models such as
sequential call-by-need and fully-speculative evaluation,
non-deterministic choice and others.
Keywords: semantics, GpH
|
[33]
|
H.-W. Loidl, U. Klusik, K. Hammond, R. Loogen, and P. W. Trinder.
GpH and Eden: Comparing Two Parallel Functional Languages on
a Beowulf Cluster.
In SFP'00 - Scottish Functional Programming Workshop,
volume 2 of Trends in Functional Programming, pages 39-52, St Andrews,
Scotland, July 2000. Intellect.
[ bib |
.ps ]
We investigate two similar but contrasting parallel functional
language designs: Eden and GpH. Both languages use the non-strict
functional language Haskell as a core expression language, both
their implementations are based on the same host sequential
implementation - the high performance Glasgow Haskell Compiler
(GHC), and both implementations are available on the same
distributed architecture - the St Andrews Beowulf cluster. This
allows an exceptionally pure comparison of the language design
characteristics and their impact on parallel performance.
The comparison is illustrated by two parallel benchmarks which
expose differences in the communication, process creation, and
work distribution mechanisms employed by Eden and GpH. Our
results show that the explicit process model favoured by Eden
gives good parallel performance for coarse-grained applications
running on the Beowulf cluster. In comparison, the implicit
process model used in GpH gives poorer absolute speedup for this
class of application on this architecture. Further work is needed
to verify the reasons for this difference in performance and to
extend these results to other architectures.
Keywords: models of parallelism, GpH, Eden, distributed memory
|
[32]
|
R. F. Pointon, P. W. Trinder, and H.-W. Loidl.
The Design and Implementation of Glasgow Distributed
Haskell.
In IFL'00 - Intl. Workshop on the Implementation of Functional
Languages, volume 2011 of Lecture Notes in Computer Science, pages
53-70, Aachen, Germany, Sept. 2000. Springer.
[ bib |
DOI |
.ps ]
This paper presents the design and implementation of Glasgow
distributed Haskell (GdH), a non-strict distributed functional
language. The language is intended for constructing scalable,
reliable distributed applications, and is Haskell'98 compliant,
being a superset of both Concurrent Haskell and Glasgow parallel
Haskell (GpH).
GdH distributes both pure and impure threads across multiple
Processing Elements (PEs). Each location is made explicit, so a
program can use resources unique to a PE, and objects, including
threads, can be created on a named PE. The location that uniquely
owns a resource is identified by a method of a new Immobile type
class. Impure threads communicate and synchronise explicitly to
co-ordinate actions on the distributed state, but both pure and
impure threads synchronise and communicate implicitly to share
data. Limited support for fault tolerant programming is provided
by distributed exception handling. The language constructs are
illustrated by example, and two demonstration programs give a
flavour of GdH programming.
Although many distributed functional languages have been designed,
relatively few have robust implementations. The GdH implementation
fuses and extends two mature implementation technologies: the GUM
runtime system (RTS) of GpH and the RTS for Concurrent
Haskell. The fused RTS is extended with a small number of
primitives from which more sophisticated constructs can be
constructed, and libraries are adapted to the distributed context.
Keywords: implementation, GdH
|
[31]
|
P. W. Trinder, H.-W. Loidl, E. Barry Jr., M. K. Davis, K. Hammond, U. Klusik,
S. L. Peyton-Jones, and Á. J. Rebón Portillo.
The Multi-Architecture Performance of the Parallel Functional
Language GpH (Research Note).
In Euro-Par 2000 - Intl. Conf. on Parallel Processing, volume
1900 of Lecture Notes in Computer Science, pages 739-743, Munich,
Germany, Aug. 2000. Springer.
[ bib |
DOI |
.html ]
In principle, functional languages promise straightforward
architecture-independent parallelism, because of their high level
description of parallelism, dynamic management of parallelism and
deterministic semantics. However, these language features come at
the expense of a sophisticated compiler and/or runtime-system. The
problem we address is whether such an elaborate system can deliver
acceptable performance on a variety of parallel architectures. In
particular we report performance measurements for the GUM
runtime-system on eight parallel architectures, including
massively parallel, distributed-memory, shared-memory and
workstation networks.
Keywords: models of parallelism, GpH
|
[30]
|
P. W. Trinder, R. F. Pointon, and H.-W. Loidl.
Runtime System Level Fault Tolerance for a Distributed
Functional Language.
In SFP'00 - Scottish Functional Programming Workshop,
volume 2 of Trends in Functional Programming, pages 103-114, St
Andrews, Scotland, July 2000. Intellect.
[ bib |
.ps ]
Functional languages potentially offer benefits for distributed
fault tolerance: many computations are pure, and hence have no
side-effects to be reversed during error recovery; moreover
functional languages have a high-level runtime system (RTS) where
computations and data are readily manipulated. We propose a new
RTS level of fault tolerance for distributed functional languages,
and outline a design for its implementation for the GdH
language. The design distinguishes between pure and impure
computations: impure computations must be recovered using
conventional exception-based techniques, but the RTS attempts
implicit recovery of pure computations.
Keywords: implementation, GdH, GUM, fault tolerance
|
[29]
|
H.-W. Loidl, P. W. Trinder, K. Hammond, S. B. Junaidu, R. G. Morgan, and S. L.
Peyton-Jones.
Engineering Parallel Symbolic Programs in GpH.
Concurrency - Practice and Experience, 11(12):701-752, 1999.
[ bib |
.ps ]
We investigate the claim that functional languages offer low-cost
parallelism in the context of symbolic programs on modest parallel
architectures. In our investigation we present the first
comparative study of the construction of large applications in a
parallel functional language, in our case in Glasgow Parallel
Haskell (GpH). The applications cover a range of application
areas, use several parallel programming paradigms, and are
measured on two very different parallel architectures.
On the applications level the most significant result is that we
are able to achieve modest wall-clock speedups (between factors of
2 and 10) over the optimised sequential versions for all but one
of the programs. Speedups are obtained even for programs that were
not written with the intention of being parallelised. These gains
are achieved with a relatively small programmer-effort. One reason
for the relative ease of parallelisation is the use of evaluation
strategies, a new parallel programming technique that separates
the algorithm from the coordination of parallel behaviour.
On the language level we show that the combination of lazy and
parallel evaluation is useful for achieving a high level of
abstraction. In particular we can describe top-level parallelism,
and also preserve module abstraction by describing parallelism
over the data structures provided at the module interface
(“data-oriented parallelism”). Furthermore, we find that the
determinism of the language is helpful, as is the largely-implicit
nature of parallelism in GpH.
Keywords: parallel applications, GpH
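The separation of algorithm from coordination that the abstract describes can be shown with a minimal, self-contained model of evaluation strategies. Real GpH builds on the `par` and `seq` primitives; the `parList` below is a sequential stand-in so the sketch runs anywhere:

```haskell
-- Minimal self-contained model of evaluation strategies. A Strategy
-- describes how (and how much of) a value is evaluated; `using`
-- attaches that dynamic behaviour to an ordinary expression without
-- changing the algorithm itself.
type Strategy a = a -> ()

rwhnf :: Strategy a            -- evaluate to weak head normal form
rwhnf x = x `seq` ()

using :: a -> Strategy a -> a
using x s = s x `seq` x

-- In GpH, parList would spark each list element with `par`; this
-- stand-in merely forces them one after another.
parList :: Strategy a -> Strategy [a]
parList strat = foldr (\x done -> strat x `seq` done) ()

-- Data-oriented parallelism: the algorithm (map f) is untouched, and
-- the coordination lives entirely in the strategy applied afterwards.
parMap :: Strategy b -> (a -> b) -> [a] -> [b]
parMap strat f xs = map f xs `using` parList strat
```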
|
[28]
|
P. W. Trinder.
Motivation for Glasgow distributed Haskell, a non-strict
Functional Language.
In PDSIA'99 - Intl. Workshop on Parallel and Distributed
Computing for Symbolic and Irregular Applications, pages 72-81, Sendai,
Japan, July 1999. World Scientific.
[ bib |
.html ]
Non-strict functional languages offer potential benefits for
constructing distributed systems: namely a highly-dynamic model of
distribution, a relatively high degree of distribution
transparency, and the potential to abstract over
distribution-control primitives. We describe our motivation for
implementing such a language, a variant of Haskell, and evaluating
it. The implementation is a fusion of existing Glasgow Haskell
Compiler technologies. The evaluation will be based on experiences
implementing a distributed interactive simulation, and comparing
it with a Java version.
Keywords: models of parallelism, GdH
|
[27]
|
P. W. Trinder, H.-W. Loidl, and K. Hammond.
Large-Scale Functional Applications.
In K. Hammond and G. J. Michaelson, editors, Research Directions
in Parallel Functional Programming, pages 399-426. Springer, Nov. 1999.
[ bib |
http ]
This chapter describes applications that have been written using
parallel functional language technology. While the majority of
these applications have been produced as part of a computer
science research project aimed at testing the limits of the
technology, others such as those written by Ericsson in Erlang,
are being used in real commercial products. Furthermore, projects
such as Digital's Smart Kiosk application written using the
data-flow inspired Stampede system represent serious trial uses of
the technology by commercial organisations.
Functional programming is a wide-spectrum, general purpose
paradigm, and the applications therefore cover a wide area,
including high-performance numerical applications, symbolic
applications, data-intensive applications and computer vision.
Some of these areas present hard problems for traditional parallel
approaches, which are usually best suited to regular computation
patterns and regular data structures. Symbolic or irregular
applications that are described here include compilers, theorem
provers, computer algebra and natural language processing.
It is a common misconception that functional programming can only
be applied to small-scale programs, or that parallelism has only
been obtained for toy examples. This chapter describes several
real parallel applications that are large in absolute terms,
e.g. the 230,000-line Erlang mobility server or the 47,000-line
Lolita natural language parser. Since functional programs are
often a factor of 5-10 shorter than the equivalent C or Fortran,
these represent serious, and realistic bases for assessing the
quality of a parallel language and its development environment.
Although, as the preceding chapters have shown, there is still
much research to be done, parallel functional programming
technology has been maturing rapidly, and many large parallel
functional applications have been produced in recent years.
Although this chapter aims to be reasonably comprehensive, it is
therefore necessarily selective in the applications that are
covered. We are particularly heartened by results from systems
such as SAC and SISAL, which are not merely competitive with the
performance of C, but have actually exceeded the performance of
equivalent Fortran programs compiled using high-quality
parallelising compilers.
The chapter uses the taxonomy that was introduced in Chapter 3.
Implicit, semi-explicit, coordination, explicit and derivational
approaches are each covered in separate sections. Within those
sections, applications are covered by area. Ideally we would like
to report a consistent set of program and parallel platform
characteristics, and performance figures for each application. In
the absence of a standard set of parallel benchmarks, such results
are currently unobtainable. Results are therefore given using
either absolute or relative speedup figures as recorded by the
authors of the research papers on particular parallel
architectures.
Keywords: parallel applications, GpH
|
[26]
|
J. G. Hall, C. A. Baker-Finch, P. W. Trinder, and D. J. King.
Towards an Operational Semantics for a Parallel Non-Strict
Functional Language.
In IFL'98 - Intl. Workshop on the Implementation of Functional
Languages, volume 1595 of Lecture Notes in Computer Science, pages
54-71, London, UK, Sept. 1998. Springer.
[ bib |
.html ]
Parallel programs must describe both computation and coordination,
i.e. what to compute and how to organise the computation. In
functional languages equational reasoning is often used to reason
about computation. In contrast, there have been many different
coordination constructs for functional languages, and far less
work on reasoning about coordination.
We present an initial semantics for GpH, a small extension of the
Haskell language, that allows us to reason about coordination. In
particular we can reason about work, average parallelism, and
runtime. The semantics captures the notions of limited (physical)
resources, the preservation of sharing and speculative
evaluation. We show a consistency result with Launchbury's
well-known lazy semantics.
Keywords: semantics, GpH
|
[25]
|
K. Hammond.
Strategic SPMD.
In Glasgow Workshop on Functional Programming, Pitlochry,
Scotland, Sept. 1998.
[ bib |
.html ]
This paper extends the existing concept of evaluation strategies,
higher-order functions that can be used to separate behaviour from
algorithm in parallel functional programs, to support the SPMD
model of parallelism. Our model allows SPMD sub-programs to be
composed with, or arbitrarily nested within, other parallel
paradigms. This represents a major step
towards supporting the Group SPMD model, a generalisation of SPMD,
in Glasgow Parallel Haskell. It also represents the first attempt
of which we are aware to model SPMD programming in a system
supporting implicit communications.
In order to ensure that the necessary barrier communications are
modelled correctly at the high level, we use the implementation
concept of synchronisation through logically shared graph
reduction. We therefore ensure, in common with other work on
evaluation strategies, that no new constructs need be added to the
host functional language other than basic ones to introduce
parallelism and enforce sequential execution. In order to ensure
accurate modelling of the SPMD paradigm it is, however, convenient
to introduce a primitive to control task placement.
Although there is, as yet, no efficient implementation of the SPMD
strategy, we are confident that this could be constructed using an
algorithmic skeleton, or similar implementation technique.
Keywords: models of parallelism, evaluation strategies, SPMD
|
[24]
|
D. J. King, J. G. Hall, and P. W. Trinder.
A Strategic Profiler for Glasgow Parallel Haskell.
In IFL'98 - Intl. Workshop on the Implementation of Functional
Languages 1998, volume 1595 of Lecture Notes in Computer Science,
pages 88-102, London, UK, Sept. 1998. Springer.
[ bib |
.html ]
Execution profiling is a crucial part of engineering large
parallel functional programs. This paper presents the design,
implementation, and use of a new execution time profiler for
Glasgow Parallel Haskell (GpH), a well-established parallel
derivative of Haskell. Parallelism is introduced in GpH by using
evaluation strategies, which provide a clean way of co-ordinating
parallel threads without destroying a program's original
structure. The new profiler attributes the cost of evaluating
parallel threads to the strategies that created them. A unique
feature of the strategic profiler is that the call-graph of
evaluation strategies is maintained, allowing the programmer to
discover the sequence of (perhaps nested) strategies that were
used to create any given thread. The strategic profiler is
demonstrated on several examples, and compared with other
profilers.
Keywords: tools, GpH, profiling, evaluation strategies
|
[23]
|
H.-W. Loidl.
Granularity in Large-Scale Parallel Functional Programming.
PhD thesis, Dept. of Computing Science, Univ. of Glasgow, Mar. 1998.
[ bib |
.html ]
This thesis demonstrates how to reduce the runtime of large
non-strict functional programs using parallel evaluation. The
parallelisation of several programs shows the importance of
granularity, i.e. the computation costs of program
expressions. The aspect of granularity is studied both on a
practical level, by presenting and measuring runtime granularity
improvement mechanisms, and at a more formal level, by devising a
static granularity analysis.
By parallelising several large functional programs this thesis
demonstrates for the first time the advantages of combining lazy
and parallel evaluation on a large scale: laziness aids
modularity, while parallelism reduces runtime. One of the
parallel programs is the Lolita system which, with more than
47,000 lines of code, is the largest existing parallel non-strict
functional program. A new mechanism for parallel programming,
evaluation strategies, to which this thesis contributes, is shown
to be useful in this parallelisation. Evaluation strategies
simplify parallel programming by separating algorithmic code from
code specifying dynamic behaviour. For large programs the
abstraction provided by functions is maintained by using a
data-oriented style of parallelism, which defines parallelism over
intermediate data structures rather than inside the functions.
A highly parameterised simulator, GranSim, has been constructed
collaboratively and is discussed in detail in this thesis. GranSim
is a tool for architecture-independent parallelisation and a
testbed for implementing runtime-system features of the parallel
graph reduction model. By providing an idealised as well as an
accurate model of the underlying parallel machine, GranSim has
proven to be an essential part of an integrated parallel software
engineering environment. Several parallel runtime-system features,
such as granularity improvement mechanisms, have been tested via
GranSim. It is publicly available and in active use at several
universities worldwide.
In order to provide granularity information this thesis presents
an inference-based static granularity analysis. This analysis
combines two existing analyses, one for cost and one for size
information. It determines an upper bound for the computation
costs of evaluating an expression in a simple strict higher-order
language. By exposing recurrences during cost reconstruction and
using a library of recurrences and their closed forms, it is
possible to infer the costs for some recursive functions. The
possible performance improvements are assessed by measuring the
parallel performance of a hand-analysed and annotated program.
Keywords: parallel applications, GpH, granularity, thesis
|
[22]
|
P. W. Trinder, E. Barry Jr., M. K. Davis, K. Hammond, S. B. Junaidu,
U. Klusik, H.-W. Loidl, and S. L. Peyton-Jones.
GpH: An Architecture-independent Functional Language.
Unpublished draft, July 1998.
[ bib |
.html ]
In principle, pure functional languages promise straightforward
architecture-independent parallelism. We investigate the validity
of this claim in the context of our highly-portable implementation
of an implicitly-parallel functional language: the GUM
implementation of Glasgow Parallel Haskell (GpH). We discuss
architecture independence at two levels: low-level (i.e. the
implementation) and high-level (i.e. the programmer).
Low-level architecture independence is achieved by choosing a
message-passing model for GUM, and implementing it using portable
C and a widely-supported message-passing library like PVM. In fact
GUM is largely independent of the message-passing library, and has
been adapted to use MPI and the CM-5 CMMD libraries as well as
PVM. As a result, GUM is easily ported, and is currently available
on seven platforms including shared-memory machines, distributed
memory machines, and networks of workstations. We provide
indicative measurements of how efficient and effective our
architecture-independent runtime system is across a range of
architectures.
The GpH programming model provides higher-level architecture
independence. The parallelism in GpH is mainly implicit, and hence
relatively small parts of the program need to be changed for a new
architecture. The coordination that is required is expressed with
a new high-level construct, evaluation strategies. Evaluation
strategies provide a clean separation between algorithm and
coordination, easing the task of changing either facet to suit new
parallel environments. Moreover, GpH programs can systematically
be developed for multiple target architectures, using a suite of
simulation, profiling and visualisation tools. Much of the
development is architecture-independent but, once a particular
target architecture has been selected, the tools are parameterised
to support tuning for that architecture. We demonstrate the
systematic development of two real programs to the point of
achieving good speedups on four architectures.
Keywords: models of parallelism, GpH, architecture independence, unpublished
|
[21]
|
P. W. Trinder, E. Barry Jr., M. K. Davis, K. Hammond, S. B. Junaidu,
U. Klusik, H.-W. Loidl, and S. L. Peyton-Jones.
Low level Architecture-independence of Glasgow Parallel
Haskell (GpH).
In Glasgow Workshop on Functional Programming, Pitlochry,
Scotland, Sept. 1998.
[ bib |
.html ]
In principle, pure functional languages promise straightforward
architecture-independent parallelism. We investigate the validity
of this claim in the context of our highly-portable implementation
of an implicitly-parallel functional language: the GUM
implementation of Glasgow Parallel Haskell (GpH). We focus here on
the low-level architecture independence of GUM: the high-level
architecture independence of the GpH programming model is
discussed elsewhere.
Low-level architecture independence is achieved by choosing a
message-passing model for GUM, and implementing it using portable
C and a widely-supported message-passing library like PVM. In fact
GUM is largely independent of the message-passing library, and has
been adapted to use MPI and the CM-5 CMMD libraries as well as
PVM. As a result, GUM is easily ported, and is currently available
on seven platforms including shared-memory machines,
distributed-memory machines, and networks of workstations. We provide
indicative measurements of the efficiency and speedups delivered
by GUM on a range of architectures.
Keywords: models of parallelism, GpH, architecture independence
|
[20]
|
P. W. Trinder, K. Hammond, H.-W. Loidl, and S. L. Peyton-Jones.
Algorithm + Strategy = Parallelism.
Journal of Functional Programming, 8(1):23-60, 1998.
[ bib |
DOI |
.html ]
The process of writing large parallel programs is complicated by
the need to specify both the parallel behaviour of the program and
the algorithm that is to be used to compute its result. This paper
introduces evaluation strategies, lazy higher-order functions that
control the parallel evaluation of non-strict functional
languages. Using evaluation strategies, it is possible to achieve
a clean separation between algorithmic and behavioural code. The
result is enhanced clarity and shorter parallel programs.
Evaluation strategies are a very general concept: this paper shows
how they can be used to model a wide range of commonly used
programming paradigms, including divide-and-conquer, pipeline
parallelism, producer/consumer parallelism, and data-oriented
parallelism. Because they are based on unrestricted higher-order
functions, they can also capture irregular parallel structures.
Evaluation strategies are not just of theoretical interest: they
have evolved out of our experience in parallelising several
large-scale parallel applications, where they have proved
invaluable in helping to manage the complexities of parallel
behaviour. These applications are described in detail here. The
largest application we have studied to date, Lolita, is a 60,000
line natural language parser. Initial results show that for these
programs we can achieve acceptable parallel performance, while
incurring minimal overhead for using evaluation strategies.
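The separation the abstract describes survives in today's Control.Parallel.Strategies library, a descendant of the strategies this paper introduced. A minimal sketch (the function `parDouble` is our own illustrative example, not from the paper):

```haskell
import Control.Parallel.Strategies (parList, rseq, using)

-- Algorithm: a plain map. Behaviour: evaluate the list elements in
-- parallel. The `using` combinator keeps the two concerns separate:
-- the left operand is the algorithm, the right is the strategy.
parDouble :: [Int] -> [Int]
parDouble xs = map (* 2) xs `using` parList rseq
```

Replacing `parList rseq` with a different strategy changes only the dynamic behaviour; the algorithmic code `map (* 2)` is untouched.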
Keywords: models of parallelism, GpH, evaluation strategies
|
[19]
|
C. V. Hall, H.-W. Loidl, P. W. Trinder, K. Hammond, and J. T. O'Donnell.
Refining a Parallel Algorithm for Calculating Bowings.
In Glasgow Workshop on Functional Programming, Ullapool,
Scotland, Sept. 1997.
Draft proceedings.
[ bib |
.html ]
String players know that bowing properly is the hardest skill they
have to learn. In other work [FH97], we develop an algorithm that
calculates bowings for BowTech, a project that supports string
performers. This algorithm takes a significant amount of time to
execute (in our second test case, an hour for the implementation,
written in Ada). We have implemented a parallel version of the
algorithm in GpH, a parallel superset of Haskell, and measured the
quality of output on six pieces of music. The parallel program
has been refined using the GranSim simulator and measured on two
parallel architectures: a shared memory multiprocessor and a
network of workstations.
Keywords: parallel applications, GpH
|
[18]
|
K. Hammond, H.-W. Loidl, and P. W. Trinder.
Parallel Cost Centre Profiling.
In Glasgow Workshop on Functional Programming, pages 51-72,
Ullapool, Scotland, Sept. 1997.
Draft proceedings.
[ bib |
.html ]
Good profiling is a major issue in extracting performance from
parallel programs. We report on a novel synthesis of sequential
cost-centre profiling and state-of-the-art parallel simulation for
the pure functional language Haskell that promises to provide
detailed and accurate information about parallel Haskell
programs. Exploiting simulation also improves the quality of
sequential cost centre profiling, though at a significant
performance overhead.
Keywords: tools, GpH, profiling
|
[17]
|
S. B. Junaidu, A. J. T. Davie, and K. Hammond.
Naira: A Parallel^2 Haskell Compiler.
In IFL'97 - Intl. Workshop on the Implementation of Functional
Languages, volume 1467 of Lecture Notes in Computer Science, pages
214-230, St Andrews, Scotland, Sept. 1997. Springer.
[ bib |
.html ]
Naira is a compiler for a parallel dialect of Haskell, compiling
to a dataflow-inspired parallel abstract machine. Unusually
(perhaps even uniquely), Naira has itself been parallelised using
state-of-the-art tools developed at Glasgow and St. Andrews. This
paper reports initial performance results that have been obtained
using the GranSim simulator, both for the top-level pipeline and
for the individual compilation stages. Our results show that a
modest but useful degree of parallelism can be achieved even for a
distributed memory machine.
Keywords: parallel applications, GpH
|
[16]
|
H.-W. Loidl.
LinSolv: a Case Study in Strategic Parallelism.
In Glasgow Workshop on Functional Programming, Ullapool,
Scotland, Sept. 1997.
Draft proceedings.
[ bib |
.html ]
This paper discusses the parallelisation and performance tuning of
a typical computer algebra algorithm, LinSolv, using evaluation
strategies. We present three steps in the parallelisation process
starting with a naive parallel version. As this algorithm uses
infinite data structures as intermediate values it is necessary to
define very sophisticated strategies in order to improve parallel
performance.
We also compare the strategic parallel code with pre-strategy
code. This comparison shows how evaluation strategies help to
localise changes needed for parallelisation. In particular, the
separation between algorithmic and parallel code makes the
structure of the parallelism much clearer.
Keywords: parallel applications, GpH
|
[15]
|
H.-W. Loidl, R. G. Morgan, P. W. Trinder, S. Poria, C. Cooper, S. L.
Peyton-Jones, and R. Garigliano.
Parallelising a Large Functional Program or: Keeping LOLITA
Busy.
In IFL'97 - Intl. Workshop on the Implementation of Functional
Languages, volume 1467 of Lecture Notes in Computer Science, pages
198-213, St Andrews, Scotland, Sept. 1997. Springer.
[ bib |
DOI |
.html ]
In this paper we report on the ongoing parallelisation of Lolita,
a natural language engineering system. Although Lolita currently
exhibits only modest parallelism, we believe that it is the
largest parallel functional program ever, comprising more than
47,000 lines of Haskell. Lolita has the following interesting
features common to real world applications of lazy languages:
* the code was not specifically designed for parallelism;
* laziness is essential for efficiency in Lolita;
* Lolita interfaces to data structures outside the Haskell heap,
using a foreign language interface;
* Lolita was not written by those most closely involved in the
parallelisation.
Our expectations in parallelising the program were to achieve
moderate speedups with small changes in the code. To date
speedups of up to 2.4 have been achieved for Lolita running under
a realistic simulation of our 4 processor shared-memory target
machine. Most notably, the parallelism is achieved with a very
small number of changes to, and without requiring an understanding
of most of the application. On the Sun SPARCserver target machine
wall-clock speedup is currently limited by physical memory
availability.
Keywords: parallel applications, GpH
|
[14]
|
H.-W. Loidl and P. W. Trinder.
Engineering Large Parallel Functional Programs.
In IFL'97 - Intl. Workshop on the Implementation of Functional
Languages, volume 1467 of Lecture Notes in Computer Science, pages
178-197, St Andrews, Scotland, Sept. 1997. Springer.
[ bib |
DOI |
.html ]
The design and implementation of useful programming languages,
whether sequential or parallel, should be driven by large,
realistic applications. In constructing several medium- and
large-scale programs in Glasgow Parallel Haskell, GPH, a parallel
extension of Haskell, the group at Glasgow has investigated
several important engineering issues:
* Real Application Parallelism: The programs achieve good
wall-clock speedups and acceptable scale-up on both a
shared-memory and a distributed memory machine. The programs
typify a number of application areas and use a number of
different parallel paradigms, e.g. pipelining or
divide-and-conquer, often combining several paradigms in a
single program.
* Language Issues: Although the largely implicit parallelism in
GPH is a satisfactory programming model in general, the base
constructs for introducing and controlling parallelism tend to
obfuscate the semantics of large programs. As a result, we
developed evaluation strategies, a more abstract, and
systematic mechanism for introducing and controlling
parallelism.
* Engineering Environment: The development and performance
tuning of these programs emphasised the importance of an
integrated engineering environment. In the process we have
refined components of this environment like the simulator, the
runtime system, and the profiling tools.
Keywords: parallel applications, GpH
|
[13]
|
H.-W. Loidl and K. Hammond.
A Sized Time System for a Parallel Functional Language.
In Glasgow Workshop on Functional Programming, Ullapool,
Scotland, July 1996.
[ bib |
.html ]
This paper describes an inference system, whose purpose is to
determine the cost of evaluating expressions in a strict purely
functional language. Upper bounds can be derived for both
computation cost and the size of data structures. We outline a
static analysis based on this inference system for inferring size
and cost information. The analysis is a synthesis of the sized
types of Hughes et al., and the polymorphic time system of Dornic
et al., which was extended to static dependent costs by Reistad
and Gifford.
Our main interest in cost information is for scheduling tasks in
the parallel execution of functional languages. Using the GranSim
parallel simulator, we show that the information provided by our
analysis is sufficient to characterise relative task granularities
for a simple functional program. This information can be used in
the runtime-system of the Glasgow Parallel Haskell compiler to
improve dynamic program performance.
Keywords: semantics, resource analysis
|
[12]
|
H.-W. Loidl and K. Hammond.
Making a Packet: Cost-Effective Communication for a Parallel
Graph Reducer.
In IFL'96 - Intl. Workshop on the Implementation of Functional
Languages, volume 1268 of Lecture Notes in Computer Science, pages
184-199, Bad-Godesberg, Germany, Sept. 1996. Springer.
[ bib |
.html ]
This paper studies critical runtime-system issues encountered when
packing data for transmission in a lazy, parallel graph reduction
system. In particular, we aim to answer two questions:
* How much graph should go into a packet?
* How aggressively should a processor look for work after
requesting remote data?
In order to answer the first question, we compare various packing
schemes, of which one extreme packs just the node that is demanded
(“incremental fetching”), and the other packs all the graph that
is reachable from that node (“bulk fetching”). The second
question is addressed by considering various mechanisms for
latency hiding during communication, ranging from fully
synchronous communication with no attempt to mask latency, to full
thread migration during asynchronous communication.
In order to make our results as general as possible, we have used
the GranSim simulator to study a wide variety of parallel machine
configurations. Based on these measurements we propose concrete
improvements for parallel graph reducers such as the GUM
implementation of Glasgow Parallel Haskell.
Keywords: implementation, GpH, GUM
|
[11]
|
P. W. Trinder, K. Hammond, H.-W. Loidl, S. L. Peyton-Jones, and J. Wu.
Accidents always Come in Threes: A Case Study of Data-intensive
Programs in Parallel Haskell.
In Glasgow Workshop on Functional Programming, Ullapool,
Scotland, July 1996.
[ bib |
.html ]
Accidents happen:
* "An invisible car came out of nowhere, struck my vehicle and
vanished."
* "I pulled away from the side of the road, glanced at my
mother-in-law, and headed for the embankment."
* "As I approached the intersection a sign suddenly appeared in
a place where no stop sign had ever appeared before."
Luckily, we don't normally have to deal with problems as bizarre
as these. One interesting application that does arise at the
Centre for Transport Studies consists of matching police reports
of several accidents so as to locate accident blackspots. The
application provides an interesting, data-intensive, test-bed for
the persistent functional language PFL. We report here on an
approach aimed at improving the performance of this application
using Glasgow Parallel Haskell.
The accident application is one of several large parallel Haskell
programs under development at Glasgow. Our objective is to achieve
wall-clock speedups over the best sequential implementations, and
we report modest wall-clock speedups for a demonstration
program. From experience with these and other programs the group
is developing a methodology for parallelising large functional
programs. We have also developed strategies, a mechanism to
separately specify a function's algorithm and its dynamic
behaviour.
Keywords: parallel applications, GpH
|
[10]
|
P. W. Trinder, K. Hammond, J. S. Mattson Jr., A. S. Partridge, and S. L.
Peyton-Jones.
GUM: A Portable Parallel Implementation of Haskell.
In PLDI'96 - ACM Conf. on Programming Language Design and
Implementation, pages 79-88, Philadelphia, PA, USA, May 1996. ACM Press.
[ bib |
DOI |
.html ]
GUM is a portable, parallel implementation of the Haskell
functional language. Despite sustained research interest in
parallel functional programming, GUM is one of the first such
systems to be made publicly available. GUM is message-based, and
portability is facilitated by using the PVM communications harness
that is available on many multi-processors. As a result, GUM is
available for both shared-memory (Sun SPARCserver multiprocessors)
and distributed-memory (networks of workstations)
architectures. The high message-latency of distributed machines is
ameliorated by sending messages asynchronously, and by sending
large packets of related data in each message. Initial performance
figures demonstrate absolute speedups relative to the best
sequential compiler technology. To improve the performance of a
parallel Haskell program GUM provides tools for monitoring and
visualising the behaviour of threads and of processors during
execution.
Keywords: implementation, GpH, GUM
|
[9]
|
K. Hammond, H.-W. Loidl, and A. S. Partridge.
Visualising Granularity in Parallel Programs: A Graphical
Winnowing System for Haskell.
In HPFC'95 - Conf. on High Performance Functional Computing,
pages 208-221, Denver, CO, USA, Apr. 1995.
[ bib |
.html ]
To take advantage of distributed-memory parallel machines it is
essential to have good control of task granularity. This paper
describes a fairly accurate parallel simulator for Haskell, based
on the Glasgow compiler, and complementary tools for visualising
task granularities. Together these tools allow us to study the
effects of various annotations on task granularity on a variety of
simulated parallel architectures. They also provide a more precise
tool for the study of parallel execution than has previously been
available for Haskell programs.
These tools have already confirmed that thread migration is
essential in parallel systems, demonstrated a close correlation
between thread execution times and total heap allocations, and
shown that fetching data synchronously normally gives better
overall performance than asynchronous fetching, if data is fetched
on demand.
Keywords: tools, GpH, simulator, granularity
|
[8]
|
H.-W. Loidl and K. Hammond.
On the Granularity of Divide-and-Conquer Parallelism.
In Glasgow Workshop on Functional Programming, Workshops in
Computing, Ullapool, Scotland, July 1995. Springer.
[ bib |
.html |
.pdf ]
This paper studies the runtime behaviour of various parallel
divide-and-conquer algorithms written in a non-strict functional
language, when three common granularity-control mechanisms are
used: a simple cut-off, priority thread creation, and priority
scheduling. These mechanisms use granularity information
that is currently provided via annotations to improve the
performance of the parallel programs.
The programs we examine are several variants of a generic
divide-and-conquer program, an unbalanced divide-and-conquer
algorithm and a parallel determinant computation. Our results
indicate that for balanced computation trees a simple,
low-overhead mechanism performs well whereas the more complex
mechanisms offer further improvements for unbalanced computation
trees.
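The simple cut-off mechanism can be sketched as a depth threshold below which a divide-and-conquer function stops creating sparks. This is a hedged illustration using today's `par`/`pseq` combinators rather than the annotations of the paper; `dcSum` and its threshold argument are our own invention:

```haskell
import Control.Parallel (par, pseq)

-- Divide-and-conquer sum with a simple cut-off: once the depth
-- counter reaches zero, no further sparks are created and the
-- remaining work is done sequentially, keeping tasks coarse-grained.
dcSum :: Int -> [Int] -> Int
dcSum _ []  = 0
dcSum _ [x] = x
dcSum d xs
  | d <= 0    = sum xs                    -- past the cut-off: sequential
  | otherwise = l `par` (r `pseq` l + r)  -- spark left, evaluate right
  where
    (as, bs) = splitAt (length xs `div` 2) xs
    l = dcSum (d - 1) as
    r = dcSum (d - 1) bs
```

For a balanced input, a depth of a few levels already yields one coarse task per processor, which is why the low-overhead cut-off performs well in that case.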
Keywords: models of parallelism, GpH, granularity
|
[7]
|
K. Hammond, J. S. Mattson Jr., and S. L. Peyton-Jones.
Automatic Spark Strategies and Granularity for a Parallel
Functional Language Reducer.
In CONPAR'94 - Intl. Conf. on Parallel Processing, volume 854
of Lecture Notes in Computer Science, pages 521-532, Linz, Austria,
Sept. 1994. Springer.
[ bib |
DOI |
.html ]
This paper considers the issue of dynamic task control in the
context of a parallel Haskell implementation on the GRIP
multiprocessor. For the first time, we report the effect of our
task control strategies on task granularity, as measured in terms
of dynamic heap allocations. This gives a concrete means of
measuring the effectiveness of these strategies other than
wall-clock timings, which are notoriously uninformative.
Keywords: models of parallelism, GpH, GRIP, granularity
|
[6]
|
H.-W. Loidl and K. Hammond.
GRAPH for PVM: Graph Reduction for Distributed Hardware.
In IFL'94 - Intl. Workshop on the Implementation of Functional
Languages, Norwich, UK, Sept. 1994.
[ bib |
.html ]
We describe a version of the GRAPH system (Graph Reduction for an
Assortment of Parallel Hardware) designed to execute parallel
functional programs on a range of distributed-memory MIMD machines
running the PVM communications harness. GRAPH was developed from
the runtime system for the novel GRIP multiprocessor. Although
this system has proved highly successful, being a novel
architecture it is hard to compare results with other
architectures or implementations. The CPUs used in the GRIP
design are also rather old.
The principal extensions from GRAPH for GRIP to GRAPH for PVM are
intended to handle high latencies more efficiently, by exploiting
asynchronous communication, multi-threaded scheduling, and by
grouping packets into larger entities where possible. The main
innovation is the development of new, sophisticated packet
flushing and packet fetching algorithms whose purpose is to allow
graphs to be exported to and fetched from global memory without
inter-processor synchronisation. As communication and reduction
can be interleaved, a new form of synchronising communication and
reduction is necessary.
We also give a short outlook on the design of a new portable
distributed graph reduction system, GRAPH for UMM, which adapts the
ideas and techniques originally developed for the GRIP system to a
distributed-memory environment.
GRAPH for PVM has been implemented on a network of Sun
workstations and is currently being tested and debugged.
Although no extensive performance measurements have been performed
yet, the available data shows a significant decrease in the
overall communication and thus an improvement in the overall
performance of the system.
Keywords: implementation, GpH, run-time system
|
[5]
|
K. Hammond, H.-W. Loidl, J. S. Mattson Jr., A. S. Partridge, S. L.
Peyton-Jones, and P. W. Trinder.
GRAPHing the Future.
In IFL'94 - Intl. Workshop on the Implementation of Functional
Languages, Norwich, UK, Sept. 1994.
[ bib |
.html ]
At Glasgow our research into parallel functional programming has
been moving away from our novel architecture, GRIP towards the
provision of a general parallel runtime environment. We call this
GRAPH (Graph Reduction for an Assortment of Parallel Hardware).
This paper describes the design of a new memory and load
management model for GRAPH, which is intended to match shared- and
distributed-memory machines better and to provide a framework for
our research into parallel functional databases and granularity
control. This model is currently under implementation. No
performance results can therefore be provided at the present time.
Keywords: implementation, GpH, run-time system
|
[4]
|
G. Akerholt, K. Hammond, S. L. Peyton-Jones, and P. W. Trinder.
Processing Transactions on GRIP, a Parallel Graph Reducer.
In PARLE'93 - Intl. Conf. on Parallel Architectures and
Languages Europe, volume 694 of Lecture Notes in Computer Science,
pages 634-647, Munich, Germany, June 1993. Springer.
[ bib |
DOI |
.html ]
The GRIP architecture allows efficient execution of functional
programs on a multi-processor built from standard hardware
components. State-of-the-art compilation techniques are combined
with sophisticated runtime resource-control to give good parallel
performance. This paper reports the results of running GRIP on an
application which is apparently unsuited to the basic functional
model: a database transaction manager incorporating updates as
well as lookup transactions. The results obtained show good
relative speedups for GRIP, with real performance advantages over
the same application executing on sequential machines.
Keywords: parallel applications, GpH, GRIP
|
[3]
|
K. Hammond.
Getting a GRIP.
In Intl. Workshop on the Parallel Implementation of Functional
Languages, Nijmegen, The Netherlands, Sept. 1993.
[ bib |
.html ]
This paper describes a portable implementation of the Graph
Reduction in Parallel (GRIP) software, built on the Parallel
Virtual Machine (PVM) process control system. Using this system,
it is easy for anyone to “get a GRIP” without needing to invest
in special-purpose hardware. An important contribution of this
paper is its detailed description of GRIP's Intelligent Memory
Unit (IMU) software.
Keywords: implementation, GpH, run-time system, GRIP
|
[2]
|
K. Hammond and S. L. Peyton-Jones.
Profiling Scheduling Strategies on the GRIP Multiprocessor.
In Intl. Workshop on the Parallel Implementation of Functional
Languages, pages 73-98, Aachen, Germany, Sept. 1992.
[ bib |
.html ]
It is widely claimed that functional languages are particularly
suitable for programming parallel computers. A claimed advantage
is that the programmer is not burdened with details of task
creation, placement, scheduling, and synchronisation, these
decisions being taken by the system instead. Leaving aside the
question of whether a pure functional language is expressive
enough to encompass all the parallel algorithms we might wish to
program, there remains the question of how effectively the
compiler and run-time system map the program onto a real parallel
system, a task usually carried out mostly by the programmer. This
is the question we address in this paper.
We first introduce the system architecture of GRIP, a
shared-memory parallel machine supporting an implementation of the
functional language Haskell. GRIP executes functional programs in
parallel using compiled supercombinator graph reduction, a form of
declarative rule system.
We then describe several strategies for run-time resource control
which we have tried, presenting comprehensive measurements of
their effectiveness. We are particularly concerned with strategies
controlling task creation, in order to improve task granularity
and minimise communication overheads. This is, so far as we know,
one of the first attempts to make a systematic study of
task-control strategies in a high-performance parallel
functional-language system. GRIP's high absolute performance
renders these results credible for real applications.
Keywords: implementation, GpH, run-time system, GRIP
|
[1]
|
K. Hammond and S. L. Peyton-Jones.
Some Early Experiments on the GRIP Parallel Reducer.
In Intl. Workshop on the Parallel Implementation of Functional
Languages, pages 51-72, Nijmegen, The Netherlands, June 1990.
[ bib |
.html ]
GRIP is a multiprocessor designed to execute functional programs
in parallel using graph reduction. We have implemented a compiler
for GRIP, based on the Spineless Tagless G-machine, and can now
run parallel functional programs with substantial absolute speedup
over the same program running on a uniprocessor Sun.
Parallel functional programming shifts some of the burden of
resource allocation from the programmer to the system. Examples of
such decisions include: when to create a new concurrent activity
(or thread), when to execute such threads, where to execute them,
and so on.
It is clearly desirable that the system should take such
decisions, provided it does a good enough job. For example, a
paged virtual memory system almost always does an adequate job,
and a programmer very seldom has to interfere with it. The big
question for parallel functional programming is whether good
resource-allocation strategies exist, and how well they perform
under a variety of conditions.
Now that we have an operational system, we are starting to carry
out experiments to develop resource-allocation strategies, and
measure their effectiveness. This paper reports on some very
preliminary results. They mainly concern the question of when, or
even whether, to create a new thread. This is an aspect which has
so far received little attention - existing work has focused
mainly on load sharing rather than on thread creation. Our work is
aided significantly by the fact that sparks indicate only the
possibility of creating a new parallel thread, not the necessity
of creating this thread. It is therefore always safe to discard
sparks, while affecting only the performance of the computation,
not its result.
We begin with an overview of the GRIP system, and how we perform
compiled graph reduction. This is followed by a summary of the
statistics-gathering system and the results we have obtained so
far.
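The advisory nature of sparks described above is still how GpH's `par` behaves: sparking an expression permits, but never requires, its parallel evaluation, so discarding a spark affects only performance. A minimal sketch using the modern Control.Parallel combinators (`nfib` is the standard parallel-benchmarking example, not code from the paper):

```haskell
import Control.Parallel (par, pseq)

-- `x `par` e` merely sparks x: the runtime may evaluate x on another
-- processor or silently discard the spark; the value of e is the same
-- either way, so the program's result never depends on the decision.
nfib :: Int -> Int
nfib n
  | n < 2     = 1
  | otherwise = x `par` (y `pseq` x + y + 1)
  where
    x = nfib (n - 1)
    y = nfib (n - 2)
```

If the spark for `x` is discarded, the parent thread simply evaluates `x` itself when `x + y + 1` demands it.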
Keywords: implementation, GpH, run-time system, GRIP
|