%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% GpH BIBLIOGRAPHY

%%
%% Organisation:
%% * reverse chronological order (by year)
%% * within each year, entries are sorted by citekey
%%
%% Citekeys:
%% * are constructed according to the DBLP convention:
%%   - surname of first author, followed by
%%   - surname initial of each of the remaining authors, followed by
%%   - last two digits of year
%% * If citekeys constructed as above clash, they are disambiguated
%%   by an extra suffix (e.g. a, b, ...)
%%
%% Authors' and editors' names:
%% * To enhance uniformity, all names are defined in the preamble,
%%   keys being authors' surnames (except where there are clashes).
%%   This is slightly problematic because authors' names in the
%%   bibliography may not exactly match their names on the publications ---
%%   first names may be abbreviated, middle initials may be absent,
%%   composite names may not be hyphenated, only parts of composite
%%   names may be used. For the bibliography, I adopted the following
%%   (reasonable) 'maximum distinction' policy:
%%   - full first name (unless I could not find it anywhere)
%%   - middle name initials (unless I could not find any anywhere)
%%   - composite surnames hyphenated if they appear hyphenated on the authors'
%%     homepages or on at least one of the publications in the bibliography
%%
%%   This policy could be changed or watered down on an individual basis
%%   (so long as each author remains uniquely identifiable throughout
%%   the bibliography file).
%%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% PREAMBLE

% Journals
%% NOTE(review): J-ENTCS is stored in abbreviated form while every other
%% journal name is spelled out in full; consider expanding it to
%% "Electronic Notes in Theoretical Computer Science" for consistency
%% (abbreviation, if wanted, should be the style's job, not the database's).
@String{J-CPE    = "Concurrency --- Practice and Experience"}
@String{J-ENTCS  = "Electr. Notes Theor. Comput. Sci."}
@String{J-FP     = "Journal of Functional Programming"}
@String{J-HOSC   = "Higher-Order and Symbolic Computation"}
@String{J-INFSCI = "Information Sciences"}
@String{J-PPL    = "Parallel Processing Letters"}
@String{J-SCPE   = "Scalable Computing: Practice and Experience"}
@String{J-SPE    = "Software --- Practice and Experience"}
@String{J-TOPADS = "IEEE Transactions on Parallel and Distributed Systems"}
@String{J-UCS    = "Journal of Universal Computer Science"}

% Publishers
%% Publisher macros; referenced unquoted in entries (publisher = P-ACM).
@String{P-ACM    = "ACM Press"}
@String{P-CH     = "Chapman and Hall"}
@String{P-IEEE   = "IEEE Computer Society"}
@String{P-INTELL = "Intellect"}
@String{P-SV     = "Springer"}
@String{P-WSCI   = "World Scientific"}

% Series
%% Series macros; used together with a per-entry volume field.
@String{S-LNCS = "Lecture Notes in Computer Science"}
@String{S-TFP  = "Trends in Functional Programming"}
@String{S-WC   = "Workshops in Computing"}

% Authors (and editors)
@String{and = " and "}  %% Note the spaces around 'and'.
@String{Akerholt      = "Akerholt, Gert"}
@String{Alexander     = "Alexander, Michael"}
@String{Aswad         = "Aswad, Mustafa K."}
@String{BakerFinch    = "Baker-Finch, Clement A."}
%% Braced "{Barry Jr.}" keeps the suffix inside the surname part, so
%% BibTeX does not mis-parse it via the "Last, Jr, First" form.
@String{Barry         = "{Barry Jr.}, Ed."}
@String{Berthold      = "Berthold, Jost"}
@String{Bois          = "Du Bois, Andr{\'e} Rauber"}
@String{Brown         = "Brown, Christopher"}
@String{Butz          = "Butz, Carsten"}
@String{Cooper        = "Cooper, Chris"}
@String{Cooperman     = "Cooperman, Gene"}
@String{Cole          = "Cole, Murray I."}
@String{Costanti      = "Costanti, Marco"}
@String{Davie         = "Davie, Antony J. T."}
@String{Davis         = "Davis, M. Kei"}
@String{Donnell       = "O'Donnell, John T."}
@String{Gardner       = "Gardner, Bill"}
@String{Garigliano    = "Garigliano, Roberto"}
@String{HallC         = "Hall, Cordelia V."}  %% surname clash
@String{HallJ         = "Hall, Jon G."}       %% surname clash
@String{Hammond       = "Hammond, Kevin"}
@String{Harris        = "Harris, Tim"}
@String{Horiguchi     = "Horiguchi, Susumu"}
@String{Horn          = "Horn, Peter"}
@String{James         = "James, Roshan P."}
@String{Janjic        = "Janjic, Vladimir"}
@String{Jones         = "{Jones Jr.}, Don"}
@String{Junaidu       = "Junaidu, Sahalu B."}
@String{King          = "King, David J."}
@String{Klusik        = "Klusik, Ulrike"}
@String{Konovalov     = "Konovalov, Alexander"}
@String{Lange         = "Lange, Martin"}
@String{Linton        = "Linton, Steve"}
@String{Loidl         = "Loidl, Hans-Wolfgang"}
@String{Loogen        = "Loogen, Rita"}
@String{Maier         = "Maier, Patrick"}
@String{Marlow        = "Marlow, Simon"}
@String{Mattson       = "{Mattson Jr.}, James S."}
@String{Michaelson    = "Michaelson, Greg J."}
@String{Morgan        = "Morgan, Richard G."}
@String{Partridge     = "Partridge, Andrew S."}
@String{Pena          = "Pe{\~n}a, Ricardo"}
@String{Petcu         = "Petcu, Dana"}
%% NOTE(review): DBLP and the author's homepage give the surname as
%% "Peyton Jones" (unhyphenated, i.e. "{Peyton Jones}, Simon L."); keep
%% the hyphen only if it appears on at least one cited publication, per
%% the policy in the file header --- please verify.
@String{PeytonJones   = "Peyton-Jones, Simon L."}
@String{Pointon       = "Pointon, Robert F."}
@String{Poria         = "Poria, Sanjay"}
@String{Priebe        = "Priebe, Steffen"}
@String{RebonPortillo = "{Reb{\'o}n Portillo}, {\'A}lvaro J."}
@String{Roozemond     = "Roozemond, Dan"}
@String{Rubio         = "Rubio, Fernando"}
@String{Sarafis       = "Sarafis, Ioannis A."}
@String{Scaife        = "Scaife, Norman"}
@String{Singh         = "Singh, Satnam"}
@String{Trinder       = "Trinder, Philip W."}
@String{Vasconcelos   = "Vasconcelos, Pedro B."}
@String{WalshKemmis   = "Walsh-Kemmis, David"}
%% NOTE(review): initials-only first name, contrary to the 'maximum
%% distinction' policy above --- full first name presumably not found;
%% expand if it can be located.
@String{Wu            = "Wu, J."}
@String{Zain          = "Al Zain, Abdallah"}
@String{Zalzala       = "Zalzala, Ali M. S."}

%% END OF PREAMBLE
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 2010

%% TODO: update url (HWL: please put paper on HW server)
%% Draft-proceedings workshop paper (IFL'10): no pages, publisher or DOI
%% yet, hence the "Draft Proceedings" note; see the TODO above about the
%% url once the paper is hosted on the HW server.
@InProceedings{BrownLBH10,
  author    = Brown # and # Loidl # and # Berthold # and # Hammond,
  title     = {Improving your {CASH} flow: The {Computer} {Algebra} {SHell} (Extended Abstract)},
  booktitle = {IFL'10 --- Intl. Workshop on the Implementation of Functional Languages},
  month     = sep,
  year      = {2010},
  note      = {Draft Proceedings},
  keywords  = {parallel applications, symbolic computation},
  url       = {http://www.cs.st-andrews.ac.uk/~hwloidl/SCIEnce/SymGrid-Par/ifl2010.pdf},
  abstract  = {
    Some important challenges in the field of symbolic computation ---
    and functional programming --- are the transparent access to
    complex, mathematical software, the exchange of data between
    independent systems with specialised tasks and the exploitation of
    modern parallel hardware. One attempt to solve this problem is
    SymGrid-Par, a system for exploiting parallel hardware in the
    context of computer algebra.  Specifically, SymGrid-Par provides
    an easy-to-use platform for parallel computation that connects
    several underlying computer algebra systems, communicating through
    a standardised protocol for symbolic computation.

    In this paper we describe a new component of SymGrid-Par known as
    CaSH: the Computer Algebra SHell. CaSH is a system that allows
    direct access to SymGrid-Par via GHCi. CaSH thus allows Haskell
    programmers to exploit high-performance parallel computations
    using a system designated for solving problems in computer
    algebra; whilst still maintaining the purity and express-ability
    offered by the Haskell environment. We demonstrate access to both
    sequential and parallel services of SymGrid-Par.  For the latter
    we use parallel skeletons, implemented in the Haskell dialect of
    Eden; these skeletons are called from CaSH but exploit a
    computational algebra system known as GAP to offload the
    mathematical complexity.
  }
}


%% TODO: complete entry (add url; ask Vladimir)
%% CCGRID 2010 conference paper. Entry type capitalised to match the
%% file's prevailing @InProceedings convention, and the DOI is stored
%% bare (no resolver prefix) so styles and URL packages can format it.
%% NOTE(review): publisher field is absent --- presumably P-IEEE for
%% CCGRID; verify before adding.
@InProceedings{JanjicH10,
  author    = Janjic # and # Hammond,
  title     = {Granularity-Aware Work-Stealing for Computationally-Uniform Grids},
  booktitle = {CCGRID 2010 --- Intl. Conf. on Cluster, Cloud and Grid Computing},
  address   = {Melbourne, Australia},
  month     = may,
  year      = {2010},
  pages     = {123--134},
  keywords  = {parallel Haskell, GpH, implementation},
  doi       = {10.1109/CCGRID.2010.49},
  abstract  = {
    Good scheduling is important for ensuring effective use of Grid
    resources, while maximising parallel performance. In this paper,
    we show how a basic ``Random-Stealing'' load balancing algorithm
    for computational Grids can be improved by using information about
    the task granularity of parallel programs. We propose several
    strategies (SSL, SLL and LLL) for using granularity information to
    improve load balancing, presenting results both from simulations
    and from a real implementation (the Grid-GUM Runtime System for
    Parallel Haskell). We assume a common model of task creation which
    subsumes both master/worker and data-parallel programming
    paradigms under a task-stealing work distribution
    strategy. Overall, we achieve improvement in runtime of up to 19.4
    percent for irregular problems in the real implementation, and up
    to 40 percent for the simulations (typical improvements of more
    that 15 percent for irregular programs, and from 5--10 percent for
    regular ones). Our results show that, for computationally-uniform
    Grids, advanced load balancing methods that exploit granularity
    information generally have the greatest impact on reducing the
    runtimes of irregular parallel programs. Moreover, the more
    irregular the program is, the better the improvements that can be
    achieved.
  }
}


%% TODO: del entry
%% ISSAC 2010 paper; pages and DOI are missing (entry is slated for
%% deletion per the TODO above, so left as-is).
@InProceedings{LintonHKZTHR10,
  author    = Linton # and # Hammond # and # Konovalov # and # Zain # and # Trinder # and # Horn # and # Roozemond,
  title     = {Easy Composition of Symbolic Computation Software: A New Lingua Franca for Symbolic Computation},
  booktitle = {ISSAC 2010 --- Intl. Symp. on Symbolic and Algebraic Computation},
  address   = {Munich, Germany},
  month     = jul,
  year      = {2010},
  publisher = P-ACM,
  keywords  = {parallel applications, symbolic computation},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/ISSAC10.pdf},
  abstract  = {
    We present the results of the first four years of the European
    research project SCIEnce (www.symbolic-computation.org), which
    aims to provide key infrastructure for symbolic computation
    research. A primary outcome of the project is that we have
    developed a new way of combining computer algebra systems using
    the Symbolic Computation Software Composability Protocol (SCSCP),
    in which both protocol messages and data are encoded in the
    OpenMath format. We describe SCSCP middleware and APIs, outline
    some implementations for various Computer Algebra Systems (CAS),
    and show how SCSCP-compliant components may be combined to solve
    scientific problems that can not be solved within a single CAS, or
    may be organised into a system for distributed parallel
    computations.
  }
}


%% TODO: complete entry (after conf: add pages, remove note)
%% Haskell Symposium 2010 paper (the "new strategies" redesign); pages
%% pending, hence the "To appear" note --- see the TODO above.
@InProceedings{MarlowMLAT10,
  author    = Marlow # and # Maier # and # Loidl # and # Aswad # and # Trinder,
  title     = {Seq no more: Better Strategies for Parallel {Haskell}},
  booktitle = {Haskell Symposium 2010},
  address   = {Baltimore, MD, USA},
  month     = sep,
  year      = {2010},
  publisher = P-ACM,
  note      = {To appear},
  keywords  = {parallel Haskell, GpH, evaluation strategies},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/new-strategies.html},
  abstract  = {
    We present a complete redesign of evaluation strategies, a key
    abstraction for specifying pure, deterministic parallelism in
    Haskell.  Our new formulation preserves the compositionality and
    modularity benefits of the original, while providing significant
    new benefits.  First, we introduce an evaluation-order monad to
    provide clearer, more generic, and more efficient specification of
    parallel evaluation.  Secondly, the new formulation resolves a
    subtle space management issue with the original strategies,
    allowing parallelism (sparks) to be preserved while reclaiming
    heap associated with superfluous parallelism.  Related to this,
    the new formulation provides far better support for speculative
    parallelism as the garbage collector now prunes unneeded
    speculation.  Finally, the new formulation provides improved
    compositionality: we can directly express parallelism embedded
    within lazy data structures, producing more compositional
    strategies, and our basic strategies are parametric in the
    coordination combinator, facilitating a richer set of parallelism
    combinators.

    We give measurements over a range of benchmarks demonstrating that
    the runtime overheads of the new formulation relative to the
    original are low, and the new strategies even yield slightly
    better speedups on average than the original strategies.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 2009

%% TODO: complete entry (when book appears: add pages, remove note)
%% TFP'09 post-proceedings chapter (Intellect, TFP series vol. 10);
%% pages pending, hence the "To appear" note --- see the TODO above.
@InProceedings{AswadTZMB09,
  author    = Aswad # and # Trinder # and # Zain # and # Michaelson # and # Berthold,
  title     = {Low Pain vs No Pain Multi-core {Haskells}},
  booktitle = {TFP'09 --- Symp. on Trends in Functional Programming},
  address   = {Komarno, Slovakia},
  month     = jun,
  year      = {2009},
  publisher = P-INTELL,
  series    = S-TFP,
  volume    = {10},
  note      = {To appear},
  keywords  = {parallel Haskell},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/LowvsNoPainHaskells.pdf},
  abstract  = {
    Multicores are becoming the dominant processor technology and
    functional languages are theoretically well suited to exploit
    them. In practice, however, implementing effective high level
    parallel functional languages is extremely challenging.

    This paper is the first programming and performance comparison of
    functional multicore technologies and reports some of the first
    ever multicore results for two languages. As such it reflects the
    growing maturity of the field by systematically evaluating four
    parallel Haskell implementations on a common multicore
    architecture.  The comparison contrasts the programming effort
    each language requires with the parallel performance
    delivered. The study uses 15 'typical' programs to compare a 'no
    pain', i.e. entirely implicit, parallel language with three 'low
    pain', i.e. semi-explicit languages.

    The parallel Haskell implementations use different versions of GHC
    compiler technology, and hence the comparative performance metric
    is speedup which normalises against sequential performance. We
    ground the speedup comparisons by reporting both sequential and
    parallel runtimes and efficiencies for three of the languages. Our
    experiments focus on the number of programs improved, the absolute
    speedups delivered, the parallel scalability, and the program
    changes required to coordinate parallelism. The results are
    encouraging and, on occasion, surprising.
  }
}


%% TODO: complete entry (add url; ask Jost)
%% ADPNA 2009 workshop paper (published in the ICPPW 2009 proceedings,
%% per the DOI). Fixes: DOI stored bare (no resolver prefix), and the
%% literal "%" in the abstract escaped as "\%" --- an unescaped % would
%% comment out the rest of the line when the field is emitted into LaTeX.
@InProceedings{BertholdMHZ09,
  author    = Berthold # and # Marlow # and # Hammond # and # Zain,
  title     = {Comparing and Optimising Parallel {Haskell} Implementations for Multicore Machines},
  booktitle = {ADPNA 2009 --- Intl. Workshop on Advanced Distributed and Parallel Network Applications},
  address   = {Vienna, Austria},
  month     = sep,
  year      = {2009},
  publisher = P-IEEE,
  pages     = {386--393},
  keywords  = {parallel Haskell, GpH, multicore},
  doi       = {10.1109/ICPPW.2009.10},
  abstract  = {
    In this paper, we investigate the differences and tradeoffs
    imposed by two parallel Haskell dialects running on multicore
    machines. GpH and Eden are both constructed using the
    highly-optimising sequential GHC compiler, and share thread
    scheduling, and other elements, from a common code base. The GpH
    implementation investigated here uses a physically-shared heap,
    which should be well-suited to multicore architectures. In
    contrast, the Eden implementation adopts an approach that has been
    designed for use on distributed-memory parallel machines: a system
    of multiple, independent heaps (one per core), with inter-core
    communication handled by message-passing rather than through
    shared heap cells. We report two main results. Firstly, we report
    on the effect of a number of optimisations that we applied to the
    shared-memory GpH implementation in order to address some
    performance issues that were revealed by our testing: for example,
    we implemented a work-stealing approach to task allocation. Our
    optimisations improved the performance of the shared-heap GpH
    implementation by as much as 30\% on eight cores. Secondly, the
    shared heap approach is, rather surprisingly, not superior to a
    distributed heap implementation: both give similar performance
    results.
  }
}


%% Haskell Symposium 2009 paper introducing ThreadScope. Fix: DOI
%% stored bare (no "http://doi.acm.org/" resolver prefix); the url
%% field keeps the publisher landing page.
@InProceedings{JonesMS09,
  author    = Jones # and # Marlow # and # Singh,
  title     = {Parallel Performance Tuning for {Haskell}},
  booktitle = {Haskell Symposium 2009},
  address   = {Edinburgh, Scotland},
  month     = sep,
  year      = {2009},
  publisher = P-ACM,
  pages     = {81--92},
  keywords  = {parallel Haskell, GpH, profiling, multicore},
  doi       = {10.1145/1596638.1596649},
  url       = {http://research.microsoft.com/apps/pubs/default.aspx?id=80976},
  abstract  = {
    Parallel Haskell programming has entered the mainstream with
    support now included in GHC for multiple parallel programming
    models, along with multicore execution support in the
    runtime. However, tuning programs for parallelism is still
    something of a black art. Without much in the way of feedback
    provided by the runtime system, it is a matter of trial and error
    combined with experience to achieve good parallel speedups.

    This paper describes an early prototype of a parallel profiling
    system for multicore programming with GHC. The system comprises
    three parts: fast event tracing in the runtime, a Haskell library
    for reading the resulting trace files, and a number of tools built
    on this library for presenting the information to the
    programmer. We focus on one tool in particular, a graphical
    timeline browser called ThreadScope.

    The paper illustrates the use of ThreadScope through a number of
    case studies, and describes some useful methodologies for
    parallelizing Haskell programs.
  }
}


%% ICFP 2009 paper on the GHC multicore runtime. Fix: DOI stored bare
%% (no "http://doi.acm.org/" resolver prefix).
@InProceedings{MarlowPS09,
  author    = Marlow # and # PeytonJones # and # Singh,
  title     = {Runtime Support for Multicore {Haskell}},
  booktitle = {ICFP 2009 --- Intl. Conf. on Functional Programming},
  address   = {Edinburgh, Scotland},
  month     = aug,
  year      = {2009},
  publisher = P-ACM,
  pages     = {65--78},
  keywords  = {parallel Haskell, GpH, multicore},
  doi       = {10.1145/1596550.1596563},
  url       = {http://research.microsoft.com/apps/pubs/default.aspx?id=79856},
  abstract  = {
    Purely functional programs should run well on parallel hardware
    because of the absence of side effects, but it has proved hard to
    realise this potential in practice. Plenty of papers describe
    promising ideas, but vastly fewer describe real implementations
    with good wall-clock performance. We describe just such an
    implementation, and quantitatively explore some of the complex
    design tradeoffs that make such implementations hard to build. Our
    measurements are necessarily detailed and specific, but they are
    reproducible, and we believe that they offer some general
    insights.
  }
}


%% FOPARA 2009 revised-selected paper (Springer LNCS 6324); note the DOI
%% here is already stored bare, the form the other entries should follow.
@InProceedings{TrinderCLM09,
  author    = Trinder # and # Cole # and # Loidl # and # Michaelson,
  title     = {Characterising Effective Resource Analyses for Parallel and Distributed Coordination},
  booktitle = {FOPARA 2009 --- Intl. Workshop on Foundational and Practical Aspects of Resource Analysis, Revised Selected Papers},
  address   = {Eindhoven, The Netherlands},
  month     = nov,
  year      = {2009},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {6324},
  pages     = {67--83},
  keywords  = {resource analysis},
  doi       = {10.1007/978-3-642-15331-0_5},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/FOPARA09.pdf},
  abstract  = {
    An important application of resource analysis is to improve the
    performance of parallel and distributed programs. In this context
    key resources are time, space and communication. Given the
    spectrum of cost models and associated analysis techniques
    available, what combination should be selected for a specific
    parallel or distributed context?

    We address the question as follows. We outline a continuum of
    coordination cost models and a range of analysis techniques. We
    consider six representative parallel/distributed applications of
    resource analysis techniques, and aim to extract general
    principles governing why the combination of techniques is
    effective in its context.
  }
}


%% DAMP'09 workshop paper. Fix: DOI stored bare (no
%% "http://doi.acm.org/" resolver prefix); the url field keeps the
%% author-hosted PDF.
@InProceedings{ZainHBTMA09,
  author    = Zain # and # Hammond # and # Berthold # and # Trinder # and # Michaelson # and # Aswad,
  title     = {Low-Pain, High-Gain Multicore Programming in {Haskell}: Coordinating Irregular Symbolic Computations on Multicore Architectures},
  booktitle = {DAMP'09 --- Intl. Workshop on Declarative Aspects of Multicore Programming},
  address   = {Savannah, GA, USA},
  month     = jan,
  year      = {2009},
  publisher = P-ACM,
  pages     = {25--36},
  keywords  = {parallel Haskell, symbolic computation},
  doi       = {10.1145/1481839.1481843},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/damp09.pdf},
  abstract  = {
    With the emergence of commodity multicore architectures,
    exploiting tightly-coupled parallelism has become increasingly
    important. Functional programming languages, such as Haskell, are,
    in principle, well placed to take advantage of this trend,
    offering the ability to easily identify large amounts of
    fine-grained parallelism. Unfortunately, obtaining real
    performance benefits has often proved hard to realise in practice.

    This paper reports on a new approach using middleware that has
    been constructed using the Eden parallel dialect of Haskell. Our
    approach is ``low pain'' in the sense that the programmer
    constructs a parallel program by inserting a small number of
    higher-order algorithmic skeletons at key points in the
    program. It is ``high gain'' in the sense that we are able to get
    good parallel speedups. Our approach is unusual in that we do not
    attempt to use shared memory directly, but rather coordinate
    parallel computations using a message-passing implementation. This
    approach has a number of advantages. Firstly, coordination,
    i.e. locking and communication, is both confined to limited shared
    memory areas, essentially the communication buffers, and is also
    isolated within well-understood libraries. Secondly, the coarse
    thread granularity that we obtain reduces coordination overheads,
    so locks are normally needed only on (relatively large) messages,
    and not on individual data items, as is often the case for simple
    shared-memory implementations. Finally, cache coherency
    requirements are reduced since individual tasks do not share
    caches, and can garbage collect independently.

    We report results for two representative computational algebra
    problems. Computational algebra is a challenging application area
    that has not been widely studied in the general parallelism
    community. Computational algebra applications have high
    computational demands, and are, in principle, often suitable for
    parallel execution, but usually display a high degree of
    irregularity in terms of both task and data structure. This makes
    it difficult to construct parallel applications that perform well
    in practice. Using our system, we are able to obtain both
    extremely good processor utilisation (97 percent) and very good
    absolute speedups (up to 7.7) on an eight-core machine.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 2008

%% TODO: update url (HWL: please put paper on HW server)
%% PADL 2008 paper (Springer LNCS 4902). Fix: DOI stored bare (no
%% "http://dx.doi.org/" resolver prefix); see the TODO above about
%% refreshing the url.
@InProceedings{BertholdZL08,
  author    = Berthold # and # Zain # and # Loidl,
  title     = {Scheduling Light-Weight Parallelism in {ArTCoP}},
  booktitle = {PADL 2008 --- Intl. Symp. on Practical Aspects of Declarative Languages},
  address   = {San Francisco, CA, USA},
  month     = jan,
  year      = {2008},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {4902},
  pages     = {214--229},
  keywords  = {parallel Haskell, GpH, implementation},
  doi       = {10.1007/978-3-540-77442-6_15},
  url       = {http://www.tcs.informatik.uni-muenchen.de/~hwloidl/publications/PADL08.pdf},
  abstract  = {
    We present the design and prototype implementation of the
    scheduling component in ArTCoP (architecture transparent control
    of parallelism), a novel run-time environment (RTE) for parallel
    execution of high-level languages.  A key feature of ArTCoP is its
    support for deep process and memory hierarchies, shown in the
    scheduler by supporting light-weight threads.  To realise a system
    with easily exchangeable components, the system defines a
    micro-kernel, providing basic infrastructure, such as garbage
    collection. All complex RTE operations, including the handling of
    parallelism, are implemented at a separate system level.  By
    choosing Concurrent Haskell as high-level system language, we
    obtain a prototype in the form of an executable specification that
    is easier to maintain and more flexible than conventional RTEs.
    We demonstrate the flexibility of this approach by presenting
    implementations of a scheduler for light-weight threads in ArTCoP,
    based on GHC Version 6.6.
  }
}


%% Book chapter (hence @InCollection with an editor field); no DOI or
%% address given for the Chapman and Hall volume.
@InCollection{LoidlTHZB08,
  author    = Loidl # and # Trinder # and # Hammond # and # Zain # and # BakerFinch,
  editor    = Alexander # and # Gardner,
  title     = {Semi-Explicit Parallel Programming in a Purely Functional Style: {GpH}},
  booktitle = {Process Algebra for Parallel and Distributed Processing: Algebraic Languages in Specification-Based Software Development},
  publisher = P-CH,
  month     = dec,
  year      = {2008},
  pages     = {47--76},
  keywords  = {parallel Haskell, GpH},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/PAPDP.pdf},
  abstract  = {
    Declarative programming languages can play an important role in
    the process of designing and implementing parallel systems. They
    bridge the gap between a high-level specification, with proven
    properties of the overall system, and the execution of the system
    on real hardware.  Efficiently exploiting parallelism on a wide
    range of architectures is a challenging task and should in our
    view be handled by a sophisticated runtime environment.  Based on
    this design philosophy we have developed and formalised Glasgow
    parallel Haskell (GpH), and implemented it as a conservative
    extension of the Glasgow Haskell Compiler.

    The high-level nature of declarative languages eases the task of
    mapping an algebraic specification down to executable code.  In
    fact, the operational components of the specification can already
    be considered an implementation, with the associated properties
    acting as assertions to the code. Based on a formal model of the
    declarative language, the validity of these properties can be
    established by manual proof, which works on a level of detail
    similar to the specification language itself.  Many operational
    aspects, usually complicating a proof of an implementation, do not
    come into the picture at this level.  Most importantly,
    unnecessary sequentialisation of the code is avoided.

    However, the goal of implicit parallelism has proven an elusive
    one. Often the automatically generated threads are too
    fine-grained to be efficient.  In other cases the
    data-dependencies between expressions prohibit the generation of a
    sufficient amount of parallelism.  Thus, we employ an approach of
    semi-explicit parallelism, where only potential parallelism has to
    be annotated in a program, and all aspects of coordination are
    delegated to the runtime environment.  A corresponding formal
    model, in the form of a structured operational semantics, handling
    pools of realised and of potential parallelism, is used to
    establish the correctness of the code employing semi-explicit
    parallelism.  The runtime environment itself is capable of
    synchronising parallel threads, using automatic blocking on data
    under evaluation, and by simulating virtual shared memory across
    networks of machines.  Being embedded into the optimised runtime
    environment for a sequential language, we achieve efficient
    execution of a high-level language, close to the original
    specification language, while minimising the programmer effort in
    parallelising the code and being scalable to large-scale
    applications that can be executed on heterogeneous networks and
    computational Grids.
  }
}


%% TODO: complete entry (add url: ask Simon)
%% ISMM 2008 paper on GHC's parallel GC. Fix: DOI stored bare (no
%% "http://doi.acm.org/" resolver prefix); no url yet --- see the
%% TODO above.
@InProceedings{MarlowHJP08,
  author    = Marlow # and # Harris # and # James # and # PeytonJones,
  title     = {Parallel Generational-Copying Garbage Collection with a Block-structured Heap},
  booktitle = {ISMM 2008 --- Intl. Symp. on Memory Management},
  address   = {Tucson, AZ, USA},
  month     = jun,
  year      = {2008},
  publisher = P-ACM,
  pages     = {11--20},
  keywords  = {parallel Haskell, GpH, implementation, multicore},
  doi       = {10.1145/1375634.1375637},
  abstract  = {
    We present a parallel generational-copying garbage collector
    implemented for the Glasgow Haskell Compiler. We use a
    blockstructured memory allocator, which provides a natural
    granularity for dividing the work of GC between many threads,
    leading to a simple yet effective method for parallelising copying
    GC. The results are encouraging: we demonstrate wall-clock
    speedups of on average a factor of 2 in GC time on a commodity
    4-core machine with no programmer intervention, compared to our
    best sequential GC.
  }
}


%% TODO: del entry
%% Note: publisher, pages missing; abstract strangely ends mid-sentence
%% NOTE(review): as flagged above, the abstract is cut off mid-sentence
%% ("...demonstrating") and publisher/pages are missing; the missing
%% text cannot be reconstructed from this file --- entry is slated for
%% deletion per the TODO.
@InProceedings{ZainBHT08,
  author    = Zain # and # Berthold # and # Hammond # and # Trinder,
  title     = {Orchestrating Production Computer Algebra Components into Portable Parallel Programs},
  booktitle = {Open Source Grid and Cluster Conference},
  address   = {Oakland, CA, USA},
  month     = may,
  year      = {2008},
  keywords  = {parallel applications, symbolic computation},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/OSGC.pdf},
  abstract  = {
    This paper demonstrates that it is possible to obtain good,
    scalable parallel performance by coordinating multiple instances
    of unaltered sequential computational algebra systems in order to
    deliver a single parallel system. The paper presents the first
    substantial parallel performance results for SymGrid-Par, a system
    that orchestrates computational algebra components into a
    high-performance parallel application.  We show that SymGrid-Par
    is capable of exploiting different parallel/multicore
    architectures without any change to the computational algebra
    component. Ultimately, our intention is to extend our system so
    that it is capable of orchestrating heterogeneous computations
    across a high-performance computational Grid. For now, we
    illustrate our approach with a single, unmodified production
    computational algebra system, GAP, running on two common commodity
    architectures --- a homogeneous cluster and an eight-core system.

    Computational algebra applications are large, specialised, and
    symbolic, rather than the more commonly studied numerical
    applications. They also exhibit high levels of irregularity, and
    multiple levels of irregularity.  We demonstrate that for three
    small but representative algebraic computations, good parallel
    speedup is possible relative to a sequential GAP system running on
    a single processor/core. We compare the performance of the
    orchestrated system with that of parGAP, an established parallel
    implementation of GAP, demonstrating
  }
}


%% TODO: del entry
%% Poster at the UK e-Science All Hands Meeting; no pages or publisher recorded.
%% NOTE(review): the allhands.org.uk url dates from 2008 --- confirm it still resolves.
@InProceedings{ZainHLMT08,
  author    = Zain # and # Hammond # and # Linton # and # Michaelson # and # Trinder,
  title     = {{SCIEnce}: Using High-Level Parallel Programming Technology to achieve Heterogeneous Symbolic Computing on the Grid (Poster)},
  booktitle = {UK e-Science 2008 All Hands Meeting},
  address   = {Edinburgh, UK},
  month     = sep,
  year      = {2008},
  keywords  = {parallel applications, symbolic computation},
  url       = {http://www.allhands.org.uk/2008/talks/1104.pdf}
}


%% NOTE(review): journal macro J-TOPADS is not among the @String definitions
%% visible at the top of this file; the doi/url indicate IEEE Trans. on
%% Parallel and Distributed Systems --- confirm the macro is defined in the
%% preamble, else BibTeX will emit an 'undefined macro' warning and an empty
%% journal field.
@Article{ZainTML08,
  author    = Zain # and # Trinder # and # Michaelson # and # Loidl,
  title     = {Evaluating a High-Level Parallel Language ({GpH}) for Computational {GRIDs}},
  journal   = J-TOPADS,
  volume    = {19},
  number    = {2},
  year      = {2008},
  pages     = {219--233},
  keywords  = {parallel Haskell, GpH},
  doi       = {http://doi.ieeecomputersociety.org/10.1109/TPDS.2007.70728},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/TPDS.pdf},
  abstract  = {
    Computational Grids potentially offer low cost, readily available,
    and large-scale high-performance platforms. For the parallel
    execution of programs, however, computational Grids pose serious
    challenges: they are heterogeneous, and have hierarchical and
    often shared interconnects, with high and variable latencies
    between clusters.

    This paper investigates whether a programming language with
    high-level parallel coordination and a Distributed Shared Memory
    model (DSM) can deliver good, and scalable, performance on a range
    of computational Grid configurations. The high-level language,
    Glasgow parallel Haskell (GpH), abstracts over the architectural
    complexities of the computational Grid, and we have developed
    GridGUM2, a sophisticated grid-specific implementation of GpH, to
    produce the first high-level DSM parallel language implementation
    for computational Grids.

    We report a systematic performance evaluation of GridGUM2 on
    combinations of high/low and homo/hetero-geneous computational
    Grids. We measure the performance of a small set of kernel
    parallel programs representing a variety of application areas, two
    parallel paradigms, and ranges of communication degree and
    parallel irregularity. We investigate GridGUM2's performance
    scalability on medium-scale heterogeneous and high-latency
    computational Grids, and analyse the performance with respect to
    the program characteristics of communication frequency and degree
    of irregular parallelism.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 2007

%% Euro-Par 2007 (LNCS 4641): design of the SymGrid framework for symbolic
%% computation on the Grid.
@InProceedings{HammondZCPT07,
  author    = Hammond # and # Zain # and # Cooperman # and # Petcu # and # Trinder,
  title     = {{SymGrid}: A Framework for Symbolic Computation on the Grid},
  booktitle = {Euro-Par 2007 --- Intl. Conf. on Parallel Processing},
  address   = {Rennes, France},
  month     = aug,
  year      = {2007},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {4641},
  pages     = {457--466},
  keywords  = {parallel applications, implementation, symbolic computation},
  doi       = {http://dx.doi.org/10.1007/978-3-540-74466-5_49},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/europar2007.pdf},
  abstract  = {
    This paper introduces the design of SymGrid, a new Grid framework
    that will, for the first time, allow multiple invocations of
    symbolic computing applications to interact via the Grid. SymGrid
    is designed to support the specific needs of symbolic computation,
    including computational steering (greater interactivity), complex
    data structures, and domain-specific computational patterns (for
    irregular parallelism). A key issue is heterogeneity: SymGrid is
    designed to orchestrate components from different symbolic systems
    into a single coherent (possibly parallel) Grid application,
    building on the OpenMath standard for data exchange between
    mathematically-oriented applications. The work is being developed
    as part of a major EU infrastructure project.
  }
}


%% TODO: del entry
%% EGEE User Forum talk; no pages, publisher, doi, or url recorded.
@InProceedings{PetcuHTZ07,
  author    = Petcu # and # Hammond # and # Trinder # and # Zain,
  title     = {{SymGrid}: Symbolic Computations on Grids},
  booktitle = {EGEE User Forum --- Enabling Grids for E-sciencE},
  address   = {Manchester, UK},
  month     = may,
  year      = {2007},
  keywords  = {parallel applications, symbolic computation}
}


%% PAPP 2007 (LNCS 4488): design of SymGrid-Par, with prototype measurements.
@InProceedings{ZainHTLLC07,
  author    = Zain # and # Hammond # and # Trinder # and # Linton # and # Loidl # and # Costanti,
  title     = {{SymGrid-Par}: Designing a Framework for Executing Computational Algebra Systems on Computational Grids},
  booktitle = {PAPP 2007 --- Intl. Workshop on Practical Aspects of High-level Parallel Programming},
  address   = {Beijing, China},
  month     = may,
  year      = {2007},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {4488},
  pages     = {617--624},
  keywords  = {parallel applications, implementation, symbolic computation},
  doi       = {http://dx.doi.org/10.1007/978-3-540-72586-2_90},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/papp2007.pdf},
  abstract  = {
    SymGrid-Par is a new framework for executing large computer
    algebra problems on computational Grids. We present the design of
    SymGrid-Par, which supports multiple computer algebra packages,
    and hence provides the novel possibility of composing a system
    using components from different packages.  Orchestration of the
    components on the Grid is provided by a Grid-enabled parallel
    Haskell (GpH).  We present a prototype implementation of a core
    component of SymGrid-Par, together with promising measurements of
    two programs on a modest Grid to demonstrate the feasibility of
    our approach.
  }
}


%% TODO: complete entry (add url; ask Abyd for pdf)
%% NOTE(review): no pages or publisher recorded; url/pdf still missing (see
%% the TODO above this entry).
@InProceedings{ZainTLM07,
  author    = Zain # and # Trinder # and # Loidl # and # Michaelson,
  title     = {Supporting High-Level Grid Parallel Programming: the Design and Implementation of {Grid-GUM2} (Full Paper)},
  booktitle = {UK e-Science 2007 All Hands Meeting},
  address   = {Nottingham, UK},
  month     = sep,
  year      = {2007},
  keywords  = {parallel applications, symbolic computation}
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 2006

%% TODO: del entry
%% Fixed typo in abstract: 'Moble Channels' -> 'Mobile Channels' (the
%% MChannels acronym on the same line confirms the intended spelling).
@Article{BoisTL06,
  author    = Bois # and # Trinder # and # Loidl,
  title     = {Strong Mobility in Mobile {Haskell}},
  journal   = J-UCS,
  volume    = {12},
  number    = {7},
  year      = {2006},
  pages     = {868--884},
  keywords  = {mobile Haskell},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/strongm.pdf},
  abstract  = {
    In a mobile language, computations can move between locations in a
    network to better utilise resources, e.g., as in a computational
    GRID. Mobile Haskell, or mHaskell, is a small extension of
    Concurrent Haskell that enables the construction of distributed
    mobile software by introducing higher order communication channels
    called Mobile Channels (MChannels). mHaskell only provides weak
    mobility, i.e. the ability to start new computations on remote
    locations. This paper shows how strong mobility, i.e. the ability
    to migrate running threads between locations, can be implemented
    in a language like mHaskell with weak mobility, higher-order
    channels and first-class continuations. Using Haskell's high level
    features, such as higher-order functions, type classes and support
    for monadic programming, strong mobility is achieved without any
    changes to the runtime system, or built-in support for
    continuations. Strong mobility is illustrated with examples and a
    mobile agent case study.
  }
}


%% Journal version (SCPE 7(3)) of the PAPP 2005 paper ZainTLM05 later in this
%% file (cf. the note field).
@Article{ZainTLM06,
  author    = Zain # and # Trinder # and # Loidl # and # Michaelson,
  title     = {Managing Heterogeneity in a Grid Parallel {Haskell}},
  journal   = J-SCPE,
  volume    = {7},
  number    = {3},
  year      = {2006},
  pages     = {9--25},
  note      = {Selected papers from PAPP 2005 --- Intl. Workshop on Practical Aspects of High-level Parallel Programming, Atlanta, GA, USA, May 2005},
  keywords  = {parallel Haskell, GpH, implementation},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/PAPP-Journal.pdf},
  abstract  = {
    Computational Grids potentially offer cheap large-scale
    high-performance systems, but are a very challenging architecture,
    being heterogeneous, shared and hierarchical. Rather than
    requiring a programmer to explicitly manage this complex
    environment, we recommend using a high-level parallel functional
    language, like GpH, with largely automatic management of parallel
    coordination.

    We present GRID-GUM, an initial port of the distributed virtual
    shared-memory implementation of GpH for computational Grids. We
    show that, GRID-GUM delivers acceptable speedups on relatively low
    latency homogeneous and heterogeneous computational
    Grids. Moreover, we find that for heterogeneous computational
    Grids, load management limits performance.

    We present the initial design of GRID-GUM2, that incorporates new
    load management mechanisms that cheaply and effectively combine
    static and dynamic information to adapt to heterogeneous
    Grids. The mechanisms are evaluated by measuring four non-trivial
    programs with different parallel properties. The measurements show
    that the new mechanisms improve load distribution over the
    original implementation, reducing runtime by factors ranging from
    17 to 57 percent, and the greatest improvement is obtained for the
    most dynamic program.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 2005

%% TODO: del entry
%% Journal version (J.UCS 11(7)) of an SBLP'05 paper (cf. the note field).
@Article{BoisTL05a,
  author    = Bois # and # Trinder # and # Loidl,
  title     = {{mHaskell}: Mobile Computation in a Purely Functional Language},
  journal   = J-UCS,
  volume    = {11},
  number    = {7},
  year      = {2005},
  pages     = {1234--1254},
  note      = {Selected papers from the SBLP'05: 9th Brazilian Symposium on Programming Languages, Recife, Brazil, May 2005},
  keywords  = {mobile Haskell},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/SBLP05.pdf},
  abstract  = {
    This paper is a complete description of mHaskell, an extension of
    Concurrent Haskell for mobile computation. We describe new
    stateful mobility primitives that use higher-order channels,
    giving their operational semantics and an implementation
    outline. We show how medium-level coordination abstractions can be
    constructed using monadic composition of the mobility primitives.
    We briefly outline how high-level mobile coordination
    abstractions, or mobility skeletons, can be defined using the
    lower-level abstractions. The use of all three abstractions is
    demonstrated with examples and a new case study: a distributed
    stateless web server where a thread farm skeleton is used to
    distribute work to remote locations.
  }
}


%% TODO: del entry
%% PPL 15(3) article: mobility skeletons for mobile Haskell.
@Article{BoisTL05b,
  author    = Bois # and # Trinder # and # Loidl,
  title     = {Towards Mobility Skeletons},
  journal   = J-PPL,
  volume    = {15},
  number    = {3},
  year      = {2005},
  pages     = {273--288},
  keywords  = {mobile Haskell},
  doi       = {http://dx.doi.org/10.1142/S0129626405002210},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/ppll.pdf},
  abstract  = {
    In mobile languages programmers control the placement of code or
    computations in open networks, e.g., a program can migrate between
    locations. Mobile computations are typically stateful and interact
    with the state at each location in the network. We propose
    mobility skeletons, a library of parameterisable functions that
    encapsulate common patterns of mobile computation. Mobility
    skeletons are analogous to, but very different from, algorithmic
    skeletons --- parameterisable functions encapsulating common
    patterns of parallel computation. We have identified three common
    patterns of mobile computation, and implemented them as a library
    of higher-order functions in mobile Haskell. Each mobility
    skeleton is defined and illustrated with an example. We show how
    mobility skeletons can be composed and nested, and illustrate
    their use in a non-trivial case study: a distributed meeting
    planner. Mobility skeletons are extensible: there is a small set
    of mobility primitives, and medium-level abstractions such as
    remote evaluation can be defined using them. New mobility
    skeletons can be defined using the medium and low level
    abstractions.
  }
}


%% Haskell Workshop 2005: shared-memory (SMP) implementation of parallel
%% Haskell in GHC, with a lock-free thunk-evaluation mechanism.
@InProceedings{HarrisMP05,
  author    = Harris # and # Marlow # and # PeytonJones,
  title     = {{Haskell} on a Shared-Memory Multiprocessor},
  booktitle = {Haskell Workshop 2005},
  address   = {Tallinn, Estonia},
  month     = sep,
  year      = {2005},
  publisher = P-ACM,
  pages     = {49--61},
  keywords  = {parallel Haskell, GpH, implementation, multicore},
  doi       = {http://doi.acm.org/10.1145/1088348.1088354},
  url       = {http://research.microsoft.com/apps/pubs/default.aspx?id=67424},
  abstract  = {
    Multi-core processors are coming, and we need ways to program
    them. The combination of purely-functional programming and
    explicit, monadic threads, communicating using transactional
    memory, looks like a particularly promising way to do so. This
    paper describes a full-scale implementation of shared-memory
    parallel Haskell, based on the Glasgow Haskell Compiler. Our main
    technical contribution is a lock-free mechanism for evaluating
    shared thunks that eliminates the major performance bottleneck in
    parallel evaluation of a lazy language. Our results are
    preliminary but promising: we can demonstrate wall-clock speedups
    of a serious application (GHC itself), even with only two
    processors, compared to the same application compiled for a
    uni-processor.
  }
}


%% TODO: update url (HWL: please put paper on HW server)
%% Normalised for file consistency: entry type capitalised (@Article, as in
%% every other entry) and note casing changed to 'Selected papers from ...'
%% to match the notes of ZainTLM06 and BoisTL05a.
@Article{LangeL05,
  author    = Lange # and # Loidl,
  title     = {Parallel and Symbolic Model Checking for Fixpoint Logic with Chop},
  journal   = J-ENTCS,
  volume    = {128},
  number    = {3},
  year      = {2005},
  pages     = {125--138},
  note      = {Selected papers from PDMC'04 --- Intl. Workshop on Parallel and Distributed Techniques in Verification, London, UK, Sep 2004},
  keywords  = {parallel Haskell, GpH, parallel applications},
  doi       = {http://dx.doi.org/10.1016/j.entcs.2004.10.023},
  url       = {http://www.tcs.informatik.uni-muenchen.de/~hwloidl/publications/pdmc04.ps.gz},
  abstract  = {
    We consider the model checking problem for FLC, a modal fixpoint
    logic capable of defining non-regular properties. This paper
    presents a refinement of a symbolic model checker and discusses
    how to parallelise this algorithm. It reports on a prototype
    implementation of the algorithm in Glasgow Parallel Haskell (GpH)
    and its performance on a cluster of workstations.
  }
}

%% TODO: del entry (preliminary version of ZainTLM06)
%% Preliminary PAPP 2005 version of the SCPE journal article ZainTLM06 (see
%% the TODO above this entry).
@InProceedings{ZainTLM05,
  author    = Zain # and # Trinder # and # Loidl # and # Michaelson,
  title     = {Managing Heterogeneity in a Grid Parallel {Haskell}},
  booktitle = {PAPP 2005 --- Intl. Workshop on Practical Aspects of High-level Parallel Programming},
  address   = {Atlanta, GA, USA},
  month     = may,
  year      = {2005},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {3515},
  pages     = {746--754},
  keywords  = {parallel Haskell, GpH, implementation},
  doi       = {http://dx.doi.org/10.1007/11428848_96},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/PAPP05.ps},
  abstract  = {
    Grid-GUM is a distributed virtual shared-memory implementation of
    a high-level parallel language for computational Grids. While the
    implementation delivers good speedups on multiple homogeneous
    clusters with low-latency interconnect, on heterogeneous clusters,
    however, poor load balance limits performance. Here we present new
    load management mechanisms that combine static and partial dynamic
    information to adapt to heterogeneous Grids. The mechanisms are
    evaluated by measuring four non-trivial programs with different
    parallel properties, and show runtime improvements between 17 and
    57 percent, with the most dynamic program giving the greatest
    improvement.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 2004


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 2003

%% TODO: del entry
%% Note: publisher, abstract missing
%% WFLP 2003 short paper (4 pages); publisher and abstract missing, as flagged
%% in the note above this entry.
@InProceedings{BoisTL03a,
  author    = Bois # and # Trinder # and # Loidl,
  title     = {Towards a Mobile {Haskell}},
  booktitle = {WFLP 2003 --- Intl. Workshop on Functional and (Constraint) Logic Programming},
  address   = {Valencia, Spain},
  month     = jun,
  year      = {2003},
  pages     = {113--116},
  keywords  = {mobile Haskell},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/wflp03.ps}
}


%% TODO: del entry
%% TFP'03: implementation of mobile Haskell (mHaskell), with preliminary
%% performance comparison against Jocaml.
@InProceedings{BoisTL03b,
  author    = Bois # and # Trinder # and # Loidl,
  title     = {Implementing Mobile {Haskell}},
  booktitle = {TFP'03 --- Symp. on Trends in Functional Programming},
  address   = {Edinburgh, Scotland},
  month     = sep,
  year      = {2003},
  publisher = P-INTELL,
  series    = S-TFP,
  volume    = {4},
  pages     = {79--94},
  keywords  = {mobile Haskell},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/TFP03-mhaskell.pdf},
  abstract  = {
    Mobile computation enables computations to move between a dynamic
    set of locations, and is becoming an increasingly important
    paradigm.  mHaskell is an extension of Haskell that supports
    mobile computation in open distributed systems i.e. a system where
    multiple executing programs can interact using a predefined
    protocol. This paper outlines the mHaskell primitives, discusses
    the design and pragmatics of their implementation and includes
    preliminary performance comparisons with Jocaml. The
    implementation addresses several challenges, including
    serialisation of programs in a lazy language with sharing, and
    using a combination of bytecode and machine code to manage the
    common software base, i.e. to determine what to communicate
    between locations.
  }
}


%% HOSC 16(3): three-way comparison of the parallel functional languages PMLS,
%% GpH and Eden on a Beowulf cluster (12 authors; citekey follows the DBLP
%% initials convention described in the file header).
@Article{LoidlRSHHKLMPPRT03,
  author    = Loidl # and # Rubio # and # Scaife # and # Hammond # and # Horiguchi # and # Klusik # and # Loogen # and # Michaelson # and # Pena # and # Priebe # and # RebonPortillo # and # Trinder,
  title     = {Comparing Parallel Functional Languages: Programming and Performance},
  journal   = J-HOSC,
  volume    = {16},
  number    = {3},
  year      = {2003},
  pages     = {203--251},
  keywords  = {models of parallelism, parallel applications},
  doi       = {http://dx.doi.org/10.1023/A:1025641323400},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/ps/HOSC03.ps.gz},
  abstract  = {
    This paper presents a practical evaluation and comparison of three
    state-of-the-art parallel functional languages. The evaluation is
    based on implementations of three typical symbolic computation
    programs, with performance measured on a Beowulf-class parallel
    architecture.

    We assess three mature parallel functional languages: PMLS, a
    system for implicitly parallel execution of ML programs; GPH, a
    mainly implicit parallel extension of Haskell; and Eden, a more
    explicit parallel extension of Haskell designed for both
    distributed and parallel execution. While all three languages
    employ a completely implicit approach to communication, each
    language takes a different approach to specifying and controlling
    parallelism, ranging from explicit identification of processes as
    language constructs (Eden) through annotation of potential
    parallelism (GPH) to automatic detection of parallel skeletons in
    sequential code (PMLS).

    We present detailed performance measurements of all three systems
    on a widely available parallel architecture: a Beowulf cluster of
    low-cost commodity workstations. We use three representative
    symbolic applications: a matrix multiplication algorithm, an exact
    linear system solver, and a simple ray-tracer. Our results show
    how moderate speedups can be achieved with little or no changes to
    the sequential code, and that parallel performance can be
    significantly improved even within our high-level model of
    parallel functional programming by controlling key aspects of the
    program such as load distribution and thread granularity.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 2002

%% TODO: del entry
%% NOTE(review): the keywords claim 'parallel Haskell, GpH' but the abstract
%% describes a genetic-algorithm clustering toolkit with no mention of
%% Haskell --- confirm the keywords are intended for this entry.
@InProceedings{SarafisZT02,
  author    = Sarafis # and # Zalzala # and # Trinder,
  title     = {A Genetic Rule-based Data Clustering Toolkit},
  booktitle = {CEC'02 --- Congress on Evolutionary Computation},
  address   = {Honolulu, HI, USA},
  month     = may,
  year      = {2002},
  publisher = P-IEEE,
  pages     = {1238--1243},
  keywords  = {parallel Haskell, GpH, parallel applications},
  doi       = {http://dx.doi.org/10.1109/CEC.2002.1004420},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/CEC02paper3final.pdf},
  abstract  = {
    Clustering is a hard combinatorial problem and is defined as the
    unsupervised classification of patterns. The formation of clusters
    is based on the principle of maximizing the similarity between
    objects of the same cluster while simultaneously minimizing the
    similarity between objects belonging to distinct clusters. This
    paper presents a tool for database clustering using a rule-based
    genetic algorithm (RBCGA). RBCGA evolves individuals consisting of
    a fixed set of clustering rules, where each rule includes d
    non-binary intervals, one for each feature. The investigations
    attempt to alleviate certain drawbacks related to the classical
    minimization of square-error criterion by suggesting a flexible
    fitness function which takes into consideration, cluster
    asymmetry, density, coverage and homogeny.
  }
}


%% Fixed typo in abstract: 'unncessary' -> 'unnecessary'.
@InProceedings{BoisPLT02,
  author    = Bois # and # Pointon # and # Loidl # and # Trinder,
  title     = {Implementing Declarative Parallel Bottom-Avoiding Choice},
  booktitle = {SBAC-PAD 2002 --- Symp. on Computer Architecture and High Performance Computing},
  address   = {Vitoria, Brazil},
  month     = oct,
  year      = {2002},
  publisher = P-IEEE,
  pages     = {82--92},
  keywords  = {parallel Haskell, GpH, models of parallelism},
  doi       = {http://doi.ieeecomputersociety.org/10.1109/CAHPC.2002.1180763},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/ps/pad02.ps.gz},
  abstract  = {
    Non-deterministic choice supports efficient parallel speculation,
    but unrestricted non-determinism destroys the referential
    transparency of purely-declarative languages by removing
    unfoldability and it bears the danger of wasting resources on
    unnecessary computations. While numerous choice mechanisms have
    been proposed that preserve unfoldability, and some concurrent
    implementations exist, we believe that no compiled parallel
    implementation has previously been constructed. This paper
    presents the design, semantics, implementation and use of a family
    of bottom-avoiding choice operators for Glasgow parallel
    Haskell. The subtle semantic properties of our choice operations
    are described, including a careful classification using an
    existing framework, together with a discussion of operational
    semantics issues and the pragmatics of distributed memory
    implementation. The expressiveness of our choice operators is
    demonstrated by constructing a branch and bound search, a merge
    and a speculative conditional. Their effectiveness is demonstrated
    by comparing the parallel performance of the speculative search
    with naive and 'perfect' implementations. Their efficiency is
    assessed by measuring runtime overhead and heap consumption.
  }
}


%% Fixed booktitle spacing ('---  Intl.' -> '--- Intl.', matching every other
%% booktitle in this file) and abstract typo 'highlatency' -> 'high-latency'.
@InProceedings{BoisLT02,
  author    = Bois # and # Loidl # and # Trinder,
  title     = {Thread Migration in a Parallel Graph Reducer},
  booktitle = {IFL'02 --- Intl. Workshop on the Implementation of Functional Languages},
  address   = {Madrid, Spain},
  month     = sep,
  year      = {2002},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {2670},
  pages     = {199--214},
  keywords  = {parallel Haskell, GpH, implementation, GUM},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/ps/Migration-IFL02.ps.gz},
  abstract  = {
    To support high level coordination, parallel functional languages
    need effective and automatic work distribution mechanisms. Many
    implementations distribute potential work, i.e. sparks or
    closures, but there is good evidence that the performance of
    certain classes of program can be improved if current work, or
    threads, are also distributed. Migrating a thread incurs
    significant execution cost and requires careful scheduling and an
    elaborate implementation.

    This paper describes the design, implementation and performance of
    thread migration in the GUM runtime system underlying Glasgow
    parallel Haskell (GpH). Measurements of nontrivial programs on a
    high-latency cluster architecture show that thread migration can
    improve the performance of data-parallel and divide-and-conquer
    programs with low processor utilisation. Thread migration also
    reduces the variation in performance results obtained in separate
    executions of a program. Moreover, migration does not incur
    significant overheads if there are no migratable threads, or on a
    single processor. However, for programs that already exhibit good
    processor utilisation, migration may increase performance
    variability and very occasionally reduce performance.
  }
}


%% Information Sciences 140(3--4); the note field warns that the pre-print
%% title (url) differs from the published article title.
@Article{JunaiduT02,
  author    = Junaidu # and # Trinder,
  title     = {Parallelising large irregular programs: an experience with {Naira}},
  journal   = J-INFSCI,
  volume    = {140},
  number    = {3--4},
  year      = {2002},
  pages     = {229--240},
  note      = {Title of pre-print differs from published article},
  keywords  = {parallel Haskell, GpH, parallel applications},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/MeasuringNaira.ps.gz},
  abstract  = {
    Naira is a compiler for Haskell, written in Glasgow parallel
    Haskell. It exhibits modest, but irregular, parallelism that is
    determined by properties of the program being compiled, e.g. the
    complexity of the types and of the pattern matching. We report
    four experiments into Naira's parallel behaviour using a set of
    realistic inputs: namely the 18 Haskell modules of Naira
    itself. The issues investigated are:

      * Does increasing input size improve sequential efficiency and
        speedup?

      * To what extent do high communications latencies reduce average
        parallelism and speedup?

      * Does migrating running threads between processors improve
        average parallelism and speedup at all latencies?
  }
}


%% CCGrid 2002; NOTE(review): the url points to an abstract page rather than
%% a pdf/ps file as in most entries --- add a direct link or doi if available.
@InProceedings{Loidl02,
  author    = Loidl,
  title     = {The Virtual Shared Memory Performance of a Parallel Graph Reducer},
  booktitle = {CCGrid 2002 --- Intl. Symp. on Cluster Computing and the Grid},
  address   = {Berlin, Germany},
  month     = may,
  year      = {2002},
  publisher = P-IEEE,
  pages     = {311--318},
  keywords  = {parallel Haskell, GpH, implementation},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/dsm02.html},
  abstract  = {
    This paper assesses the costs of maintaining a virtual shared heap
    in our parallel graph reducer (GUM), which implements a parallel
    functional language. GUM performs automatic and dynamic resource
    management for both work and data. We introduce extensions to the
    original design of GUM, aiming at a more flexible memory
    management and communication mechanism to deal with high-latency
    systems. We then present measurements of running GUM on a Beowulf
    cluster, evaluating the overhead of dynamic distributed memory
    management and the effectiveness of the new memory management and
    communication mechanisms.
  }
}


%% TODO: HWL please add abstract
%% Fixed booktitle spacing ('---  Intl.' -> '--- Intl.', matching every other
%% booktitle in this file). Abstract still missing (see TODO above).
@InProceedings{RebonPortilloHLV02,
  author    = RebonPortillo # and # Hammond # and # Loidl # and # Vasconcelos,
  title     = {Cost Analysis Using Automatic Size and Time Inference},
  booktitle = {IFL'02 --- Intl. Workshop on the Implementation of Functional Languages},
  address   = {Madrid, Spain},
  month     = sep,
  year      = {2002},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {2670},
  pages     = {232--248},
  keywords  = {parallel Haskell, GpH, models of parallelism, resource analysis},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/ps/Analysis-IFL02.ps.gz}
}


%% Fixed author field: the '# and #' separator was missing between Loidl and
%% Pointon, which concatenated the two name macros into a single malformed
%% name (the citekey TrinderLP02 and the published JFP survey both list three
%% authors: Trinder, Loidl and Pointon).
@Article{TrinderLP02,
  author    = Trinder # and # Loidl # and # Pointon,
  title     = {Parallel and Distributed {Haskells}},
  journal   = J-FP,
  volume    = {12},
  number    = {4{\&}5},
  year      = {2002},
  pages     = {469--510},
  keywords  = {parallel Haskell, distributed Haskell, GpH, GdH, models of parallelism},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/jfp.html},
  abstract  = {
    Parallel and distributed languages specify computations on
    multiple processors and have a computation language to describe
    the algorithm, i.e. what to compute, and a coordination language
    to describe how to organise the computations across the
    processors. Haskell has been used as the computation language for
    a wide variety of parallel and distributed languages, and this
    paper is a comprehensive survey of implemented languages. We
    outline parallel and distributed language concepts and classify
    Haskell extensions using them. Similar example programs are used
    to illustrate and contrast the coordination languages, and the
    comparison is facilitated by the common computation language. A
    lazy language is not an obvious choice for parallel or distributed
    computation, and we address the question of why Haskell is a
    common functional computation language.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 2001

%% SFP'01 workshop paper (TFP series vol. 3): load balancing in the GUM
%% parallel graph reducer on a Beowulf cluster.
@InProceedings{Loidl01,
  author    = Loidl,
  title     = {Load Balancing in a Parallel Graph Reducer},
  booktitle = {SFP'01 --- Scottish Functional Programming Workshop},
  address   = {Stirling, Scotland},
  month     = aug,
  year      = {2001},
  publisher = P-INTELL,
  series    = S-TFP,
  volume    = {3},
  pages     = {63--74},
  keywords  = {parallel Haskell, GpH, implementation, GUM},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/sfp01.html},
  abstract  = {
    Parallel graph reducers such as GUM use dynamic techniques to
    manage resources during execution. One important aspect of the
    dynamic behaviour is the distribution of work. The load balancing
    mechanism, which controls this aspect, should be flexible, to
    adjust the distribution of work to hardware characteristics as
    well as dynamic program characteristics, and scalable, to achieve
    high utilisation of all processors even on massively parallel
    machines.

    In this paper we study the behaviour of GUM's load balancing
    mechanism on a high-latency Beowulf multi-processor. We present
    modifications to the basic load balancing mechanism and discuss
    runtime measurements, which indicate that these modifications can
    significantly enhance the scalability of the system.
  }
}


%% Checked: journal macro J-PPL is defined in the preamble at the head of
%% this file; volume/number/pages are present as required for @Article.
@Article{LoidlTB01,
  author    = Loidl # and # Trinder # and # Butz,
  title     = {Tuning Task Granularity and Data Locality of Data Parallel {GpH} Programs},
  journal   = J-PPL,
  volume    = {11},
  number    = {4},
  year      = {2001},
  pages     = {471--486},
  keywords  = {parallel Haskell, GpH, models of parallelism},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/hlpp01.html},
  note      = {Selected papers from HLPP'01 --- Intl. Workshop on High-level Parallel Programming and Applications, Orleans, France, 26--27 March, 2001},
  abstract  = {
    The performance of data parallel programs often hinges on two key
    coordination aspects: the computational costs of the parallel
    tasks relative to their management overhead - task granularity;
    and the communication costs induced by the distance between tasks
    and their data - data locality. In data parallel programs both
    granularity and locality can be improved by clustering,
    i.e. arranging for parallel tasks to operate on related
    sub-collections of data.

    The GpH parallel functional language automatically manages most
    coordination aspects, but also allows some high-level control of
    coordination using evaluation strategies. We study the
    coordination behavior of two typical data parallel programs, and
    find that while they can be improved by introducing clustering
    evaluation strategies, further performance improvements can be
    achieved by restructuring the program.

    We introduce a new generic Cluster class that allows clustering to
    be systematically introduced, and improved by program
    transformation. In contrast to many other parallel program
    transformation approaches, we transform realistic programs and
    report performance results on a 32-processor Beowulf cluster. The
    cluster class is highly-generic and extensible, amenable to
    reasoning, and avoids conflating computation and coordination
    aspects of the program.
  }
}


%% doi is stored as the bare identifier (no resolver prefix), per DOI
%% conventions; bibliography styles add the resolver when rendering.
@InProceedings{PointonPLLT01,
  author    = Pointon # and # Priebe # and # Loidl # and # Loogen # and # Trinder,
  title     = {Functional vs Object-Oriented Distributed Languages},
  booktitle = {EUROCAST'01 --- Intl. Conf. on Computer Aided Systems Theory},
  address   = {Las Palmas de Gran Canaria, Spain},
  month     = feb,
  year      = {2001},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {2178},
  pages     = {642--656},
  keywords  = {distributed Haskell, GdH},
  doi       = {10.1007/3-540-45654-6_49},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/ps/eurocast01.ps.gz},
  abstract  = {
    Conventional distributed programming languages require the
    programmer to explicitly specify many aspects of distributed
    coordination, including resource location, task placement,
    communication and synchronisation. Functional languages aim to
    provide higher-level abstraction, and this paper investigates the
    effectiveness of this for distributed co-ordination. The
    investigation contrasts and compares Java and two
    Haskell-based distributed functional languages, Eden and
    GdH. Three distributed programs are used as case studies, and the
    performance and programming effort are reported.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 2000

%% doi is stored as the bare identifier (no resolver prefix), per DOI
%% conventions; bibliography styles add the resolver when rendering.
@InProceedings{BakerFinchKT00,
  author    = BakerFinch # and # King # and # Trinder,
  title     = {An Operational Semantics for Parallel Lazy Evaluation},
  booktitle = {ICFP'00 --- Intl. Conf. on Functional Programming},
  address   = {Montreal, Canada},
  month     = sep,
  year      = {2000},
  publisher = P-ACM,
  pages     = {162--173},
  keywords  = {parallel Haskell, GpH, semantics},
  doi       = {10.1145/351240.351256},
  url       = {http://www.macs.hw.ac.uk/~trinder/papers/ICFPFinal.pdf},
  abstract  = {
    We present an operational semantics for parallel lazy evaluation
    that accurately models the parallel behaviour of the non-strict
    parallel functional language GpH. Parallelism is modelled
    synchronously, that is, single reductions are carried out
    separately, then combined before proceeding onto the next set of
    reductions. Consequently the semantics has two levels, with
    single-step transition rules at one level and combining rules at
    the other. Parallel threads are modelled by labelled bindings and
    to the best of our knowledge this is the first semantics that
    models thread states. A set of labelled bindings corresponds to a
    heap and is used to model sharing.

    The semantics is set at a higher level of abstraction than an
    abstract machine and is therefore more manageable for proofs about
    programs rather than implementations. At the same time, it is low
    level enough to allow us to reason about programs in terms of
    parallelism (i.e., the number of processors used), as well as work
    and run-time with different numbers of processors.

    The framework used by the semantics is flexible in that the model
    can easily be adapted to express other evaluation models such as
    sequential call-by-need and fully-speculative evaluation,
    non-deterministic choice and others.
  }
}


%% TODO: del entry
%% NOTE(review): journal macro J-SPE is not among the @string definitions
%% visible at the head of this file -- confirm it is declared in the full
%% preamble, otherwise BibTeX emits an undefined-macro warning and an empty
%% journal field.  volume/pages are absent because the paper was only
%% submitted (see note field); entry is flagged for deletion (TODO above).
@Article{HammondKLRT00,
  author    = Hammond # and # King # and # Loidl # and # RebonPortillo # and # Trinder,
  title     = {The {HasPar} Performance Evaluation Suite for {GpH}: a Parallel Non-Strict Functional Language},
  journal   = J-SPE,
  month     = feb,
  year      = {2000},
  note      = {Submitted for publication},
  keywords  = {parallel Haskell, GpH, profiling},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/spe.html},
  abstract  = {
    The ultimate purpose of parallel computation is to improve
    performance by exploiting hardware duplication. In order to
    achieve this improvement, it is essential to have a good
    understanding of real parallel behaviour. This paper introduces
    the HasPar integrated suite of performance evaluation tools for
    Glasgow Parallel Haskell (GpH), a high-performance parallel
    non-strict functional language. This suite provides a framework
    for assessing and improving parallel program performance that has
    been used successfully on a number of large functional programs.

    The HasPar suite includes both idealised and realistic simulators
    for GpH. It also incorporates an instrumented parallel
    implementation that can be used on a range of architectures
    including both tightly-coupled multiprocessors and loosely-coupled
    networks of workstations. An important feature of the tools is
    that they allow costs to be attributed to the parallel program
    source using either static or dynamic cost attribution mechanisms,
    as appropriate. The resulting performance profiles can be
    visualised in a number of different ways, as illustrated in this
    paper.
  }
}


%% NOTE(review): publisher/series use the @string macros P-INTELL and S-TFP,
%% which are not among the definitions visible at the head of this file --
%% confirm they are declared in the full preamble.
@InProceedings{LoidlKHLT00,
  author    = Loidl # and # Klusik # and # Hammond # and # Loogen # and # Trinder,
  title     = {{GpH} and {Eden}: Comparing Two Parallel Functional Languages on a Beowulf Cluster},
  booktitle = {SFP'00 --- Scottish Functional Programming Workshop},
  address   = {St Andrews, Scotland},
  month     = jul,
  year      = {2000},
  publisher = P-INTELL,
  series    = S-TFP,
  volume    = {2},
  pages     = {39--52},
  keywords  = {parallel Haskell, GpH, implementation},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/ps/sfp00.ps.gz},
  abstract  = {
    We investigate two similar but contrasting parallel functional
    language designs: Eden and GpH.  Both languages use the non-strict
    functional language Haskell as a core expression language, both
    their implementations are based on the same host sequential
    implementation --- the high performance Glasgow Haskell Compiler
    (GHC), and both implementations are available on the same
    distributed architecture --- the St Andrews Beowulf cluster.  This
    allows an exceptionally pure comparison of the language design
    characteristics and their impact on parallel performance.

    The comparison is illustrated by two parallel benchmarks which
    expose differences in the communication, process creation, and
    work distribution mechanisms employed by Eden and GpH.  Our
    results show that the explicit process model favoured by Eden
    gives good parallel performance for coarse-grained applications
    running on the Beowulf cluster.  In comparison, the implicit
    process model used in GpH gives poorer absolute speedup for this
    class of application on this architecture.  Further work is needed
    to verify the reasons for this difference in performance and to
    extend these results to other architectures.
  }
}


%% doi is stored as the bare identifier (no resolver prefix), per DOI
%% conventions; bibliography styles add the resolver when rendering.
@InProceedings{PointonTL00,
  author    = Pointon # and # Trinder # and # Loidl,
  title     = {The Design and Implementation of {Glasgow} Distributed {Haskell}},
  booktitle = {IFL'00 --- Intl. Workshop on the Implementation of Functional Languages},
  address   = {Aachen, Germany},
  month     = sep,
  year      = {2000},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {2011},
  pages     = {53--70},
  keywords  = {distributed Haskell, GdH, implementation},
  doi       = {10.1007/3-540-45361-X_4},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/ps/ifl00.ps.gz},
  abstract  = {
    This paper presents the design and implementation of Glasgow
    distributed Haskell (GdH), a non-strict distributed functional
    language. The language is intended for constructing scalable,
    reliable distributed applications, and is Haskell'98 compliant,
    being a superset of both Concurrent Haskell and Glasgow parallel
    Haskell (GpH).

    GdH distributes both pure and impure threads across multiple
    Processing Elements (PEs), each location is made explicit so a
    program can use resources unique to PE, and objects including
    threads can be created on a named PE. The location that uniquely
    owns a resource is identified by a method of a new Immobile type
    class. Impure threads communicate and synchronise explicitly to
    co-ordinate actions on the distributed state, but both pure and
    impure threads synchronise and communicate implicitly to share
    data. Limited support for fault tolerant programming is provided
    by distributed exception handling. The language constructs are
    illustrated by example, and two demonstration programs give a
    flavour of GdH programming.

    Although many distributed functional languages have been designed,
    relatively few have robust implementations. The GdH implementation
    fuses and extends two mature implementation technologies: the GUM
    runtime system (RTS) of GpH and the RTS for Concurrent
    Haskell. The fused RTS is extended with a small number of
    primitives from which more sophisticated constructs can be
    constructed, and libraries are adapted to the distributed context.
  }
}


%% doi is stored as the bare identifier (no resolver prefix), per DOI
%% conventions; bibliography styles add the resolver when rendering.
@InProceedings{TrinderLBDHKPR00,
  author    = Trinder # and # Loidl # and # Barry # and # Davis # and # Hammond # and # Klusik # and # PeytonJones # and # RebonPortillo,
  title     = {The Multi-Architecture Performance of the Parallel Functional Language {GpH} (Research Note)},
  booktitle = {Euro-Par 2000 --- Intl. Conf. on Parallel Processing},
  address   = {Munich, Germany},
  month     = aug,
  year      = {2000},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {1900},
  pages     = {739--743},
  keywords  = {parallel Haskell, GpH, models of parallelism},
  doi       = {10.1007/3-540-44520-X_101},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/europar00.html},
  abstract  = {
    In principle, functional languages promise straightforward
    architecture-independent parallelism, because of their high level
    description of parallelism, dynamic management of parallelism and
    deterministic semantics.  However, these language features come at
    the expense of a sophisticated compiler and/or runtime-system. The
    problem we address is whether such an elaborate system can deliver
    acceptable performance on a variety of parallel architectures.  In
    particular we report performance measurements for the GUM
    runtime-system on eight parallel architectures, including
    massively parallel, distributed-memory, shared-memory and
    workstation networks.
  }
}


%% NOTE(review): publisher/series use the @string macros P-INTELL and S-TFP,
%% which are not among the definitions visible at the head of this file --
%% confirm they are declared in the full preamble.
@InProceedings{TrinderPL00,
  author    = Trinder # and # Pointon # and # Loidl,
  title     = {Runtime System Level Fault Tolerance for a Distributed Functional Language},
  booktitle = {SFP'00 --- Scottish Functional Programming Workshop},
  address   = {St Andrews, Scotland},
  month     = jul,
  year      = {2000},
  publisher = P-INTELL,
  series    = S-TFP,
  volume    = {2},
  pages     = {103--114},
  keywords  = {distributed Haskell, GdH, implementation, GUM},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/ps/sfp00a.ps.gz},
  abstract  = {
    Functional languages potentially offer benefits for distributed
    fault tolerance: many computations are pure, and hence have no
    side-effects to be reversed during error recovery; moreover
    functional languages have a high-level runtime system (RTS) where
    computations and data are readily manipulated. We propose a new
    RTS level of fault tolerance for distributed functional languages,
    and outline a design for its implementation for the GdH
    language. The design distinguishes between pure and impure
    computations: impure computations must be recovered using
    conventional exception-based techniques, but the RTS attempts
    implicit recovery of pure computations.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 1999

%% Checked: journal macro J-CPE is defined in the preamble at the head of
%% this file.
%% NOTE(review): the abstract spells the language 'GPH' where the rest of
%% the file uses 'GpH' -- presumably verbatim from the published abstract;
%% confirm before normalising.
@Article{LoidlTHJMP99,
  author    = Loidl # and # Trinder # and # Hammond # and # Junaidu # and # Morgan # and # PeytonJones,
  title     = {Engineering Parallel Symbolic Programs in {GpH}},
  journal   = J-CPE,
  volume    = {11},
  number    = {12},
  year      = {1999},
  pages     = {701--752},
  keywords  = {parallel Haskell, GpH, parallel applications},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/ps/cpe.ps.gz},
  abstract  = {
    We investigate the claim that functional languages offer low-cost
    parallelism in the context of symbolic programs on modest parallel
    architectures. In our investigation we present the first
    comparative study of the construction of large applications in a
    parallel functional language, in our case in Glasgow Parallel
    Haskell (GPH). The applications cover a range of application
    areas, use several parallel programming paradigms, and are
    measured on two very different parallel architectures.

    On the applications level the most significant result is that we
    are able to achieve modest wall-clock speedups (between factors of
    2 and 10) over the optimised sequential versions for all but one
    of the programs. Speedups are obtained even for programs that were
    not written with the intention of being parallelised. These gains
    are achieved with a relatively small programmer-effort. One reason
    for the relative ease of parallelisation is the use of evaluation
    strategies, a new parallel programming technique that separates
    the algorithm from the coordination of parallel behaviour.

    On the language level we show that the combination of lazy and
    parallel evaluation is useful for achieving a high level of
    abstraction. In particular we can describe top-level parallelism,
    and also preserve module abstraction by describing parallelism
    over the data structures provided at the module interface
    (``data-oriented parallelism'').  Furthermore, we find that the
    determinism of the language is helpful, as is the largely-implicit
    nature of parallelism in GPH.
  }
}


%% NOTE(review): publisher macro P-WSCI is not among the definitions visible
%% at the head of this file -- confirm it is declared in the full preamble.
%% Lowercase 'distributed' in the title appears deliberate: it matches the
%% spelling 'Glasgow distributed Haskell (GdH)' used in abstracts elsewhere
%% in this file.
@InProceedings{Trinder99,
  author    = Trinder,
  title     = {Motivation for {Glasgow} distributed {Haskell}, a non-strict Functional Language},
  booktitle = {PDSIA'99 --- Intl. Workshop on Parallel and Distributed Computing for Symbolic and Irregular Applications},
  address   = {Sendai, Japan},
  month     = jul,
  year      = {1999},
  publisher = P-WSCI,
  pages     = {72--81},
  keywords  = {distributed Haskell, GdH, models of parallelism},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/pdsia99.html},
  abstract  = {
    Non-strict functional languages offer potential benefits for
    constructing distributed systems: namely a highly-dynamic model of
    distribution, a relatively high degree of distribution
    transparency, and the potential to abstract over
    distribution-control primitives. We describe our motivation for
    implementing such a language, a variant of Haskell, and evaluating
    it. The implementation is a fusion of existing Glasgow Haskell
    Compiler technologies. The evaluation will be based on experiences
    implementing a distributed interactive simulation, and comparing
    it with a Java version.
  }
}


@InCollection{TrinderLH99,
  author    = Trinder # and # Loidl # and # Hammond,
  editor    = Hammond # and # Michaelson,
  title     = {Large-Scale Functional Applications},
  booktitle = {Research Directions in Parallel Functional Programming},
  publisher = P-SV,
  month     = nov,
  year      = {1999},
  pages     = {399--426},
  keywords  = {parallel Haskell, GpH, parallel applications},
  url       = {http://www-fp.dcs.st-and.ac.uk/pfpbook/},
  abstract  = {
    This chapter describes applications that have been written using
    parallel functional language technology.  While the majority of
    these applications have been produced as part of a computer
    science research project aimed at testing the limits of the
    technology, others such as those written by Ericsson in Erlang,
    are being used in real commercial products.  Furthermore, projects
    such as Digital's Smart Kiosk application written using the
    data-flow inspired Stampede system represent serious trial uses of
    the technology by commercial organisations.

    Functional programming is a wide-spectrum, general purpose
    paradigm, and the applications therefore cover a wide area,
    including high-performance numerical applications, symbolic
    applications, data-intensive applications and computer vision.
    Some of these areas present hard problems for traditional parallel
    approaches, which are usually best suited to regular computation
    patterns and regular data structures.  Symbolic or irregular
    applications that are described here include compilers, theorem
    provers, computer algebra and natural language processing.

    It is a common misconception that functional programming can only
    be applied to small-scale programs, or that parallelism has only
    been obtained for toy examples.  This chapter describes several
    real parallel applications that are large in absolute terms,
    e.g. the 230000 line Erlang mobility server, or the 47000 line
    Lolita natural language parser.  Since functional programs are
    often a factor of 5-10 shorter than the equivalent C or Fortran,
    these represent serious, and realistic bases for assessing the
    quality of a parallel language and its development environment.

    Although, as the preceding chapters have shown, there is still
    much research to be done, parallel functional programming
    technology has been maturing rapidly, and many large parallel
    functional applications have been produced in recent years.
    Although this chapter aims to be reasonably comprehensive, it is
    therefore necessarily selective in the applications that are
    covered.  We are particularly heartened by results from systems
    such as SAC and SISAL, which are not merely competitive with the
    performance of C, but have actually exceeded the performance of
    equivalent Fortran programs compiled using high-quality
    parallelising compilers.

    The chapter uses the taxonomy that was introduced in Chapter 3.
    Implicit, semi-explicit, coordination, explicit and derivational
    approaches are each covered in separate sections.  Within those
    sections, applications are covered by area.  Ideally we would like
    to report a consistent set of program and parallel platform
    characteristics, and performance figures for each application.  In
    the absence of a standard set of parallel benchmarks, such results
    are currently unobtainable.  Results are therefore given using
    either absolute or relative speedup figures as recorded by the
    authors of the research papers on particular parallel
    architectures.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 1998

%% booktitle normalised (single spacing) to the canonical IFL'98 venue
%% string used for both IFL'98 papers in this file.
@InProceedings{HallBTK98,
  author    = HallJ # and # BakerFinch # and # Trinder # and # King,
  title     = {Towards an Operational Semantics for a Parallel Non-Strict Functional Language},
  booktitle = {IFL'98 --- Intl. Workshop on the Implementation of Functional Languages},
  address   = {London, UK},
  month     = sep,
  year      = {1998},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {1595},
  pages     = {54--71},
  keywords  = {parallel Haskell, GpH, semantics},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/semantics.html},
  abstract  = {
    Parallel programs must describe both computation and coordination,
    i.e. what to compute and how to organise the computation. In
    functional languages equational reasoning is often used to reason
    about computation. In contrast, there have been many different
    coordination constructs for functional languages, and far less
    work on reasoning about coordination.

    We present an initial semantics for GpH, a small extension of the
    Haskell language, that allows us to reason about coordination. In
    particular we can reason about work, average parallelism, and
    runtime. The semantics captures the notions of limited (physical)
    resources, the preservation of sharing and speculative
    evaluation. We show a consistency result with Launchbury's
    well-known lazy semantics.
  }
}


%% TODO: check entry (keywords)
%% NOTE(review): no publisher/series/pages -- presumably an informal
%% workshop presentation; confirm (see also the keywords TODO above).
@InProceedings{Hammond98,
  author    = Hammond,
  title     = {Strategic {SPMD}},
  booktitle = {Glasgow Workshop on Functional Programming},
  address   = {Pitlochry, Scotland},
  month     = sep,
  year      = {1998},
  keywords  = {parallel Haskell, GpH, models of parallelism},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/strategic-spmd.html},
  abstract  = {
    This paper extends the existing concepts of evaluation strategies,
    higher-order functions that can be used to separate behaviour from
    algorithm in parallel functional programs, to support the SPMD
    model of parallelism. Our model allows SPMD sub-programs to be
    composed with other parallel paradigms, or to be arbitrarily
    nested with other parallel paradigms. This represents a major step
    towards supporting the Group SPMD model, a generalisation of SPMD,
    in Glasgow Parallel Haskell. It also represents the first attempt
    of which we are aware to model SPMD programming in a system
    supporting implicit communications.

    In order to ensure that the necessary barrier communications are
    modelled correctly at the high level we use the implementation
    concept synchronisation through logically shared graph
    reduction. We therefore ensure, in common with other work on
    evaluation strategies, that no new constructs need be added to the
    host functional language other than basic ones to introduce
    parallelism and enforce sequential execution. In order to ensure
    accurate modelling of the SPMD paradigm it is, however, convenient
    to introduce a primitive to control task placement.

    Although there is, as yet, no efficient implementation of the SPMD
    strategy, we are confident that this could be constructed using an
    algorithmic skeleton, or similar implementation technique.
  }
}


%% TODO: del entry
%% @Unpublished entry: the required note field is present (submission
%% target and date).  Entry is flagged for deletion (TODO above).
@Unpublished{HammondW98,
  author    = Hammond # and # WalshKemmis,
  title     = {{Juaskell}: Implementing Evaluation Strategies in {Java}},
  month     = nov,
  year      = {1998},
  note      = {Submitted to CLaPF'99 (3rd Latin-American Conf. on Functional Programming), Recife, Brazil, March 8, 1999},
  keywords  = {parallel Haskell, GpH, models of parallelism, unpublished},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/juaskell.html},
  abstract  = {
    We report on a novel attempt to incorporate functional language
    technology in the widely used object-oriented language
    Java. Juaskell takes the concept of evaluation strategies, which
    are designed to cleanly separate control from algorithm in a
    parallel setting, from Glasgow Parallel Haskell and re-implements
    this technology in Java. While Java does have rudimentary support
    for concurrent threads, experience has shown that the standard
    mechanism is difficult to exploit correctly, and error-prone in
    use. It is thus highly desirable to provide high-level behavioural
    abstractions over threads such as evaluation strategies.

    The implementation of Juaskell involves finding a number of
    analogies to common functional language concepts and
    implementation techniques in an object-oriented setting, notably
    higher-order functions, parametric polymorphism and lazy
    evaluation. Solutions to some of these problems proved to be more
    complex than expected, due to deficiencies in the Java language
    definition. The resulting Juaskell system has been used to
    implement a multi-threaded version of the Mandelbrot
    algorithm. For this program, which involves the creation of a
    large number of threads, performance overhead due to strategic
    code amounts to 9.6 percent of the total execution time, an
    acceptably low figure.
  }
}


%% booktitle normalised (single spacing, stray trailing year removed) to
%% the canonical IFL'98 venue string used for both IFL'98 papers in this
%% file; the year already appears in the year field.
@InProceedings{KingHT98,
  author    = King # and # HallJ # and # Trinder,
  title     = {A Strategic Profiler for {Glasgow} Parallel {Haskell}},
  booktitle = {IFL'98 --- Intl. Workshop on the Implementation of Functional Languages},
  address   = {London, UK},
  month     = sep,
  year      = {1998},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {1595},
  pages     = {88--102},
  keywords  = {parallel Haskell, GpH, profiling, evaluation strategies},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/gransp.html},
  abstract  = {
    Execution profiling is a crucial part of engineering large
    parallel functional programs. This paper presents the design,
    implementation, and use of a new execution time profiler for
    Glasgow Parallel Haskell GpH --- a well established parallel
    derivative of Haskell. Parallelism is introduced in GpH by using
    evaluation strategies, which provide a clean way of co-ordinating
    parallel threads without destroying a program's original
    structure. The new profiler attributes the cost of evaluating
    parallel threads to the strategies that created them. A unique
    feature of the strategic profiler is that the call-graph of
    evaluation strategies is maintained, allowing the programmer to
    discover the sequence of (perhaps nested) strategies that were
    used to create any given thread. The strategic profiler is
    demonstrated on several examples, and compared with other
    profilers.
  }
}


%% @PhdThesis entry: required author/title/school/year fields all present.
%% Citekey disambiguated from Loidl01 by year, per the file's DBLP-style
%% citekey convention described at the head of the file.
@PhdThesis{Loidl98,
  author    = Loidl,
  title     = {Granularity in Large-Scale Parallel Functional Programming},
  school    = {Dept. of Computing Science, Univ. of Glasgow},
  month     = mar,
  year      = {1998},
  keywords  = {parallel Haskell, GpH, misc, thesis},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/loidl-thesis.html},
  abstract  = {
    This thesis demonstrates how to reduce the runtime of large
    non-strict functional programs using parallel evaluation. The
    parallelisation of several programs shows the importance of
    granularity, i.e. the computation costs of program
    expressions. The aspect of granularity is studied both on a
    practical level, by presenting and measuring runtime granularity
    improvement mechanisms, and at a more formal level, by devising a
    static granularity analysis.

    By parallelising several large functional programs this thesis
    demonstrates for the first time the advantages of combining lazy
    and parallel evaluation on a large scale: laziness aids
    modularity, while parallelism reduces runtime.  One of the
    parallel programs is the Lolita system which, with more than
    47,000 lines of code, is the largest existing parallel non-strict
    functional program.  A new mechanism for parallel programming,
    evaluation strategies, to which this thesis contributes, is shown
    to be useful in this parallelisation.  Evaluation strategies
    simplify parallel programming by separating algorithmic code from
    code specifying dynamic behaviour. For large programs the
    abstraction provided by functions is maintained by using a
    data-oriented style of parallelism, which defines parallelism over
    intermediate data structures rather than inside the functions.

    A highly parameterised simulator, GranSim, has been constructed
    collaboratively and is discussed in detail in this thesis. GranSim
    is a tool for architecture-independent parallelisation and a
    testbed for implementing runtime-system features of the parallel
    graph reduction model. By providing an idealised as well as an
    accurate model of the underlying parallel machine, GranSim has
    proven to be an essential part of an integrated parallel software
    engineering environment. Several parallel runtime-system features,
    such as granularity improvement mechanisms, have been tested via
    GranSim.  It is publicly available and in active use at several
    universities worldwide.

    In order to provide granularity information this thesis presents
    an inference-based static granularity analysis.  This analysis
    combines two existing analyses, one for cost and one for size
    information. It determines an upper bound for the computation
    costs of evaluating an expression in a simple strict higher-order
    language. By exposing recurrences during cost reconstruction and
    using a library of recurrences and their closed forms, it is
    possible to infer the costs for some recursive functions.  The
    possible performance improvements are assessed by measuring the
    parallel performance of a hand-analysed and annotated program.
  }
}


%% NOTE(review): @Unpublished requires a `note` field in standard BibTeX
%% styles; added a minimal one (status also flagged in keywords).
@Unpublished{TrinderBDHJKLP98a,
  author    = Trinder # and # Barry # and # Davis # and # Hammond # and # Junaidu # and # Klusik # and # Loidl # and # PeytonJones,
  title     = {{GpH}: An Architecture-independent Functional Language},
  month     = jul,
  year      = {1998},
  keywords  = {parallel Haskell, GpH, models of parallelism, unpublished},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/arch-indep.html},
  note      = {Unpublished},
  abstract  = {
    In principle, pure functional languages promise straightforward
    architecture-independent parallelism. We investigate the validity
    of this claim in the context of our highly-portable implementation
    of an implicitly-parallel functional language: the GUM
    implementation of Glasgow Parallel Haskell (GpH). We discuss
    architecture independence at two levels: low-level (i.e. the
    implementation) and high-level (i.e. the programmer).

    Low-level architecture independence is achieved by choosing a
    message-passing model for GUM, and implementing it using portable
    C and a widely-supported message-passing library like PVM. In fact
    GUM is largely independent of the message-passing library, and has
    been adapted to use MPI and the CM-5 CMMD libraries as well as
    PVM. As a result, GUM is easily ported, and is currently available
    on seven platforms including shared-memory machines, distributed
    memory machines, and networks of workstations. We provide
    indicative measurements of how efficient and effective our
    architecture-independent runtime system is across a range of
    architectures.

    The GpH programming model provides higher-level architecture
    independence. The parallelism in GpH is mainly implicit, and hence
    relatively small parts of the program need to be changed for a new
    architecture. The coordination that is required is expressed with
    a new high-level construct, evaluation strategies. Evaluation
    strategies provide a clean separation between algorithm and
    coordination, easing the task of changing either facet to suit new
    parallel environments. Moreover, GpH programs can systematically
    be developed for multiple target architectures, using a suite of
    simulation, profiling and visualisation tools. Much of the
    development is architecture-independent but, once a particular
    target architecture has been selected, the tools are parameterised
    to support tuning for that architecture. We demonstrate the
    systematic development of two real programs to the point of
    achieving good speedups on four architectures.
  }
}


%% TODO: check entry (keywords)
%% Workshop companion to TrinderBDHJKLP98a, restricted to the low-level
%% (runtime-system) aspects of architecture independence.
@InProceedings{TrinderBDHJKLP98b,
  author    = Trinder # and # Barry # and # Davis # and # Hammond # and # Junaidu # and # Klusik # and # Loidl # and # PeytonJones,
  title     = {Low level Architecture-independence of {Glasgow} Parallel {Haskell} ({GpH})},
  booktitle = {Glasgow Workshop on Functional Programming},
  address   = {Pitlochry, Scotland},
  month     = sep,
  year      = {1998},
  keywords  = {parallel Haskell, GpH, models of parallelism},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/low-level-gph.html},
  abstract  = {
    In principle, pure functional languages promise straightforward
    architecture-independent parallelism. We investigate the validity
    of this claim in the context of our highly-portable implementation
    of an implicitly-parallel functional language: the GUM
    implementation of Glasgow Parallel Haskell (GpH). We focus here on
    the low-level architecture independence of GUM: the high-level
    architecture independence of the GpH programming model is
    discussed elsewhere.

    Low-level architecture independence is achieved by choosing a
    message-passing model for GUM, and implementing it using portable
    C and a widely-supported message-passing library like PVM. In fact
    GUM is largely independent of the message-passing library, and has
    been adapted to use MPI and the CM-5 CMMD libraries as well as
    PVM. As a result, GUM is easily ported, and is currently available
    on seven platforms including shared-memory machines, distributed
    memory machines, and networks of workstations. We provide
    indicative measurements of the efficiency and speedups delivered
    by GUM on a range of architectures.
  }
}


%% JFP 8(1) journal article introducing evaluation strategies.
%% (DOI kept in resolver-URL form per this file's existing convention.)
@Article{TrinderHLP98,
  author    = Trinder # and # Hammond # and # Loidl # and # PeytonJones,
  title     = {{Algorithm} + {Strategy} = {Parallelism}},
  journal   = J-FP,
  volume    = {8},
  number    = {1},
  year      = {1998},
  pages     = {23--60},
  keywords  = {parallel Haskell, GpH, models of parallelism, evaluation strategies},
  doi       = {http://dx.doi.org/10.1017/S0956796897002967},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/strategies.html},
  abstract  = {
    The process of writing large parallel programs is complicated by
    the need to specify both the parallel behaviour of the program and
    the algorithm that is to be used to compute its result. This paper
    introduces evaluation strategies, lazy higher-order functions that
    control the parallel evaluation of non-strict functional
    languages. Using evaluation strategies, it is possible to achieve
    a clean separation between algorithmic and behavioural code. The
    result is enhanced clarity and shorter parallel programs.

    Evaluation strategies are a very general concept: this paper shows
    how they can be used to model a wide range of commonly used
    programming paradigms, including divide-and-conquer, pipeline
    parallelism, producer/consumer parallelism, and data-oriented
    parallelism.  Because they are based on unrestricted higher-order
    functions, they can also capture irregular parallel structures.

    Evaluation strategies are not just of theoretical interest: they
    have evolved out of our experience in parallelising several
    large-scale parallel applications, where they have proved
    invaluable in helping to manage the complexities of parallel
    behaviour.  These applications are described in detail here. The
    largest application we have studied to date, Lolita, is a 60,000
    line natural language parser. Initial results show that for these
    programs we can achieve acceptable parallel performance, while
    incurring minimal overhead for using evaluation strategies.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 1997

%% NOTE(review): fixed abstract typo "on size pieces" -> "on six pieces";
%% confirm against the published abstract.
@InProceedings{HallLTHD97,
  author    = HallC # and # Loidl # and # Trinder # and # Hammond # and # Donnell,
  title     = {Refining a Parallel Algorithm for Calculating Bowings},
  booktitle = {Glasgow Workshop on Functional Programming},
  address   = {Ullapool, Scotland},
  month     = sep,
  year      = {1997},
  keywords  = {parallel Haskell, GpH, parallel applications},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/bowings.html},
  note      = {Draft proceedings},
  abstract  = {
    String players know that bowing properly is the hardest skill they
    have to learn. In other work [FH97], we develop an algorithm that
    calculates bowings for BowTech, a project that supports string
    performers. This algorithm takes a significant amount of time to
    execute (in our second test case, an hour for the implementation,
    written in Ada). We have implemented a parallel version of the
    algorithm in GpH, a parallel superset of Haskell, and measured the
    quality of output on six pieces of music. The parallel program
    has been refined using the GranSim simulator and measured on two
    parallel architectures: a shared memory multiprocessor and a
    network of workstations.
  }
}


%% NOTE(review): fixed abstract typo "simultion" -> "simulation".
@InProceedings{HammondLT97,
  author    = Hammond # and # Loidl # and # Trinder,
  title     = {Parallel Cost Centre Profiling},
  booktitle = {Glasgow Workshop on Functional Programming},
  address   = {Ullapool, Scotland},
  month     = sep,
  year      = {1997},
  pages     = {51--72},
  keywords  = {parallel Haskell, GpH, profiling},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/grancc.html},
  note      = {Draft proceedings},
  abstract  = {
    Good profiling is a major issue in extracting performance from
    parallel programs. We report on a novel synthesis of sequential
    cost-centre profiling and state-of-the-art parallel simulation for
    the pure functional language Haskell that promises to provide
    detailed and accurate information about parallel Haskell
    programs. Exploiting simulation also improves the quality of
    sequential cost centre profiling, though at a significant
    performance overhead.
  }
}


%% NOTE(review): raw `^` in the title is a LaTeX error outside math mode;
%% wrapped the superscript as {$^2$} (braced so style recasing leaves it intact).
@InProceedings{JunaiduDH97,
  author    = Junaidu # and # Davie # and # Hammond,
  title     = {{Naira}: A Parallel{$^2$} {Haskell} Compiler},
  booktitle = {IFL'97 --- Intl. Workshop on the Implementation of Functional Languages},
  address   = {St Andrews, Scotland},
  month     = sep,
  year      = {1997},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {1467},
  pages     = {214--230},
  keywords  = {parallel Haskell, GpH, parallel applications},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/naira.html},
  abstract  = {
    Naira is a compiler for a parallel dialect of Haskell, compiling
    to a dataflow-inspired parallel abstract machine. Unusually
    (perhaps even uniquely), Naira has itself been parallelised using
    state-of-the-art tools developed at Glasgow and St. Andrews. This
    paper reports initial performance results that have been obtained
    using the GranSim simulator, both for the top-level pipeline and
    for the individual compilation states. Our results show that a
    modest but useful degree of parallelism can be achieved even for a
    distributed memory machine.
  }
}


%% Draft-proceedings workshop paper: LinSolv case study in parallelisation
%% with evaluation strategies.
@InProceedings{Loidl97,
  author    = Loidl,
  title     = {{LinSolv}: a Case Study in Strategic Parallelism},
  booktitle = {Glasgow Workshop on Functional Programming},
  address   = {Ullapool, Scotland},
  month     = sep,
  year      = {1997},
  keywords  = {parallel Haskell, GpH, parallel applications},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/linsolv.html},
  note      = {Draft proceedings},
  abstract  = {
    This paper discusses the parallelisation and performance tuning of
    a typical computer algebra algorithm, LinSolv, using evaluation
    strategies. We present three steps in the parallelisation process
    starting with a naive parallel version. As this algorithm uses
    infinite data structures as intermediate values it is necessary to
    define very sophisticated strategies in order to improve parallel
    performance.

    We also compare the strategic parallel code with pre-strategy
    code. This comparison shows how evaluation strategies help to
    localise changes needed for parallelisation. In particular, the
    separation between algorithmic and parallel code makes the
    structure of the parallelism much clearer.
  }
}


%% IFL'97 paper (LNCS 1467) on parallelising the Lolita natural language
%% engineering system.
@InProceedings{LoidlMTPCPG97,
  author    = Loidl # and # Morgan # and # Trinder # and # Poria # and # Cooper # and # PeytonJones # and # Garigliano,
  title     = {Parallelising a Large Functional Program or: Keeping {LOLITA} Busy},
  booktitle = {IFL'97 --- Intl. Workshop on the Implementation of Functional Languages},
  address   = {St Andrews, Scotland},
  month     = sep,
  year      = {1997},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {1467},
  pages     = {198--213},
  keywords  = {parallel Haskell, GpH, parallel applications},
  doi       = {http://dx.doi.org/10.1007/BFb0055432},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/lolita.html},
  abstract  = {
    In this paper we report on the ongoing parallelisation of Lolita,
    a natural language engineering system. Although Lolita currently
    exhibits only modest parallelism, we believe that it is the
    largest parallel functional program ever, comprising more than
    47,000 lines of Haskell. Lolita has the following interesting
    features common to real world applications of lazy languages:

      * the code was not specifically designed for parallelism; 

      * laziness is essential for efficiency in Lolita; 

      * Lolita interfaces to data structures outside the Haskell heap,
        using a foreign language interface;

      * Lolita was not written by those most closely involved in the
        parallelisation.

    Our expectations in parallelising the program were to achieve
    moderate speedups with small changes in the code.  To date
    speedups of up to 2.4 have been achieved for Lolita running under
    a realistic simulation of our 4 processor shared-memory target
    machine. Most notably, the parallelism is achieved with a very
    small number of changes to, and without requiring an understanding
    of most of the application.  On the Sun SPARCserver target machine
    wall-clock speedup is currently limited by physical memory
    availability.
  }
}


%% IFL'97 paper (LNCS 1467) on engineering issues in large GpH programs.
@InProceedings{LoidlT97,
  author    = Loidl # and # Trinder,
  title     = {Engineering Large Parallel Functional Programs},
  booktitle = {IFL'97 --- Intl. Workshop on the Implementation of Functional Languages},
  address   = {St Andrews, Scotland},
  month     = sep,
  year      = {1997},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {1467},
  pages     = {178--197},
  keywords  = {parallel Haskell, GpH, parallel applications},
  doi       = {http://dx.doi.org/10.1007/BFb0055431},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/eng.html},
  abstract  = {
    The design and implementation of useful programming languages,
    whether sequential or parallel, should be driven by large,
    realistic applications.  In constructing several medium- and
    large-scale programs in Glasgow Parallel Haskell, GPH, a parallel
    extension of Haskell, the group at Glasgow has investigated
    several important engineering issues:

      * Real Application Parallelism: The programs achieve good
        wall-clock speedups and acceptable scale-up on both a
        shared-memory and a distributed memory machine. The programs
        typify a number of application areas and use a number of
        different parallel paradigms, e.g. pipelining or
        divide-and-conquer, often combining several paradigms in a
        single program.

      * Language Issues: Although the largely implicit parallelism in
        GPH is a satisfactory programming model in general the base
        constructs for introducing and controlling parallelism tend to
        obfuscate the semantics of large programs.  As a result we
        developed evaluation strategies, a more abstract, and
        systematic mechanism for introducing and controlling
        parallelism.

      * Engineering Environment: The development and performance
        tuning of these programs emphasised the importance of an
        integrated engineering environment. In the process we have
        refined components of this environment like the simulator, the
        runtime system, and the profiling tools.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 1996

%% Workshop paper: sized time system (static cost/size analysis) feeding
%% granularity information to the GpH runtime.
@InProceedings{LoidlH96a,
  author    = Loidl # and # Hammond,
  title     = {A Sized Time System for a Parallel Functional Language},
  booktitle = {Glasgow Workshop on Functional Programming},
  address   = {Ullapool, Scotland},
  month     = jul,
  year      = {1996},
  keywords  = {parallel Haskell, GpH, models of parallelism, resource analysis},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/sized.html},
  abstract  = {
    This paper describes an inference system, whose purpose is to
    determine the cost of evaluating expressions in a strict purely
    functional language. Upper bounds can be derived for both
    computation cost and the size of data structures. We outline a
    static analysis based on this inference system for inferring size
    and cost information. The analysis is a synthesis of the sized
    types of Hughes et al., and the polymorphic time system of Dornic
    et al., which was extended to static dependent costs by Reistad
    and Gifford.

    Our main interest in cost information is for scheduling tasks in
    the parallel execution of functional languages. Using the GranSim
    parallel simulator, we show that the information provided by our
    analysis is sufficient to characterise relative task granularities
    for a simple functional program. This information can be used in
    the runtime-system of the Glasgow Parallel Haskell compiler to
    improve dynamic program performance.
  }
}


%% IFL'96 paper (LNCS 1268) on packing and latency-hiding trade-offs in
%% parallel graph reduction.
@InProceedings{LoidlH96b,
  author    = Loidl # and # Hammond,
  title     = {Making a Packet: Cost-Effective Communication for a Parallel Graph Reducer},
  booktitle = {IFL'96 --- Intl. Workshop on the Implementation of Functional Languages},
  address   = {Bad-Godesberg, Germany},
  month     = sep,
  year      = {1996},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {1268},
  pages     = {184--199},
  keywords  = {parallel Haskell, GpH, implementation},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/packet.html},
  abstract  = {
    This paper studies critical runtime-system issues encountered when
    packing data for transmission in a lazy, parallel graph reduction
    system. In particular, we aim to answer two questions:

      * How much graph should go into a packet?

      * How aggressively should a processor look for work after
        requesting remote data?

    In order to answer the first question, we compare various packing
    schemes, of which one extreme packs just the node that is demanded
    (``incremental fetching''), and the other packs all the graph that
    is reachable from that node (``bulk fetching''). The second
    question is addressed by considering various mechanisms for
    latency hiding during communication, ranging from fully
    synchronous communication with no attempt to mask latency, to full
    thread migration during asynchronous communication.

    In order to make our results as general as possible, we have used
    the GranSim simulator to study a wide variety of parallel machine
    configurations. Based on these measurements we propose concrete
    improvements for parallel graph reducers such as the GUM
    implementation of Glasgow Parallel Haskell.
  }
}


%% NOTE(review): replaced straight double quotes in the abstract with LaTeX
%% ``...'' quoting, consistent with the other entries in this file.
@InProceedings{TrinderHLPW96,
  author    = Trinder # and # Hammond # and # Loidl # and # PeytonJones # and # Wu,
  title     = {Accidents always Come in Threes: A Case Study of Data-intensive Programs in Parallel {Haskell}},
  booktitle = {Glasgow Workshop on Functional Programming},
  address   = {Ullapool, Scotland},
  month     = jul,
  year      = {1996},
  keywords  = {parallel Haskell, GpH, parallel applications},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/data-intensive.html},
  abstract  = {
    Accidents happen:

      * ``An invisible car came out of nowhere, struck my vehicle and
        vanished.''

      * ``I pulled away from the side of the road, glanced at my
        mother-in-law, and headed for the embankment.''

      * ``As I approached the intersection a sign suddenly appeared in
        a place where no stop sign had ever appeared before.''

    Luckily, we don't normally have to deal with problems as bizarre
    as these. One interesting application that does arise at the
    Centre for Transport Studies consists of matching police reports
    of several accidents so as to locate accident blackspots. The
    application provides an interesting, data-intensive, test-bed for
    the persistent functional language PFL. We report here on an
    approach aimed at improving the performance of this application
    using Glasgow Parallel Haskell.

    The accident application is one of several large parallel Haskell
    programs under development at Glasgow. Our objective is to achieve
    wall-clock speedups over the best sequential implementations, and
    we report modest wall-clock speedups for a demonstration
    program. From experience with these and other programs the group
    is developing a methodology for parallelising large functional
    programs. We have also developed strategies, a mechanism to
    separately specify a function's algorithm and its dynamic
    behaviour.
  }
}


%% NOTE(review): reordered fields (keywords before doi/url) to match the
%% convention used by every other entry in this file; no data changed.
@InProceedings{TrinderHMPP96,
  author    = Trinder # and # Hammond # and # Mattson # and # Partridge # and # PeytonJones,
  title     = {{GUM}: A Portable Parallel Implementation of {Haskell}},
  booktitle = {PLDI'96 --- ACM Conf. on Programming Language Design and Implementation},
  address   = {Philadephia, PA, USA},
  month     = may,
  year      = {1996},
  publisher = P-ACM,
  pages     = {79--88},
  keywords  = {parallel Haskell, GpH, implementation},
  doi       = {http://doi.acm.org/10.1145/231379.231392},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/gum.html},
  abstract  = {
    GUM is a portable, parallel implementation of the Haskell
    functional language. Despite sustained research interest in
    parallel functional programming, GUM is one of the first such
    systems to be made publicly available. GUM is message-based, and
    portability is facilitated by using the PVM communications harness
    that is available on many multi-processors. As a result, GUM is
    available for both shared-memory (Sun SPARCserver multiprocessors)
    and distributed-memory (networks of workstations)
    architectures. The high message-latency of distributed machines is
    ameliorated by sending messages asynchronously, and by sending
    large packets of related data in each message. Initial performance
    figures demonstrate absolute speedups relative to the best
    sequential compiler technology. To improve the performance of a
    parallel Haskell program GUM provides tools for monitoring and
    visualising the behaviour of threads and of processors during
    execution.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 1995

%% HPFC'95 paper on GranSim-based granularity visualisation for Haskell.
@InProceedings{HammondLP95,
  author    = Hammond # and # Loidl # and # Partridge,
  title     = {Visualising Granularity in Parallel Programs: A Graphical Winnowing System for {Haskell}},
  booktitle = {HPFC'95 --- Conf. on High Performance Functional Computing},
  address   = {Denver, CO, USA},
  month     = apr,
  year      = {1995},
  pages     = {208--221},
  keywords  = {parallel Haskell, GpH, profiling, GranSim},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/winnowing.html},
  abstract  = {
    To take advantage of distributed-memory parallel machines it is
    essential to have good control of task granularity. This paper
    describes a fairly accurate parallel simulator for Haskell, based
    on the Glasgow compiler, and complementary tools for visualising
    task granularities. Together these tools allow us to study the
    effects of various annotations on task granularity on a variety of
    simulated parallel architectures. They also provide a more precise
    tool for the study of parallel execution than has previously been
    available for Haskell programs.

    These tools have already confirmed that thread migration is
    essential in parallel systems, demonstrated a close correlation
    between thread execution times and total heap allocations, and
    shown that fetching data synchronously normally gives better
    overall performance than asynchronous fetching, if data is fetched
    on demand.
  }
}


%% NOTE(review): removed a stray blank line between the keywords and url
%% fields; no other entry in this file has internal blank lines.
@InProceedings{LoidlH95,
  author    = Loidl # and # Hammond,
  title     = {On the Granularity of Divide-and-Conquer Parallelism},
  booktitle = {Glasgow Workshop on Functional Programming},
  address   = {Ullapool, Scotland},
  month     = jul,
  year      = {1995},
  publisher = P-SV,
  series    = S-WC,
  keywords  = {parallel Haskell, GpH, models of parallelism},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/div-conc.html},
  pdf       = {http://www.bcs.org/upload/pdf/ewic_fp95_paper13.pdf},
  abstract  = {
    This paper studies the runtime behaviour of various parallel
    divide-and-conquer algorithms written in a non-strict functional
    language, when three common granularity control mechanisms are
    used: a simple cut-off, a priority thread creation and a priority
    scheduling mechanism. These mechanisms use granularity information
    that is currently provided via annotations to improve the
    performance of the parallel programs.

    The programs we examine are several variants of a generic
    divide-and-conquer program, an unbalanced divide-and-conquer
    algorithm and a parallel determinant computation. Our results
    indicate that for balanced computation trees a simple,
    low-overhead mechanism performs well whereas the more complex
    mechanisms offer further improvements for unbalanced computation
    trees.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 1994

%% TODO: del entry
%% Introductory survey of parallel functional programming (no new results).
@InProceedings{Hammond94,
  author    = Hammond,
  title     = {Parallel Functional Programming: An Introduction},
  booktitle = {PASCO'94 --- Intl. Symp. on Parallel Symbolic Computation},
  address   = {Hagenberg/Linz, Austria},
  month     = sep,
  year      = {1994},
  publisher = P-WSCI,
  keywords  = {parallel Haskell, misc, survey},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/par-intro.html},
  abstract  = {
    This paper introduces the general area of parallel functional
    programming, surveying the current state of research and
    suggesting areas which could profitably be explored in the
    future. No new results are presented. The paper contains 97
    references selected from the 500 or so publications in this field.
  }
}


%% NOTE(review): removed a stray blank line after the title field; no other
%% entry in this file has internal blank lines.
@InProceedings{HammondMP94,
  author    = Hammond # and # Mattson # and # PeytonJones,
  title     = {Automatic Spark Strategies and Granularity for a Parallel Functional Language Reducer},
  booktitle = {CONPAR'94 --- Intl. Conf. on Parallel Processing},
  address   = {Linz, Austria},
  month     = sep,
  year      = {1994},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {854},
  pages     = {521--532},
  keywords  = {parallel Haskell, implementation, GRIP},
  doi       = {http://dx.doi.org/10.1007/3-540-58430-7_46},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/spark-strats.html},
  abstract  = {
    This paper considers the issue of dynamic task control in the
    context of a parallel Haskell implementation on the GRIP
    multiprocessor. For the first time, we report the effect of our
    task control strategies on task granularity, as measured in terms
    of dynamic heap allocations. This gives a concrete means of
    measuring the effectiveness of these strategies other than
    wall-clock timings, which are notoriously uninformative.
  }
}


%% NOTE(review): fixed abstract typo "adepts" -> "adapts".
@InProceedings{LoidlH94,
  author    = Loidl # and # Hammond,
  title     = {{GRAPH} for {PVM}: Graph Reduction for Distributed Hardware},
  booktitle = {IFL'94 --- Intl. Workshop on the Implementation of Functional Languages},
  address   = {Norwich, UK},
  month     = sep,
  year      = {1994},
  keywords  = {parallel Haskell, GpH, implementation},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/graph-pvm.html},
  abstract  = {
    We describe a version of the GRAPH system (Graph Reduction for an
    Assortment of Parallel Hardware) designed to execute parallel
    functional programs on a range of distributed-memory MIMD machines
    running the PVM communications harness.  GRAPH was developed from
    the runtime system for the novel GRIP multiprocessor.  Although
    this system has proved highly successful, being a novel
    architecture it is hard to compare results with other
    architectures or implementations.  The CPUs used in the GRIP
    design are also rather old.

    The principal extensions from GRAPH for GRIP to GRAPH for PVM are
    intended to handle high latencies more efficiently, by exploiting
    asynchronous communication, multi-threaded scheduling, and by
    grouping packets into larger entities where possible.  The main
    innovation is the development of new, sophisticated packet
    flushing and packet fetching algorithms whose purpose is to allow
    graphs to be exported to and fetched from global memory without
    inter-processor synchronisation.  As communication and reduction
    can be interleaved, a new form of synchronising communication and
    reduction is necessary.

    We also give a short outlook on the design of a new portable
    distributed graph reduction system, GRAPH for UMM, that adapts the
    ideas and techniques originally developed for the GRIP system to a
    distributed memory environment.

    GRAPH for PVM has been implemented on a network of SUN
    workstations and it is currently being tested and debugged.
    Although no extensive performance measurements have been performed
    yet, the available data shows a significant decrease in the
    overall communication and thus an improvement in the overall
    performance of the system.
  }
}


%% TODO: check entry (author order does not match citekey: per the file's
%% DBLP-style key convention, TrinderHLMPP94 implies Trinder is first
%% author, but authors are listed alphabetically starting with Hammond;
%% verify the publication's actual author order before changing either)
@InProceedings{TrinderHLMPP94,
  author    = Hammond # and # Loidl # and # Mattson # and # Partridge # and # PeytonJones # and # Trinder,
  title     = {{GRAPHing} the Future},
  booktitle = {IFL'94 --- Intl. Workshop on the Implementation of Functional Languages},
  address   = {Norwich, UK},
  month     = sep,
  year      = {1994},
  keywords  = {parallel Haskell, GpH, implementation},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/graphing.html},
  abstract  = {
    At Glasgow our research into parallel functional programming has
    been moving away from our novel architecture, GRIP towards the
    provision of a general parallel runtime environment.  We call this
    GRAPH (Graph Reduction for an Assortment of Parallel Hardware).

    This paper describes the design of a new memory and load
    management model for GRAPH, which is intended to match shared- and
    distributed-memory machines better and to provide a framework for
    our research into parallel functional databases and granularity
    control.  This model is currently under implementation.  No
    performance results can therefore be provided at the present time.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 1993

%% PARLE'93 paper (LNCS 694) on database transaction processing on GRIP.
@InProceedings{AkerholtHPT93,
  author    = Akerholt # and # Hammond # and # PeytonJones # and # Trinder,
  title     = {Processing Transactions on {GRIP}, a Parallel Graph Reducer},
  booktitle = {PARLE'93 --- Intl. Conf. on Parallel Architectures and Languages Europe},
  address   = {Munich, Germany},
  month     = jun,
  year      = {1993},
  publisher = P-SV,
  series    = S-LNCS,
  volume    = {694},
  pages     = {634--647},
  keywords  = {parallel Haskell, implementation, GRIP},
  doi       = {http://dx.doi.org/10.1007/3-540-56891-3_51},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/trans-grip.html},
  abstract  = {
    The GRIP architecture allows efficient execution of functional
    programs on a multi-processor built from standard hardware
    components. State-of-the-art compilation techniques are combined
    with sophisticated runtime resource-control to give good parallel
    performance. This paper reports the results of running GRIP on an
    application which is apparently unsuited to the basic functional
    model: a database transaction manager incorporating updates as
    well as lookup transactions. The results obtained show good
    relative speedups for GRIP, with real performance advantages over
    the same application executing on sequential machines.
  }
}


% NOTE(review): no pages, publisher, or series recorded here — presumably
% informal/draft workshop proceedings (same venue as HammondP90 below,
% which does carry pages); confirm against the original workshop record.
@InProceedings{Hammond93,
  author    = Hammond,
  title     = {Getting a {GRIP}},
  booktitle = {Intl. Workshop on the Parallel Implementation of Functional Languages},
  address   = {Nijmegen, The Netherlands},
  month     = sep,
  year      = {1993},
  keywords  = {parallel Haskell, implementation, GRIP},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/grip.html},
  abstract  = {
    This paper describes a portable implementation of the Graph
    Reduction in Parallel (GRIP) software, built on the Parallel
    Virtual Machine (PVM) process control system. Using this system,
    it is easy for anyone to ``get a GRIP'' without needing to invest
    in special-purpose hardware. An important contribution of this
    paper is its detailed description of GRIP's Intelligent Memory
    Unit (IMU) software.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 1992

% Workshop paper (same IFL workshop series as Hammond93/HammondP90);
% no publisher/series given — presumably informal proceedings, but page
% numbers are recorded. NOTE(review): no DOI known for this entry.
@InProceedings{HammondP92,
  author    = Hammond # and # PeytonJones,
  title     = {Profiling Scheduling Strategies on the {GRIP} Multiprocessor},
  booktitle = {Intl. Workshop on the Parallel Implementation of Functional Languages},
  address   = {Aachen, Germany},
  month     = sep,
  year      = {1992},
  pages     = {73--98},
  keywords  = {parallel Haskell, implementation, GRIP},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/grip-sched.html},
  abstract  = {
    It is widely claimed that functional languages are particularly
    suitable for programming parallel computers. A claimed advantage
    is that the programmer is not burdened with details of task
    creation, placement, scheduling, and synchronisation, these
    decisions being taken by the system instead. Leaving aside the
    question of whether a pure functional language is expressive
    enough to encompass all the parallel algorithms we might wish to
    program, there remains the question of how effectively the
    compiler and run-time system map the program onto a real parallel
    system, a task usually carried out mostly by the programmer. This
    is the question we address in this paper.

    We first introduce the system architecture of GRIP, a
    shared-memory parallel machine supporting an implementation of the
    functional language Haskell. GRIP executes functional programs in
    parallel using compiled supercombinator graph reduction, a form of
    declarative rule system.

    We then describe several strategies for run-time resource control
    which we have tried, presenting comprehensive measurements of
    their effectiveness. We are particularly concerned with strategies
    controlling task creation, in order to improve task granularity
    and minimise communication overheads. This is, so far as we know,
    one of the first attempts to make a systematic study of
    task-control strategies in a high-performance parallel
    functional-language system. GRIP's high absolute performance
    render these results credible for real applications.
  }
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 1990

% Earliest GRIP paper in this file; same IFL workshop series and venue
% naming as Hammond93/HammondP92 (venue kept in `address` per the file's
% own convention). NOTE(review): no DOI known for this entry.
@InProceedings{HammondP90,
  author    = Hammond # and # PeytonJones,
  title     = {Some Early Experiments on the {GRIP} Parallel Reducer},
  booktitle = {Intl. Workshop on the Parallel Implementation of Functional Languages},
  address   = {Nijmegen, The Netherlands},
  month     = jun,
  year      = {1990},
  pages     = {51--72},
  keywords  = {parallel Haskell, implementation, GRIP},
  url       = {http://www.macs.hw.ac.uk/~dsg/gph/papers/abstracts/early-grip.html},
  abstract  = {
    GRIP is a multiprocessor designed to execute functional programs
    in parallel using graph reduction. We have implemented a compiler
    for GRIP, based on the Spineless Tagless G-machine, and can now
    run parallel functional programs with substantial absolute speedup
    over the same program running on a uniprocessor Sun.

    Parallel functional programming shifts some of the burden of
    resource allocation from the programmer to the system. Examples of
    such decisions include: when to create a new concurrent activity
    (or thread), when to execute such threads, where to execute them,
    and so on.

    It is clearly desirable that the system should take such
    decisions, provided it does a good enough job. For example, a
    paged virtual memory system almost always does an adequate job,
    and a programmer very seldom has to interfere with it. The big
    question for parallel functional programming is whether good
    resource-allocation strategies exist, and how well they perform
    under a variety of conditions.

    Now that we have an operational system, we are starting to carry
    out experiments to develop resource-allocation strategies, and
    measure their effectiveness. This paper reports on some very
    preliminary results. They mainly concern the question of when, or
    even whether, to create a new thread. This is an aspect which has
    so far received little attention --- existing work has focused
    mainly on load sharing rather than on thread creation. Our work is
    aided significantly by the fact that sparks indicate only the
    possibility of creating a new parallel thread, not the necessity
    of creating this thread. It is therefore always safe to discard
    sparks, while affecting only the performance of the computation,
    not its result.

    We begin with an overview of the GRIP system, and how we perform
    compiled graph reduction. This is followed by a summary of the
    statistics-gathering system and the results we have obtained so
    far.
  }
}


%% END OF GpH BIBLIOGRAPHY
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
