Purely Implicit Approaches

Purely implicit approaches include dataflow languages like Id [Arvind et al., 1989] or pH [Nikhil et al., 1993][Flanagan and Nikhil, 1996], and evaluation transformers [Burn, 1987]. Data parallel languages such as NESL [Blelloch et al., 1993] can also be seen as implicitly parallelising operations over certain bulk data structures. All of the implicit approaches have some fixed underlying model of parallelism. Because evaluation strategies allow explicit control of some crucial aspects of parallelism, the programmer can describe behaviours very different from the fixed model, for example speculatively evaluating some expressions.

Evaluation Transformers

Evaluation transformers exploit the results of strictness analysis on structured data types, providing parallelism control mechanisms that are tailored to individual strictness properties [Burn, 1987]. Each evaluation transformer reduces its argument to the extent that is allowed by the available strictness information. The appropriate transformer is selected at compile time, giving efficient execution at the cost of some increase in code-size [Burn, 1991][Finne and Burn, 1993].

If there are only a small number of possible transformers (as for lists using the standard 4-point strictness domain - see the table below), repeated work can be avoided by recording the extent to which a data structure has already been evaluated, and then using a specialised transformer on the unevaluated but needed part of that structure.

  Transf.   Meaning                             Strategy
  E0        No reduction                        r0
  EWHNF     Reduce to WHNF                      rwhnf
  ETS       Reduce spine of a list              seqList r0
  EHTS      Reduce each list element to WHNF    seqList rwhnf

  Table: The Relationship of Evaluation Strategies and Transformers
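
For reference, the strategies named in the right-hand column can be sketched as follows. These definitions are intended to follow the formulation used in the body of the paper (the Strategy type together with using, r0, rwhnf and seqList) and are repeated here only to make the correspondence concrete.

type Strategy a = a -> ()

-- Apply a strategy to a value before returning the value itself.
using :: a -> Strategy a -> a
using x s = s x `seq` x

-- Perform no reduction at all (corresponds to E0).
r0 :: Strategy a
r0 _ = ()

-- Reduce to weak head normal form (corresponds to EWHNF).
rwhnf :: Strategy a
rwhnf x = x `seq` ()

-- Apply a strategy to every element of a list, sequentially.
-- seqList r0 forces only the spine (ETS), while seqList rwhnf
-- also reduces each element to WHNF (EHTS).
seqList :: Strategy a -> Strategy [a]
seqList strat []     = ()
seqList strat (x:xs) = strat x `seq` seqList strat xs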
 

One problem with evaluation transformers is that the more sophisticated the strictness analysis, and the more types transformers are defined over, the more evaluation transformers are needed, and the greater the resulting code-bloat. Specialised transformers must be defined in the compiler for each type, which complicates providing transformers over programmer-defined types.

In contrast, since the programmer has control over which strategy is to be used in a particular context, and since those strategies are programmable rather than fixed, strategies are strictly more general than evaluation transformers. In particular, a programmer can elect to use a strategy that is more strict than the function in order to obtain good performance or to allow speculation; to use a strategy that is known to be safe, though stricter than the analyser can detect; or to use a strategy that is less strict than the analyser can determine, in order to improve granularity. Finally, it is not always straightforward to determine how much strictness an analyser might detect, and small program changes may have dramatic effects on strictness information.
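
As an illustrative sketch of this flexibility (reusing the Strategy type, using, r0, rwhnf and seqList from the sketch above; the names spineOnly and fullyEager are invented here purely for illustration):

-- A programmer-chosen degree of evaluation for a list, independent of
-- whatever a strictness analyser could discover about its consumer.
spineOnly, fullyEager :: [Int] -> [Int]
spineOnly  xs = xs `using` seqList r0     -- force the spine only
fullyEager xs = xs `using` seqList rwhnf  -- also force each element to WHNF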

It is possible that in the future, strictness analysis could drive the choice of an appropriate evaluation strategy in at least some circumstances. Indeed we are aware of a relationship between strictness domains and some strategies. Use of strictness information in this way would make strategies more implicit than they are at present.

Data Parallelism

It has been argued that support should be provided for both task and data parallelism [Subhlok et al., 1993]. We have already shown how some kinds of data-oriented parallelism can be expressed using evaluation strategies. Truly data parallel approaches, however, such as NESL [Blelloch et al., 1993][Blelloch, 1996] treat higher-order functions such as scans and folds, or compound expressions such as list- and array-comprehensions, as single "atomic" operations over entire structures such as lists or arrays.

In effect, functions are applied to each element of the data simultaneously, rather than data being supplied to the functions. This approach is more suitable than control parallelism for massively parallel machines, such as the CM-2. Certain evaluation strategies can therefore be seen as control parallel implementations of data parallel constructs, targeted more at distributed-memory or shared-memory machines than at massively parallel architectures.
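
For example, a data-parallel-style map over a list can be rendered with strategies as one spark per element. The following is a sketch only, reusing the Strategy, using and rwhnf definitions sketched in the previous subsection; par is the GpH spark combinator (exported by Control.Parallel in current GHC), and parList and parMap follow their usual strategic formulation.

import Control.Parallel (par)

-- Spark the evaluation of every list element in parallel.
parList :: Strategy a -> Strategy [a]
parList strat []     = ()
parList strat (x:xs) = strat x `par` parList strat xs

-- A data-parallel-style map expressed as control parallelism:
-- one spark per list element, rather than a single bulk operation
-- over the whole structure.
parMap :: Strategy b -> (a -> b) -> [a] -> [b]
parMap strat f xs = map f xs `using` parList strat

A call such as parMap rwhnf f xs then applies f to every element of xs, sparking each application in parallel and evaluating each result to WHNF.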

Unusually for a data parallel language, NESL supports nested parallelism. This allows more complex and more irregular computation patterns to be expressed than with traditional data-parallel languages such as C* [Rose et al., 1987]. For example, to compute the product of a sequence using a divide-and-conquer algorithm, the following code could be used:

function product(a) =
  if (#a == 1) then a[0]
  else let r = {product(v) : v in bottop(a)};
       in r[0] * r[1]
The right-hand side of the binding for r is a sequence comprising two recursive calls to product. The bottop function splits the argument sequence, a, into two equal-sized halves.

This could then be used in an outer sequence if required, for example,

function products(m,n) =
   {product(s) : s in {[p:n] : p in [m:n-1]}}
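
For comparison, the same divide-and-conquer product could be written in a control-parallel style using evaluation strategies, along the following lines. This is a sketch only: splitAt plays the role of NESL's bottop, and par and seq are the usual GpH combinators.

import Control.Parallel (par)

-- A control-parallel rendering of the divide-and-conquer product.
-- The two halves are evaluated in parallel, mirroring the parallel
-- apply-to-each over bottop(a) in the NESL version.
product' :: [Int] -> Int
product' []  = 1
product' [x] = x
product' xs  = left `par` (right `seq` left * right)
  where
    (as, bs) = splitAt (length xs `div` 2) xs
    left     = product' as
    right    = product' bs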

NESL has been implemented on a variety of machines including the CM-2, the Cray Y-MP and the Encore Multimax.

Dataflow

Many recent dataflow languages are functional, e.g. Id [Arvind et al., 1989]; one of the most recent, pH [Nikhil et al., 1993], is in fact a variant of Haskell. These languages usually introduce parallelism implicitly, for example by using an evaluation scheme such as lenient evaluation [Traub, 1991], which generates massive amounts of fine-grained parallelism. Unfortunately, the resulting threads are often too small to be utilised efficiently by conventional thread technology. The solutions are either to use hardware support for parallelism, as with Monsoon [Papadopoulos, 1990] or *T [Nikhil et al., 1992], or to use compiler optimisations to create larger threads statically [Traub et al., 1992]. In contrast, when used with suitable performance analyses or measurement tools, evaluation strategies provide a readily available handle that can help to control thread size.
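
For instance, list-based parallelism can be coarsened by grouping elements into chunks, so that each spark performs a worthwhile amount of work. The following sketch assumes the Strategy and seqList definitions given earlier; parListChunk is the usual chunking combinator, with the chunk size n chosen by the programmer.

import Control.Parallel (par)

-- Evaluate a list in chunks of n elements, one spark per chunk,
-- giving the programmer direct control over thread granularity.
-- (n is assumed to be positive.)
parListChunk :: Int -> Strategy a -> Strategy [a]
parListChunk _ _     [] = ()
parListChunk n strat xs = seqList strat as `par` parListChunk n strat bs
  where (as, bs) = splitAt n xs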

Sisal [McGraw, 1985] provides high-level loop-based constructs in a first-order dataflow language. These constructs support implicit control parallelism over arrays. The Sisal 90 language [Feo et al., 1995] adds higher-order functions, polymorphism and user-defined reductions.

