The essence of the problem facing the parallel programmer is that, in addition to specifying what value the program should compute, explicitly parallel programs must also specify how the machine should organise the computation. There are many aspects to the parallel execution of a program: threads are created, execute on a processor, transfer data to and from remote processors, and synchronise with other threads, etc. Managing all of these aspects on top of constructing a correct and efficient algorithm is what makes explicit parallel programming so hard. The diametrically opposing approach is to rely solely on the compiler and runtime system to manage the parallel execution without any programmer input. Unfortunately, this purely implicit approach is not yet fruitful for the large-scale functional programs we are interested in.
The approach used in GPH is intermediate between purely implicit and purely explicit approaches. The runtime system manages most of the parallel execution, only requiring the programmer to indicate those values that might usefully be evaluated by parallel threads and, since our basic execution model is a lazy one, perhaps also the extent to which those values should be evaluated. We term these programmer-specified aspects the program's dynamic behaviour.
Parallelism is introduced in GPH by the par combinator, which takes two arguments that are to be evaluated in parallel. The expression p `par` e (here we use Haskell's infix operator notation) has the same value as e, and is not strict in its first argument, i.e. bottom `par` e has the value of e. (bottom denotes a non-terminating or failing computation.) Its dynamic behaviour is to indicate that p could be evaluated by a new parallel thread, with the parent thread continuing evaluation of e. We say that p has been sparked, and a thread may subsequently be created to evaluate it if a processor becomes idle. Since the thread is not necessarily created, p is similar to a lazy future [MKH91].
Since control of sequencing can be important in a parallel language [Roe91] we introduce a sequential composition operator, seq. If e1 is not bottom, the expression e1 `seq` e2 has the value of e2; otherwise it is bottom. The corresponding dynamic behaviour is to evaluate e1 to weak head normal form (WHNF) before returning e2. Informally, this means that e1 is evaluated only as far as its top-level constructor.
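These combinators are also available in GHC itself (par via GHC.Conc; GPH's implementation differs in detail), so the WHNF behaviour of seq can be sketched in a small runnable example. The pair below is our own illustration, not from the source:

```haskell
import GHC.Conc (par)

main :: IO ()
main = do
  -- seq forces its first argument only to WHNF: the pair constructor
  -- is already in WHNF, so its undefined components are never demanded.
  putStrLn ((undefined, undefined) `seq` "pair already in WHNF")
  -- p `par` e has the value of e; the spark here is merely illustrative.
  print ((1 + 1 :: Int) `par` (2 + 2 :: Int))
```
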
This section gives an abridged introduction to our parallel programming technique called evaluation strategies [THLP98]. We focus on the language features necessary to achieve the basic functionality and highlight the advantages of this parallel programming technique. A complete description and discussion of evaluation strategies can be found in our Strategies paper. But before we talk about evaluation strategies let's look at a simple parallel program in GPH.
This section gives an example of how to write a simple parallel program in the parallel, lazy functional execution model. Due to a sad lack of imagination we take our good old friend, nfib, as an example program. This is the sequential function from which we start:
nfib :: Int -> Int
nfib 0 = 1
nfib 1 = 1
nfib n = nf1+nf2+1
  where nf1 = nfib (n-1)
        nf2 = nfib (n-2)
The straightforward idea for parallelising nfib is to start parallel processes for computing both recursive calls. This is expressed with the par annotation, which `sparks' a parallel process for computing its first argument and whose result is its second argument (the exact operational semantics of par is discussed in more detail in the section called Running a GPH program). This gives the following parallel program:
parfib :: Int -> Int
parfib 0 = 1
parfib 1 = 1
parfib n = nf2 `par` (nf1 `par` (nf1+nf2+1))
  where nf1 = parfib (n-1)
        nf2 = parfib (n-2)
The drawback of this program is that the main task blocks on the two child tasks it has created: only when both child tasks have returned a result can the main task continue. It is more efficient to have one of the recursive calls computed by the main task and to spark only one parallel process for the other recursive call. To guarantee that the main expression is evaluated in the right order (i.e. without blocking the main task on the child task), the seq annotation is used:
parfib :: Int -> Int
parfib 0 = 1
parfib 1 = 1
parfib n = nf2 `par` (nf1 `seq` (nf1+nf2+1))
  where nf1 = parfib (n-1)
        nf2 = parfib (n-2)
The operational reading of this function is as follows: First spark a parallel process for computing the expression parfib (n-2). Then evaluate the expression parfib (n-1). Finally, evaluate the main expression. If the parallel process has not finished by this time the main task will be blocked on nf2. Otherwise it will just pick up the result.
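Assuming a GHC toolchain, where GHC.Conc provides a par with the same sparking behaviour as GPH's combinator, this parfib can be run as a stand-alone sketch:

```haskell
import GHC.Conc (par)

parfib :: Int -> Int
parfib 0 = 1
parfib 1 = 1
parfib n = nf2 `par` (nf1 `seq` (nf1 + nf2 + 1))
  where nf1 = parfib (n - 1)
        nf2 = parfib (n - 2)

main :: IO ()
main = print (parfib 20)
```

Compiled with GHC's threaded runtime and run on several cores, the sparked calls can be picked up by idle processors; run sequentially, the sparks are simply discarded and the result is unchanged.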
This simple example already shows several basic aspects of parallel, lazy functional programming:
Another aspect not shown in this example is the importance of evaluation degree. If the parallel expressions create a compound type (e.g. a list), it is important that they are evaluated to a certain degree; otherwise there won't be a lot of parallelism in the program. We will revisit this aspect in a later section.
Finally, the complete example as a stand-alone program can be found in Appendix A.
The following naive attempt to introduce parallelism in quicksort fails because the threads generating losort and hisort only create a single cons cell.
quicksortN []     = []
quicksortN [x]    = [x]
quicksortN (x:xs) = losort `par`
                    hisort `par`
                    result
  where losort = quicksortN [y|y <- xs, y < x]
        hisort = quicksortN [y|y <- xs, y >= x]
        result = losort ++ (x:hisort)
A common practice of parallel Haskell programmers is to introduce a forcing function that forces the evaluation of the required elements of a data structure. For example
forceList :: [a] -> ()
forceList []     = ()
forceList (x:xs) = x `seq` forceList xs
Quicksort can be rewritten to have the desired behaviour using forceList as follows.
quicksortF []     = []
quicksortF [x]    = [x]
quicksortF (x:xs) = (forceList losort) `par`
                    (forceList hisort) `par`
                    losort ++ (x:hisort)
  where losort = quicksortF [y|y <- xs, y < x]
        hisort = quicksortF [y|y <- xs, y >= x]
To obtain the required dynamic behaviour for the parMap example we might use forceList within the definition of f:
f x = forceList result `seq` result
  where result = [fib x]
When programs are written in this style, a number of forcing functions are required: at least one for each type, e.g. forcePair. To obtain good dynamic behaviour, par, seq and forcing functions are inserted throughout the program. In consequence the dynamic behaviour and static semantics of the computation are intertwined. This intertwining obscures the semantics of the program. Small-scale programs remain manageable, but in larger programs, particularly those with complex data structures and parallel behaviour, discerning the meaning of a program becomes very hard.
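For instance, a forcePair along the lines named above might be defined as follows (this particular definition is our sketch, not taken from the GPH library):

```haskell
forceList :: [a] -> ()
forceList []     = ()
forceList (x:xs) = x `seq` forceList xs

-- Hypothetical forcing function for pairs, analogous to forceList:
-- force both components to WHNF, return unit.
forcePair :: (a, b) -> ()
forcePair (x, y) = x `seq` y `seq` ()

main :: IO ()
main = do
  print (forceList [1, 2, 3 :: Int])
  print (forcePair (1 :: Int, 'a'))
```
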
Even with the simple parallel programming model provided by par and seq we find that more and more code is inserted in order to obtain better parallel performance. In realistic programs the algorithm can become entirely obscured by the dynamic-behaviour code.
Evaluation strategies use lazy higher-order functions to separate the two concerns of specifying the algorithm and specifying the program's dynamic behaviour. A function definition is split into two parts, the algorithm and the evaluation strategy, with values defined in the former being manipulated in the latter. The algorithmic code is consequently uncluttered by details relating only to the dynamic behaviour. In fact the driving philosophy behind evaluation strategies is that it should be possible to understand the semantics of a function without considering its dynamic behaviour.
A strategy [THLP98] is a function that specifies the dynamic behaviour required when computing a value of a given type. A strategy makes no contribution towards the value being computed by the algorithmic component of the function: it is evaluated purely for effect, and hence it returns just the empty tuple ().
type Strategy a = a -> ()
The simplest strategies introduce no parallelism: they specify only the evaluation degree. The simplest strategy is termed r0 and performs no reduction at all. Perhaps surprisingly, this strategy proves very useful, e.g. when evaluating a pair we may want to evaluate only the first element but not the second.
r0 :: Strategy a
r0 _ = ()
Because reduction to WHNF is the default evaluation degree in GPH, a strategy to reduce a value of any type to WHNF is easily defined:
rwhnf :: Strategy a
rwhnf x = x `seq` ()
Many expressions can also be reduced to normal form (NF), i.e. a form that contains no redexes, by the rnf strategy. The rnf strategy can be defined over both built-in and user-defined types, but not over function types or any type incorporating a function type - few reduction engines support the reduction of inner redexes within functions. Rather than defining a new rnfX strategy for each data type X, it is better to have a single overloaded rnf strategy that works on any data type. The obvious solution is to use a Haskell type class, NFData, to overload the rnf operation. Because NF and WHNF coincide for built-in types such as integers and booleans, the default method for rnf is rwhnf.
class NFData a where
  rnf :: Strategy a
  rnf = rwhnf
For each data type an instance of NFData must be declared that specifies how to reduce a value of that type to normal form. Such an instance relies on its element types, if any, being in class NFData. Consider lists and pairs for example.
instance NFData a => NFData [a] where
  rnf []     = ()
  rnf (x:xs) = rnf x `seq` rnf xs

instance (NFData a, NFData b) => NFData (a,b) where
  rnf (x,y) = rnf x `seq` rnf y
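A user-defined type follows the same pattern. The binary tree below is a hypothetical example of our own, not from the source, shown as a self-contained sketch with the class default inlined:

```haskell
type Strategy a = a -> ()

class NFData a where
  rnf :: Strategy a
  rnf x = x `seq` ()   -- default method: reduce to WHNF (rwhnf inlined)

instance NFData Int    -- NF and WHNF coincide for Int: default suffices

-- Hypothetical user-defined type with its NFData instance.
data Tree a = Leaf | Node (Tree a) a (Tree a)

instance NFData a => NFData (Tree a) where
  rnf Leaf         = ()
  rnf (Node l x r) = rnf l `seq` rnf x `seq` rnf r

main :: IO ()
main = print (rnf (Node Leaf (3 :: Int) (Node Leaf 4 Leaf)))
```
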
A strategy can specify parallelism and sequencing as well as evaluation degree. Strategies specifying data-oriented parallelism describe the dynamic behaviour in terms of some data structure. For example parList is similar to seqList, except that it applies the strategy to every element of a list in parallel.
parList :: Strategy a -> Strategy [a]
parList strat []     = ()
parList strat (x:xs) = strat x `par` (parList strat xs)
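The seqList mentioned above applies the strategy to each element in sequence; a definition consistent with that description (a sketch, not necessarily the library's exact code) is:

```haskell
type Strategy a = a -> ()

-- Apply strat to every list element in sequence, left to right.
seqList :: Strategy a -> Strategy [a]
seqList _     []     = ()
seqList strat (x:xs) = strat x `seq` seqList strat xs

main :: IO ()
main = print (seqList (\x -> x `seq` ()) [1, 2, 3 :: Int])
```
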
Data-oriented strategies are applied by the using function, which applies the strategy s to the data structure x before returning it.
using :: a -> Strategy a -> a
using x s = s x `seq` x
A parallel map is an example of data-oriented parallelism, and is used in several of the programs. The parMap function defined below applies its function argument to every element of a list in parallel. Note how the algorithmic code map f xs is cleanly separated from the strategy. The strat parameter determines the dynamic behaviour of each element of the result list, and hence parMap is parametric in some of its dynamic behaviour.
parMap :: Strategy b -> (a -> b) -> [a] -> [b]
parMap strat f xs = map f xs `using` parList strat
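Collecting these definitions gives a self-contained, runnable sketch of parMap, with GHC.Conc's par standing in for GPH's combinator:

```haskell
import GHC.Conc (par)

type Strategy a = a -> ()

rwhnf :: Strategy a
rwhnf x = x `seq` ()

parList :: Strategy a -> Strategy [a]
parList _     []     = ()
parList strat (x:xs) = strat x `par` parList strat xs

using :: a -> Strategy a -> a
using x s = s x `seq` x

-- The algorithm (map f xs) is cleanly separated from the behaviour
-- (parList strat): evaluate each element of the result in parallel.
parMap :: Strategy b -> (a -> b) -> [a] -> [b]
parMap strat f xs = map f xs `using` parList strat

main :: IO ()
main = print (parMap rwhnf (* 2) [1 .. 5 :: Int])
```
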
As an alternative to such a using-based design of parallel code we have also introduced a new construct, $||, called strategic function application. Extending Haskell's standard function application, $, the construct f $|| s $ x applies the strategy s to the argument x in parallel with applying the function f to x. This construct is especially useful for defining data-oriented parallelism over complex data structures, given the typical design of functional programs as compositions of small, flexible sub-functions [Hugh89]. Compared to the parMap function above, this new construct makes it possible to define data-oriented parallelism without changing the definition of map itself. For example the expression g $ parMap rnf f xs can also be written as
g $|| parList rnf $ map f xs
In the latter expression the strategy is separated from the algorithmic code and the sequential sub-functions are unchanged, thus describing parallelism at a higher level in the program. Variants of this idea are sequential strategic function application, $|, which adds a synchronisation barrier and is thus useful for defining pipelines, and strategic function composition, in a parallel version, .||, and a sequential version, .|.
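One plausible definition of $|| is sketched below; it is consistent with the description above, but the GPH library's exact definition may differ:

```haskell
import GHC.Conc (par)

type Strategy a = a -> ()

infixl 0 $||

-- Apply strategy s to x in parallel with applying f to x.
($||) :: (a -> b) -> Strategy a -> a -> b
(f $|| s) x = s x `par` f x

main :: IO ()
main = print ((sum $|| (\xs -> length xs `seq` ())) [1 .. 10 :: Int])
```

Here the strategy sparks evaluation of the list's spine while sum consumes it; run sequentially, the spark is discarded and the value is unchanged.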
Returning to our quicksort example of the section called Parallel Quicksort using Forcing Functions, we can now rewrite the parallel program to use evaluation strategies. In the strategy version we can specify the evaluation degree by applying one of the predefined strategies to the components of the result. These strategies are then combined by using either par or seq to specify the parallelism.
In quicksort, each parallel thread should construct all of its result list, and rnf expresses this neatly. The interesting equation becomes
quicksortS (x:xs) = losort ++ (x:hisort) `using` strategy
  where losort = quicksortS [y|y <- xs, y < x]
        hisort = quicksortS [y|y <- xs, y >= x]
        strategy result = rnf losort `par`
                          rnf hisort `par`
                          rnf result `par` ()
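A self-contained version of this quicksort can be sketched as follows, with rnf specialised to integer lists for brevity and GHC.Conc's par standing in for GPH's combinator. Note that using must bind less tightly than ++, hence the fixity declaration:

```haskell
import GHC.Conc (par)

type Strategy a = a -> ()

infixl 0 `using`
using :: a -> Strategy a -> a
using x s = s x `seq` x

-- rnf specialised to Int lists for this sketch; the full version
-- uses the overloaded NFData class.
rnf :: [Int] -> ()
rnf []     = ()
rnf (x:xs) = x `seq` rnf xs

quicksortS :: [Int] -> [Int]
quicksortS []     = []
quicksortS [x]    = [x]
quicksortS (x:xs) = losort ++ (x:hisort) `using` strategy
  where losort = quicksortS [y | y <- xs, y < x]
        hisort = quicksortS [y | y <- xs, y >= x]
        strategy result = rnf losort `par`
                          rnf hisort `par`
                          rnf result `par` ()

main :: IO ()
main = print (quicksortS [3, 1, 4, 1, 5, 9, 2, 6])
```
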
The prime motivation in the design of evaluation strategies has been the separation of algorithmic and behavioural code, as shown in the strategic version of quicksort in the section called Parallel Quicksort using Strategies. A comparison of pre-strategy with strategic code shows that such a separation aids the performance tuning process of parallel programs and enables the programmer to experiment with several parallel versions of the code.
Because evaluation strategies are written using the same language as the algorithm, they have additional desirable properties. Strategies are powerful: simpler strategies can be composed, or passed as arguments, to form more elaborate strategies. Strategies are extensible: indeed, in the parallelisation of several programs discussed later we have defined new application-specific strategies. Strategies can be defined over all types in the language, and offer some level of type safety because the normal type system applies to strategic code. Strategies have a clear semantics, which is precisely that used by the algorithmic language.