[SEE THE STRATEGY PAPER: "STRATEGIES FOR WRITING PARALLEL NON-STRICT PROGRAMS", TRINDER P.W., LOIDL H.W., HAMMOND K., PEYTON JONES S.L. (IN PREPARATION). ]
This chapter deals with parallel, lazy functional programming in the large. It introduces the notion of strategies and shows how they make it easier to develop large parallel programs.
The following naive version of parmap often does not give good parallelism:
parmap :: (a -> b) -> [a] -> [b]
parmap f []     = []
parmap f (x:xs) = fx `par` pmxs `par` (fx:pmxs)
                  where
                    fx   = f x
                    pmxs = parmap f xs
If, in this version, the spark for pmxs is discarded, almost all parallelism in the program is lost. On the other hand, if all pars are executed, more parallel threads than necessary are created (two for each list element).
An alternative version of parmap reduces the total number of generated threads by replacing the second par with a seq:
parmap :: (a -> b) -> [a] -> [b]
parmap f []     = []
parmap f (x:xs) = fx `par` pmxs `seq` (fx:pmxs)
                  where
                    fx   = f x
                    pmxs = parmap f xs
The problem with this version is that, in a context requiring the result of parmap only in weak head normal form, the recursive call to parmap will be deferred until its value is needed. This drastically reduces the total amount of parallelism in the program. In particular, if such a function is used as the producer in a program with producer-consumer parallelism, this gives very poor parallelism (see section Example Program: Sum of Squares).
To improve this behaviour one can write a forcing function, which guarantees that all of the result is demanded immediately:
forcelist []     = ()
forcelist (x:xs) = x `seq` forcelist xs
Using this forcing function in parmap gives the following definition:
parmap :: (a -> b) -> [a] -> [b]
parmap f []     = []
parmap f (x:xs) = fx `par` (forcelist pmxs) `seq` (fx:pmxs)
                  where
                    fx   = f x
                    pmxs = parmap f xs
However, this approach yields a large set of forcing functions and code that mixes defining the result with driving the computation. We want to avoid both.
(This section is based mainly on the abstract of the strategy paper cited above.)
Separate the components of a parallel, lazy functional program:
In most current parallel non-strict functional languages, a function must describe both the value to be computed and the dynamic behaviour. The dynamic behaviour of a function has two aspects: parallelism, i.e. which values could be computed in parallel, and evaluation-degree, i.e. how much of each value should be constructed. Our experience is that specifying the dynamic behaviour of a function obscures its semantics. Strategies have been developed to address this problem; a strategy is an explicit description of a function's potential dynamic behaviour. The philosophy is that it should be possible to understand the semantics of a function without considering its dynamic behaviour.
In essence, a strategy is a runtime function that traverses a data structure, specifying how much of it should be evaluated, and possibly sparking threads to perform the construction in parallel. Because strategies are functions, they can be defined on any type and can be combined to specify sophisticated dynamic behaviour. For example, a strategy can control thread granularity or specify evaluation over an infinite structure. A disadvantage is that a strategy requires an additional runtime pass over parts of the data structure. Example programs are given that use strategies in divide-and-conquer, pipeline and data-parallel applications.
This section compares two versions of parallel quicksort: one using a forcing function and another using strategies.
The following naive attempt to introduce parallelism in quicksort fails because the threads generated for losort and hisort only create a single cons cell.
quicksortN []     = []
quicksortN [x]    = [x]
quicksortN (x:xs) = losort `par` hisort `par` result
                    where
                      losort = quicksortN [y | y <- xs, y < x]
                      hisort = quicksortN [y | y <- xs, y >= x]
                      result = losort ++ (x:hisort)
The current practice of parallel Haskell programmers is to introduce a forcing function that forces the evaluation of the required elements of a data structure. For example
forceList :: [a] -> ()
forceList []     = ()
forceList (x:xs) = x `seq` forceList xs
Quicksort can be rewritten to have the desired behaviour using forceList as follows.
quicksortF []     = []
quicksortF [x]    = [x]
quicksortF (x:xs) = (forceList losort) `par`
                    (forceList hisort) `par`
                    losort ++ (x:hisort)
                    where
                      losort = quicksortF [y | y <- xs, y < x]
                      hisort = quicksortF [y | y <- xs, y >= x]
To obtain the required dynamic behaviour for the parMap example we might use forceList within the definition of f:
f x = forceList result `seq` result
      where
        result = [fib x]
When programs are written in this style, a number of forcing functions are required: at least one for each type, e.g. forcePair. To obtain good dynamic behaviour, par, seq and forcing functions are inserted throughout the program. In consequence the dynamic behaviour and static semantics of the computation are intertwined. This intertwining obscures the semantics of the program. Small-scale programs remain manageable, but in larger programs, particularly those with complex data structures and parallel behaviour, discerning the meaning of a program becomes very hard.
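To make the proliferation concrete, here is a sketch of what such type-specific forcing functions look like; forceList is repeated from above, forcePair is the pair analogue mentioned in the text, and forcePairOfLists is a hypothetical further variant illustrating how these functions multiply:

```haskell
-- Sketch of the kind of forcing functions that accumulate in this style.
forceList :: [a] -> ()
forceList []     = ()
forceList (x:xs) = x `seq` forceList xs

-- forces both components of a pair to weak head normal form
forcePair :: (a, b) -> ()
forcePair (x, y) = x `seq` y `seq` ()

-- yet another variant is needed for pairs of lists, and so on
forcePairOfLists :: ([a], [b]) -> ()
forcePairOfLists (xs, ys) = forceList xs `seq` forceList ys
```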
In the strategy version we can specify the evaluation degree by applying one of the predefined strategies to the components of the result. These strategies are then combined by using either par or seq to specify the parallelism.
In quicksort, each parallel thread should construct all of its result list, and rnf expresses this neatly. The interesting equation becomes
quicksortS (x:xs) = losort ++ (x:hisort) `using` strategy
                    where
                      losort = quicksortS [y | y <- xs, y < x]
                      hisort = quicksortS [y | y <- xs, y >= x]
                      strategy result = rnf losort `par`
                                        rnf hisort `par`
                                        rnf result `par`
                                        ()
Although the sum-of-squares program is a very simple producer-consumer program, it already shows some basic problems in this model. Unfortunately, the simple version of the program, which was presented in section Semi-explicit Parallelism, produces hardly any parallelism at all. The reason is that, under lazy evaluation, the producer will only create the top cons cell of the intermediate list. When the consumer then demands the rest of the list, it computes the squares as well as the result sum. In the example below, version 1 (res_naive) shows this behaviour.
To improve the parallelism one can apply a forcing function to the intermediate list. The parallel thread is then the application of this forcing function to the squares list. As expected, this creates two threads that are computing most of the time and occasionally have to exchange data. Version 2 (res_force) shows this behaviour.
A better way to achieve the same operational behaviour is to define a strategy that specifies how to compute the result. The strategy describes both the parallelism and the evaluation degree. In doing so, it uses the reduce-to-normal-form (rnf) strategy, which is defined in the class NFData. Compared to version 2, the strategy version 3 (res_strat) achieves a clean separation between defining the result and specifying operational details like parallelism and evaluation degree.
module Main where

import StrategiesVII

main = getArgs abort ( \ a ->
  let
    args :: [Int]
    args    = map (\ a1 -> fst ((readDec a1) !! 0)) a
    version = args !! 0
    n       = args !! 1
    -- l = [1..]

    squares :: [Int]
    squares = [ i^2 | i <- [1..n] ]

    -- Naive version sparks a producer for the list squares but doesn't force it
    res_naive = squares `par` sum squares

    -- This version sparks a forcing function on the list (producer)
    res_force = (foldl1 seq squares) `par` sum squares

    -- The strategy version
    res_strat = sum squares `using` (rnf squares `par` rnf)

    res = case version of
            1 -> res_naive
            2 -> res_force
            3 -> res_strat
            _ -> res_naive
    str = case version of
            1 -> "Naive version"
            2 -> "Forcing version"
            3 -> "Strategy version"
            _ -> "Naive version"
  in
    print ("Sum of squares (" ++ str ++ ") of length " ++ show n ++
           " = " ++ show res ++ "\n") )
When running the naive version of this program, the spark for the producer process is pruned and all of the producer's computation is subsumed by the consumer (the main thread). This gives a totally sequential program. The relevant parts of the GranSim profile are:
Granularity Simulation for sq 1 99 +RTS -bP -bp: -bs

++++++++++++++++++++

PE  0 [0]:      START  0 0x100620       [SN 0]
PE  0 [67445]:  SPARK  20004 0x347dbc   [sparks 0]
PE  0 [68241]:  PRUNED 20004 0x347dbc   [sparks 1]
PE  0 [327088]: END    0, SN 0, ST 0, EXP F, BB 847, HA 5426, RT 327088, BT 0 (0), FT 0 (0), LS 0, GS 1, MY T
The other two versions of the program create one producer and one consumer thread as expected.
The per thread activity profile shows the behaviour of the strategy version of the program. Thread 1 is the main thread which computes sum squares and therefore is the consumer process. It creates thread 2 as the producer process, which evaluates squares. The producer is evaluating the list to normal form and therefore is one continuous thread. The consumer has to wait for the results from the producer from time to time and is blocked during these periods.
[Figure: per-thread activity profile of the strategy version (sq3-ap.ps)]
This section discusses the contents of the strategies module. It shows how strategies are implemented on top of the basic par and seq annotations. Haskell's overloading mechanism is used to define a strategy for reducing to normal form. The current implementation is based on Haskell 1.2, but in the near future we want to use constructor classes as provided in Haskell 1.3.
A strategy does not return a meaningful result; it only specifies parallelism and evaluation degree. Therefore, we use the following type for strategies:
type Strategy a = a -> ()
We can now define strategy application, which adds information about the operational behaviour of the evaluation to an expression:
using :: a -> Strategy a -> a
using x s = s x `seq` x
Note that x `using` s is a projection on x, i.e. it is both weaker than x and idempotent:

x `using` s ⊑ x
(x `using` s) `using` s = x `using` s
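The idempotence property can be checked at a concrete type; a minimal, self-contained sketch that redefines using and rwhnf locally (so it runs without the strategies module):

```haskell
type Strategy a = a -> ()

-- strategy application: apply the strategy, then return the value
using :: a -> Strategy a -> a
using x s = s x `seq` x

-- reduce to weak head normal form
rwhnf :: Strategy a
rwhnf x = x `seq` ()

-- idempotence, checked at type Int:
-- (x `using` s) `using` s == x `using` s
idempotent :: Int -> Bool
idempotent x = ((x `using` rwhnf) `using` rwhnf) == (x `using` rwhnf)
```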
The primitive strategies are basically just syntactic sugar to fit par and seq into the strategy scheme.
sPar is a strategy corresponding to par, i.e.

x `par` e <=> e `using` sPar x
sPar :: a -> Strategy b
sPar x y = x `par` ()
sSeq is a strategy corresponding to seq, i.e.

x `seq` e <=> e `using` sSeq x
sSeq :: a -> Strategy b
sSeq x y = x `seq` ()
Three basic strategies describe the evaluation degree of a data structure:
r0 :: Strategy a
r0 x = ()
rwhnf :: Strategy a
rwhnf x = x `seq` ()
class NFData a where
  -- rnf reduces its argument to (head) normal form
  rnf :: Strategy a
  -- Default method. Useful for base types. A specific method is
  -- necessary for constructed types.
  rnf = rwhnf
For primitive types like Int and Char the default method can be used:
instance NFData Int
instance NFData Char
...
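The differing evaluation degrees of r0, rwhnf and rnf can be observed by applying each to a list whose first element is undefined; a self-contained sketch, with a monomorphic rnfIntList standing in for the class method rnf on Int lists:

```haskell
import Control.Exception (SomeException, evaluate, try)

type Strategy a = a -> ()

-- evaluates nothing at all
r0 :: Strategy a
r0 _ = ()

-- evaluates only to weak head normal form (the top constructor)
rwhnf :: Strategy a
rwhnf x = x `seq` ()

-- evaluates the whole list, including the elements (normal form)
rnfIntList :: Strategy [Int]
rnfIntList []     = ()
rnfIntList (x:xs) = x `seq` rnfIntList xs
```

Applying r0 never touches its argument; rwhnf succeeds on undefined : [] because the cons cell already exists; rnfIntList fails on the same list because it forces the undefined element.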
When defining a strategy on a compound data type we can use these three basic strategies and compose them with the strategy combinators below. These combinators describe the parallelism in the evaluation.
-- Pairs

instance (NFData a, NFData b) => NFData (a,b) where
  rnf (x,y) = rnf x `seq` rnf y

seqPair :: Strategy a -> Strategy b -> Strategy (a,b)
seqPair strata stratb (x,y) = strata x `seq` stratb y

parPair :: Strategy a -> Strategy b -> Strategy (a,b)
parPair strata stratb (x,y) = strata x `par` stratb y

-- Lists

instance NFData a => NFData [a] where
  rnf []     = ()
  rnf (x:xs) = rnf x `seq` rnf xs

-- Applies a strategy to every element of a list in parallel
parList :: Strategy a -> Strategy [a]
parList strat []     = ()
parList strat (x:xs) = strat x `par` (parList strat xs)

-- Applies a strategy to the first n elements of a list in parallel
parListN :: (Integral b) => b -> Strategy a -> Strategy [a]
parListN n strat []     = ()
parListN 0 strat xs     = ()
parListN n strat (x:xs) = strat x `par` (parListN (n-1) strat xs)

-- Sequentially applies a strategy to each element of a list
seqList :: Strategy a -> Strategy [a]
seqList strat []     = ()
seqList strat (x:xs) = strat x `seq` (seqList strat xs)

-- Sequentially applies a strategy to the first n elements of a list
seqListN :: (Integral a) => a -> Strategy b -> Strategy [b]
seqListN n strat []     = ()
seqListN 0 strat xs     = ()
seqListN n strat (x:xs) = strat x `seq` (seqListN (n-1) strat xs)
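The same pattern extends to user-defined types. As an illustration, here is a hypothetical Tree type (not part of the strategies module) with an NFData instance and a parallel strategy combinator; `par` is replaced by a sequential stand-in so the sketch runs without a parallel runtime (under GUM/GranSim the real `par` would spark its first argument):

```haskell
-- Sequential stand-in for `par`; a real implementation sparks x.
par :: a -> b -> b
par _ y = y

type Strategy a = a -> ()

rwhnf :: Strategy a
rwhnf x = x `seq` ()

class NFData a where
  rnf :: Strategy a
  rnf = rwhnf

instance NFData Int

-- Hypothetical user-defined type
data Tree a = Leaf | Node (Tree a) a (Tree a)

instance NFData a => NFData (Tree a) where
  rnf Leaf         = ()
  rnf (Node l x r) = rnf l `seq` rnf x `seq` rnf r

-- Applies a strategy to every element, sparking one thread per subtree
parTree :: Strategy a -> Strategy (Tree a)
parTree strat Leaf         = ()
parTree strat (Node l x r) = parTree strat l `par`
                             parTree strat r `par`
                             strat x
```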
Now we can use these predefined strategies to define a parallel map function
-- parMap applies a function to each element of the argument list in
-- parallel. The result of the function is evaluated using `strat'
parMap :: Strategy b -> (a -> b) -> [a] -> [b]
parMap strat f xs = map f xs `using` parList strat
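A small usage sketch of parMap, self-contained so it runs stand-alone: using, rwhnf and parList are redefined locally, `par` is a sequential stand-in (a real parallel runtime would spark the first argument), and fib is just an assumed example workload:

```haskell
-- Sequential stand-in for `par`; under GUM the real `par` sparks x.
par :: a -> b -> b
par _ y = y

type Strategy a = a -> ()

using :: a -> Strategy a -> a
using x s = s x `seq` x

rwhnf :: Strategy a
rwhnf x = x `seq` ()

parList :: Strategy a -> Strategy [a]
parList strat []     = ()
parList strat (x:xs) = strat x `par` parList strat xs

parMap :: Strategy b -> (a -> b) -> [a] -> [b]
parMap strat f xs = map f xs `using` parList strat

-- example workload
fib :: Int -> Int
fib n | n < 2     = n
      | otherwise = fib (n-1) + fib (n-2)
```

Under a parallel runtime, parMap rwhnf fib [10..20] sparks one thread per list element, each evaluating its fib result to weak head normal form.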
For bigger programs the programmer only has to define strategies on the most important data structures in order to introduce parallelism into the program. With this approach the parallel program is not cluttered with par annotations, and it is easier to determine the semantics and dynamic behaviour of the individual functions.