Parallel strategies can be used to split Haskell lists into chunks for parallel evaluation, and C++ arrays can similarly be evaluated in parallel using OpenMP directives.
The following code splits a Haskell list into chunks of length i
,
and processes the list using function f
by assigning chunks across
multiple CPU cores. Allocation to threads at each CPU core is decided
dynamically at runtime, threads steal work from a work queue (a
sparkpool using GHC terminology) to balance load:
= withStrategy (parListChunk i strat) . map f
parMapChunked strat i f = parMapChunked rseq 1000 f [0..199999] newList
The same parallel evaluation of a 1 dimensional C++ array can be achieved using OpenMP directives. As with Haskell, threads steal work to balance load:
#pragma omp for schedule(dynamic, 1000)
for(int i=0 ; i < 200000; i++) {
[i] = f(i);
newArray}
The computations at each list element in Haskell are iteration
independent by construction, thanks to the immutability of map
over
the list. Let's create a parallel strategy on lists…
parMapChunked :: Strategy b -- ^ evaluation degree at each element
-> Int -- ^ chunk size
-> (a -> b) -- ^ function to apply to each element
-> [a] -- ^ input list
-> [b]
= withStrategy (parListChunk i strat) . map f parMapChunked strat i f
This parallel strategy for list evaluation can then be used like this:
let newList = parMapChunked rseq 1000 addFive [0..199999]
Which splits the [0..199999]
list into 200 lists using the chunk
size 1000. The strategy rseq
evaluates each application of addFive
to weak head normal form. After that, we sum all elements in newList
and print the results:
let sum = foldl' (+) 0 newList
print sum
We can do something similar in C++ using OpenMP directives.
#pragma omp for schedule(dynamic, 1000)
for(int i=0 ; i < 200000; i++) {
[i] = addFive(i);
newArray}
Using the omp
pragma, each each thread executes a copy of the code
within the loop body. The internal OpenMP work queue gives a
chunk-sized block of loop iterations to each thread. When each OpenMP
thread is finished processing its assigned chunks, it retrieves the
next block of loop iterations from the top of the work queue. The
dynamic schedule computes which chunks should be assigned to which
threads at runtime, depending on the work load of each thread.
After that, we sum all elements in newArray
and print the results:
int64_t sum = 0;
for(int i=0 ; i < 200000 ; i++)
+= newArray[i];
sum << sum << "\n" ; cout
Tools like Threadscope for Haskell, or Intel VTune for C++ can be used to visualise the parallel executions across threads. This is a screenshot of the parallel execution of the Haskell parallel strategy.
Haskell
module Main where
import Control.Parallel.Strategies
Strategy,withStrategy,parListChunk,rseq)
(import Data.List (foldl')
parMapChunked :: Strategy b -- ^ evaluation degree at each element
-> Int -- ^ chunk size
-> (a -> b) -- ^ function to apply to each element
-> [a] -- ^ input list
-> [b]
=
parMapChunked strat i f . map f
withStrategy (parListChunk i strat)
= (+5)
addFive
= do
main let length = 200000
= 1000
chunkSize
-- dynamic chunked parallelism
= parMapChunked rseq chunkSize addFive [0..length-1]
newList
-- accumulate the result
sum = foldl' (+) 0 newList
print sum
C++
#include <iostream>
using namespace std;
#include <stdint.h>
#include "omp.h"
int addFive(int i) { return i + 5; }
int main()
{
const int length = 200000;
const int chunkSize = 1000;
int newArray[length];
/* dynamic chunked parallelism */
#pragma omp for schedule(dynamic, chunkSize)
for(int i=0 ; i < length ; i++) {
[i] = addFive(i);
newArray}
/* accumulate the result */
int64_t sum = 0;
for(int i=0 ; i < length ; i++)
+= newArray[i];
sum
<< sum << "\n" ;
cout }