Algorithm + Strategy = Parallelism -- Accident Blackspots

Go backward to Linsolv
Go up to Large Parallel Applications

Accident Blackspots

The UK Centre for Transport Studies requires to analyse police accident records to discover accident blackspots, i.e. places where a number of accidents occurred. Several criteria are used to determine whether two accident reports are for the same location. Two accidents may be at the same location if they occurred at the same junction number, at the same pair of roads, at the same grid reference, or within a small radius of each other. The problem amounts to partitioning a set into equivalence classes under several equivalence relations.

The algorithm used is as follows. For each of the matching criteria an index is constructed over the set of accidents. The indices are used to construct an indexed, binary same-site relation that pairs accidents occurring at the same location. The partition is obtained by repeatedly choosing an accident and finding all of the accidents reachable from it in the same-site relation [Trinder et al., 1996].

Fine-grained Version

The first parallel version uses fine-grained parallelism, and has four stages in the top-level pipeline: reading and parsing the file of accidents; constructing the criteria indices over the set of accidents; constructing the indexed same-site relation; and forming the partition. Little parallelism is gained from this top-level pipeline (a speedup of 1.2) because partitioning depends on the same-site index, and constructing the same-site relation depends on the criteria indices and the first value cannot be read from an index (or tree) until all of the index has been constructed. The individual pipeline stages are parallelised using a variety of techniques. The file reading and parsing stage is made data parallel by partitioning the data and reading from n files.

nFiles = 4

main =  readn nFiles []

readn n cts | n > 0 = 
    readFile ("/path/accident"++show n)
    (\ioerror -> complainAndDie)
    (\ctsn -> readn (n-1) (ctsn:cts))
        
readn 0 cts =
  let accidents = concat (map parse8Tuple cts `using` parList rnf) 
  in ...

Control parallelism is used to construct the three criteria indices.

mkAccidentIxs :: [Accident] -> AccidentIxs
mkAccidentIxs accs = (jIx,neIx,rpIx) `demanding` strategy
  where
    jIx  = ...
    neIx = ...
    rpIx = ...
    strategy = rnf jIx  `par` 
               rnf neIx `par`
               rnf rpIx `par` ()

The pipeline stages constructing the same-site relation and the partition both use benign speculative parallelism. For partitioning, the equivalence classes of n, 20 say, accidents are computed in parallel. If two or more of the accidents are in the same class, some work is duplicated. The chance of wasting work is small as the mean class size is 4.4, and there are approximately 7,500 accidents. The speculation is benign because the amount of work performed by a speculative task is small, and no other threads are sparked.

mkPartition :: Set Accident -> IxRelation2 Accident Accident -> 
               Set (Set Accident)
mkPartition accs ixRel =
  case (length aList) of 
    0         -> emptySet
    n         -> (mkSet matchList `union` mkPartition rest ixRel)
                 `demanding` strategy
    otherwise -> ...
    where
      aList = take n (setToList accs)
      matchList = [mkSet (reachable [a] ixRel) | a <- aList]
      rest     = minusManySet accs matchList
      strategy = parList rnf matchList

On four processors the fine-grained program achieves an average parallelism of 3.5 in an idealised simulation. Unfortunately average parallelism falls to 2.3 for the simulated target machine because thread granularity is small and data locality is poor.

Coarse-grained Version

The second version of the program partitions the data geographically into a number of tiles using the grid references. Each tile has an overlap with its neighbours to capture multiple-accident sites that span the borders. Each area is partitioned in parallel and duplicated border sites are eliminated. There are currently just four tiles: top left, top right, bottom left and bottom right; and the strategy is trivial:

strategy = 
  rnf tlPartition `par`         
  rnf trPartition `par`         
  rnf blPartition `par`         
  rnf brPartition

The advantages of this simple, coarse-grained approach are excellent thread granularity and data locality. On four processors an average parallelism of 3.7 is achieved for both idealised and realistic simulations. The program is at an early stage of tuning on a shared-memory Sun SPARCserver with 4 Sparc 10 processors, and is already delivering wall-clock speedups of 2.2 over the sequential version compiled with full optimisation. The sequential Haskell version is already an order of magnitude faster than the interpreted PFL version constructed at the Centre for Transport Studies [Trinder et al., 1996]. Evaluation strategies facilitated experiments with many different types of parallelism in this application.

{trinder,hwloidl,simonpj}@dcs.gla.ac.uk, kh@dcs.st-and.ac.uk