
GranSim User's Guide Version 0.03

July 1996

Hans-Wolfgang Loidl
hwloidl@dcs.gla.ac.uk

Copyright (C) 1994 -- 1996, Hans-Wolfgang Loidl for the GRASP/AQUA Project, Glasgow University

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

This user's guide describes how to use GranSim for simulating the parallel execution of (annotated) Haskell programs. In passing we will discuss how to write parallel, lazy functional programs and how to tune their performance. To this end, some visualisation tools for generating activity and granularity profiles of the execution will be discussed. A set of example programs demonstrates the use of GranSim.

GranSim is part of the Glasgow Haskell Compiler (GHC). In fact, it is a special setup of GHC that uses a slightly modified compiler (for instrumenting the code) and an extended runtime-system. For users who are already familiar with GHC and with parallel functional programming in general, a quick introduction to GranSim is available (see section A Quick Introduction to GranSim).

A Quick Introduction to GranSim

If you already know how to compile a Haskell program with GHC and have an installed version of GranSim available, only a few changes are necessary to simulate parallel execution. Basically, a compile-time flag has to be used to generate instrumented code. Runtime-system flags then control the behaviour of the simulation.

  1. Compile all modules with the additional options -gransim and -fvia-C. Use -gransim also when linking object files. For example
    ghc -gransim -fvia-C -o foo foo.hs
    
    creates a GranSim executable file foo.
  2. When running the program use the runtime-system option -bP to generate a full GranSim profile (see section Types of GranSim Profiles). See section Runtime-System Options for a description of all options that allow you to control the behaviour of the simulated parallel architecture. For example
    ./foo +RTS -bP -bp16 -bl400
    
    starts a simulation for a machine with 16 processors and a latency of 400 machine cycles. It generates a GranSim profile `foo.gr'.
  3. Use one of the visualisation tools (see section Visualisation Tools) to examine the behaviour of the program. The best first choice is `gr2ps', which generates a graph showing the overall activity of the machine in a global picture. For example
    gr2ps -O foo.gr
    
    generates an activity profile as a colour PostScript file `foo.ps'. It shows how many threads in total were running, runnable (but not running), blocked (on data under evaluation), fetching (remote data) and migrating (to another processor) at each point during the execution. Other tools you might want to try are `gr2pe' (giving a per-PE activity profile) and `gr2ap' (giving a per-thread activity profile). Additionally, another set of visualisation tools allows you to focus on the granularity of the generated threads. The most important one is `gr2gran', which generates bucket statistics showing the runtime of the individual threads (see section Granularity Profiles).
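The three steps above can be collected in a small shell helper. This is only a sketch: the function name `gransim_cmds` and its three arguments are hypothetical and not part of GranSim. It uses only the flags shown in the steps above, and it prints the commands instead of running them, so they can be checked first.

```shell
# Hypothetical helper (not part of GranSim): print the three commands from
# the steps above for a given program, processor count and latency.
# Only the flags shown in this guide are used.
gransim_cmds() {
    prog=${1:-foo}       # program name without the .hs suffix
    procs=${2:-16}       # number of simulated processors (-bp)
    latency=${3:-400}    # latency in machine cycles (-bl)

    # Step 1: compile and link with GranSim instrumentation.
    printf 'ghc -gransim -fvia-C -o %s %s.hs\n' "$prog" "$prog"
    # Step 2: run the simulation and write a full profile to <prog>.gr.
    printf './%s +RTS -bP -bp%s -bl%s\n' "$prog" "$procs" "$latency"
    # Step 3: turn the profile into an overall activity graph <prog>.ps.
    printf 'gr2ps -O %s.gr\n' "$prog"
}

# Print the commands for the example used in the text.
gransim_cmds foo 16 400
```

Assuming a GranSim-enabled `ghc` and the visualisation tools are on the search path, the printed commands could be executed by piping the output to `sh`. Changing the second and third arguments gives the commands for a different simulated machine.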

As an example of an overall activity profile, the graph below shows the result of running a parfib program (see section A Simple Example Program) on 16 processors with a latency of 400 cycles.

[Figure: Overall activity profile for parfib (pf-bp16-bl400.ps)]

Overall, for this simple program the utilisation is almost perfect: the green (medium-gray) area reaches up to 16 throughout almost the whole computation. Only at the end of the computation is there a significant number of runnable (amber or light-gray) and blocked (red or black) threads.

The header of the picture shows the average parallelism and the options used for this execution (e.g. the -bp16 part shows that 16 processors have been simulated). The runtime shown in the footer of the picture is measured in machine cycles, as are all times in GranSim.

