Visualisation Tools

The visualisation tools that come with GranSim take a GranSim profile as input and create level 2 PostScript files showing either the activity or the granularity of the execution. A collection of scripts in the GranSim Toolbox makes it possible to focus on specific aspects of the execution, such as the `node flow', i.e. the movement of nodes (closures) between the PEs.

Most of these tools are implemented as Perl scripts. This means that they are very versatile and should be easy to modify. However, processing a large GranSim profile can require a considerable amount of computation and memory. Speeding up crucial parts of the scripts is on my ToDo list, but don't hold your breath.

Activity Profiles

Three tools, gr2ps, gr2pe and gr2ap, show the activity during the execution at three levels of detail: for the whole machine, per processor, and per thread.

All tools discussed in this section print a help message when called with the option -h. This message shows the available options. In general, all tools understand the option -o <file> for specifying the output file and -m for generating a monochrome profile (by default the tools generate colour PostScript).

The `gr2ps' and `gr2ap' scripts work in two stages:

  1. First the `.gr' file is translated into a `.qp' file, which is basically a stripped down version of a GranSim profile.
  2. Then the `.qp' file is translated into a `.ps' file.

The `gr2pe' script works directly on the `.gr' file.

Overall Activity Profile

The overall activity profile (created via gr2ps) shows the activity of the whole machine by separating the threads into up to five different groups. These groups describe the number of active, runnable, blocked, fetching and migrating threads.

If the GranSim profile includes information about sparks (-bs option), it is also possible to show the number of sparks. However, this number is usually much bigger than the number of all threads, so it rarely makes sense to mix the groups for sparks and threads in one profile.

The option -O reduces the size of the generated PostScript file. The option -I <string> is used to specify which groups of threads should be shown in which order. For example

gr2ps -O -I "arb" parfib.gr

generates a profile `parfib.ps' showing only active (`a'), runnable (`r') and blocked (`b') threads. The letter codes for the other groups are `f' for fetching, `m' for migrating and `s' for sparks.
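
The letter codes can be summarised as a small lookup table. The following Python sketch is purely illustrative: the function decode_groups is a hypothetical helper and not part of gr2ps, and how gr2ps itself reacts to unknown letters is not specified here.

```python
# Letter codes as listed in the text above. The helper below is a
# hypothetical illustration and is not part of gr2ps.
GROUPS = {
    "a": "active",
    "r": "runnable",
    "b": "blocked",
    "f": "fetching",
    "m": "migrating",
    "s": "sparks",
}


def decode_groups(spec):
    """Return the group names selected by an -I string, in the given order."""
    unknown = [c for c in spec if c not in GROUPS]
    if unknown:
        # Assumption: unknown letters are treated as an error in this sketch.
        raise ValueError("unknown group letter(s): " + "".join(unknown))
    return [GROUPS[c] for c in spec]


print(decode_groups("arb"))  # ['active', 'runnable', 'blocked']
```

The order of the letters matters: "arb" and "bra" select the same groups but list them in a different order.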

In the current version the marks on the y-axis of the generated profile may be stretched or compressed. This might happen if many events occur at exactly the same time. If this is the case, the initial count of the maximal number of y-values may be wrong causing a rescaling at the very end. In practice that happens rarely (more often for GranSim-Light profiles, though).

The picture below shows an overall activity profile for a simple parallel divide-and-conquer program. The header of the graph shows the runtime-system options for the execution.

Overall activity profile

The overall runtime is measured in machine cycles. The average parallelism is the area covered by the running (green or medium-gray) threads, normalised with respect to the total runtime. In this graph only three groups show up. Most of the time 16 threads are running, utilising all available processors, with occasional dips in the green area. The big amber (or light-gray) area of runnable threads indicates that this program can easily use all available processors. The large amount of blocking, indicated by the large red (or black) area in the graph, is caused by nodes near the root of the computation tree. They have to wait for the results of their children to combine them into the overall result. The sequential part at the beginning of the computation is due to I/O overhead, including the initialisation of the basic I/O monad. Towards the end of the computation, when combining results near the root of the computation tree, the overall utilisation drops significantly. In this setup the latency is so small that no large areas of fetching (blue) threads appear. Also, migration is turned off in this case (it can be turned on with the runtime system option -bM).
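
The definition of average parallelism given above (area of running threads divided by total runtime) can be made concrete with a short Python sketch. The input representation is an assumption made for illustration: a sorted list of (time, number of running threads) pairs forming a step function that starts at time 0. It does not reflect the format of a GranSim profile, and the sample numbers are invented.

```python
def average_parallelism(samples, end_time):
    """Area under the 'running threads' step function, divided by the runtime.

    samples  -- list of (time, running) pairs sorted by time; each count
                holds until the next sample (assumed representation)
    end_time -- total runtime in machine cycles, measured from time 0
    """
    if end_time <= 0:
        raise ValueError("end_time must be positive")
    area = 0
    # Pair every sample with its successor; the last one lasts until end_time.
    successors = samples[1:] + [(end_time, 0)]
    for (t, running), (t_next, _) in zip(samples, successors):
        area += running * (t_next - t)
    return area / end_time


# Invented numbers: a sequential start, a long fully parallel phase,
# and a drop in utilisation towards the end.
samples = [(0, 1), (100, 16), (900, 2)]
print(average_parallelism(samples, 1000))  # 13.1
```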

Per-processor Activity Profile

The idea of the per-processor activity profile is to show the most important pieces of information about each processor in one graph. Therefore, it is easy to compare the behaviour of the different processors and to spot imbalances in the computation.

This profile shows one strip for each of the simulated processors. Each of these strips encodes three kinds of information:

This script can also produce variants of the same kind of graph that focus on different features of GranSim:

No more than about 32 processors should be shown in one graph; otherwise the strips become too small. This profile cannot be generated for a GranSim-Light profile.

The graph below shows a per-processor activity profile for the same parallel divide-and-conquer program as in the previous section.

Per-processor activity profile for parfib

The graph shows the activity of each of the 16 processors in this simulation. The dark green areas in the first third of the computation show that processors 2, 4 and 14 have a significantly higher load of runnable threads than the rest. Also note the pattern that a decreasing number of blocked threads (thinner blue bar) is accompanied by an increasing number of runnable threads (darker green area).

Per-Thread Activity Profile

The per-thread activity profile shows the activity of all generated threads. For each thread a horizontal line is shown. The line starts when the thread is created and ends when it is terminated. The thickness of the line indicates the state of the thread. The possible states correspond to the groups shown in the overall activity profile (see section Overall Activity Profile).

The states are encoded in the following way:

This profile gives the most accurate kind of information, and it often makes it possible to `step through' the computation by relating events on different processors to each other. For example, the typical pattern at the beginning of the computation is some short computation for starting the thread, followed by fetching remote data. After that the thread may become runnable if another thread has been started in the meantime.

The picture below shows an example of a per-thread activity profile. Note the short period of fetching immediately after starting a thread in order to get the data for the spark that has just been turned into a thread. The high degree of suspension is mainly due to the fact that migration is turned off in this example.

Per-thread activity profile

However, such a detailed analysis is only possible for programs with a rather small number of threads. Usually, GranSim profiles of bigger executions have to be pre-processed to reduce the number of threads that are shown in one graph (see section Scripts). As the level of detail provided by this graph is rarely needed for bigger executions, no automatic splitting of a profile into several graphs has been implemented.

Granularity Profiles

The tools for generating granularity profiles aim at showing the relative sizes of the generated threads. Of particular interest is the number of tiny threads, for which the overhead of thread creation is relatively high.

All tools discussed in this section require Gnuplot to generate the granularity profiles. I am using version 3.5 but it should work with older versions, too.

Two kinds of graphs can be generated to show granularity: bucket graphs and cumulative graphs.

The main tools for generating such graphs are:

Bucket Graphs

In a bucket graph the x-axis, indicating execution times of the threads, is partitioned into intervals (buckets). The graph shows a histogram of the number of threads in each bucket (i.e. whose execution time falls into this interval). For generating this kind of graph only a restricted GranSim profile (containing only END events) is required.
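
The bucketing described above can be sketched in a few lines of Python. This only illustrates the idea and is not the code used by the GranSim tools. The function name, the sample data and the treatment of boundary values (a time equal to a boundary goes into the upper bucket) are assumptions.

```python
from bisect import bisect_right


def bucket_counts(exec_times, boundaries):
    """Count how many execution times fall into each bucket.

    boundaries must be sorted in ascending order. They define
    len(boundaries) + 1 buckets: below the first boundary, between
    consecutive boundaries, and at or above the last boundary.
    """
    counts = [0] * (len(boundaries) + 1)
    for t in exec_times:
        # bisect_right returns the number of boundaries <= t, which is
        # exactly the index of the bucket that t belongs to.
        counts[bisect_right(boundaries, t)] += 1
    return counts


# Invented execution times, e.g. as they might be taken from END events.
boundaries = [100, 200, 500, 1000]
exec_times = [40, 75, 99, 100, 150, 480, 900, 2500]
print(bucket_counts(exec_times, boundaries))  # [3, 2, 1, 1, 1]
```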

For example one of the files generated by running

gr2gran -t , pf.gr

is g.ps, which contains such bucket statistics. The -t option of this tool selects the right template file (, is a shorthand for the global template in $GRANDIR/bin).

Here are the bucket statistics for executing parfib 22:

Bucket statistics

The graph shows that this program creates a huge number of tiny threads (note the log scale in the graph). Refining the intervals for these tiny threads further gives the following bucket statistics:

Bucket statistics

The necessary change in the template file for these bucket statistics is

-- Intervals for pure exec. times
G: (100, 200, 500, 1000, 2000, 5000, 10000)

Cumulative Graphs

In a cumulative graph the x-axis again represents execution times of the individual threads. The value in the graph at the time t represents the number of threads whose execution time is smaller than t. Therefore, the values in the graph are monotonically increasing until the right end shows the total number of threads in the execution.
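
As a rough illustration of this definition, the following Python sketch computes the value of a cumulative graph at a few points t, both as absolute numbers and as percentages. The function names and the sample data are invented for this example. The sketch is not taken from gr2gran.

```python
from bisect import bisect_left


def cumulative_counts(exec_times, points):
    """For each t in points, count the threads whose execution time is < t."""
    ordered = sorted(exec_times)
    # bisect_left returns the number of elements strictly smaller than t.
    return [bisect_left(ordered, t) for t in points]


def cumulative_percentages(exec_times, points):
    """Same values as cumulative_counts, as a percentage of all threads."""
    total = len(exec_times)
    if total == 0:
        raise ValueError("no threads given")
    return [100.0 * c / total for c in cumulative_counts(exec_times, points)]


exec_times = [40, 75, 99, 100, 150, 480, 900, 2500]
points = [100, 500, 1000, 5000]
print(cumulative_counts(exec_times, points))       # [3, 6, 7, 8]
print(cumulative_percentages(exec_times, points))  # [37.5, 75.0, 87.5, 100.0]
```

The values never decrease as t grows, and the last value equals the total number of threads once t exceeds the longest execution time.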

Again, running

gr2gran -t , pf.gr

generates cumulative graphs for the runtime in the files cumu-rts.ps and cumu-rts0.ps (one file shows absolute numbers of threads, the other the percentage of the threads on the y-axis).

Here are the cumulative runtime statistics for executing parfib 22:

Cumulative runtime statistics

Template Files

The functions for reading template files can be found in `template.pl'. This file also contains documentation about the available fields.

Statistics Packages

A set of statistics functions for computing mean value, standard deviation, correlation coefficient etc. can be found in `stats.pl'.
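
For readers unfamiliar with these quantities, here is a minimal Python sketch of the three functions named above. It is not a translation of `stats.pl': the function names are made up for this example, and whether `stats.pl' uses the population or the sample form of the standard deviation is not stated here, so the population form below is an assumption.

```python
from math import sqrt


def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


def std_dev(xs):
    """Population standard deviation (divides by n; an assumption)."""
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented sample: thread runtimes and a second quantity that grows
# linearly with them, so the correlation is (up to rounding) 1.
runtimes = [2, 4, 4, 4, 5, 5, 7, 9]
other = [2 * x + 1 for x in runtimes]
print(mean(runtimes), std_dev(runtimes))
print(correlation(runtimes, other))
```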

Scripts

The GranSim Toolbox contains not only visualisation tools but also a set of scripts that work on GranSim profiles and provide specific information.

