GranSim Profiles

This chapter describes the contents of a GranSim profile (a `.gr' file). In most cases the profiles generated by the visualisation tools should provide sufficient information for tuning the performance of the program. However, it is possible to extract more information out of the generated GranSim profile. This chapter explains how to do that.

Types of GranSim Profiles

Depending on the runtime-system options, different kinds of profiles are generated:

A reduced GranSim profile contains only END events in its body. This is sufficient to extract granularity profiles, but it is not sufficient to generate activity profiles. This is the default setting for GranSim profiles.
A full GranSim profile contains one line for every major event in the system (see section Contents of a GranSim Profile). Generation of such a profile is enabled by the RTS option -bP.
A spark profile additionally contains events related to sparks (for creating, using, pruning, exporting, acquiring sparks). Such a profile is generated when using the RTS option -bs.
A heap profile additionally contains events for allocating heap. Such a profile is generated when using the RTS option -bh.

Contents of a GranSim Profile

This section describes the syntactic structure of a GranSim profile.

Header

The header contains general information about the execution. It is split into several sections separated by lines consisting only of - symbols. The end of the header is indicated by a line consisting only of + symbols.

The sections of the header are:

Name of the program, arguments and start time.
General parameters describing the parallel architecture. This covers the number of processors, flags for thread migration, asynchronous communication etc. Finally, this section describes basic costs of the parallel machine like thread creation time, context switch time etc.
Communication parameters describing basic costs for sending messages like latency, message creation costs etc.
Instruction costs describing the costs of different classes of machine operations.
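
The header layout described above can be sketched in Python. This is only an illustration, not part of the GranSim Toolbox: the function name split_profile is made up here, and the sketch assumes that a separator is a non-empty line made up entirely of - symbols and that the terminator is a non-empty line made up entirely of + symbols.

```python
def split_profile(lines):
    """Split a profile (a list of lines) into (header_sections, body_lines).

    Assumption: separators are lines of only '-' and the header ends at the
    first line of only '+'. Everything after that line is the body.
    """
    sections, current = [], []
    for i, raw in enumerate(lines):
        line = raw.strip()
        if line and set(line) == {"+"}:
            # End of header: close the last section, the rest is the body.
            sections.append(current)
            body = [l.rstrip("\n") for l in lines[i + 1:]]
            return sections, body
        if line and set(line) == {"-"}:
            # Section separator inside the header.
            sections.append(current)
            current = []
        else:
            current.append(raw.rstrip("\n"))
    # No '+' line found: treat the whole input as header.
    sections.append(current)
    return sections, []
```

Called on the lines of a `.gr' file, for example split_profile(open(path).readlines()) with path being the name of your profile, this should yield one list of lines per header section plus the list of body lines.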

Body

The body of the GranSim profile contains events that are generated during the execution of the program. The following subsections first describe the general structure of the events and then go into details of several classes of events.

General Structure of an Event

Each line in the body of a GranSim profile represents one event during the execution of the program. The general structure of one such line is:

The keyword PE.
The processor number where the event happened.
The time stamp of the event (in square brackets).
The name of the event.
The thread id of the affected thread (a hex number).
Optionally a node as an additional argument to the event (e.g. the node to be reduced in case of a START event). This is either a hex number or the special string ______ indicating a Nil-closure.
Additional information depending on the event. This can be the processor from which data is fetched or the length of the spark queue after starting a new thread.

The fields are separated by whitespace. The time stamp must be enclosed in square brackets and must be followed by a : symbol.
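
As an illustration of this structure, here is a minimal Python sketch of a line parser. It is not the parser used by the GranSim tools. The names Event and parse_event are hypothetical, the sample lines are made up, and the sketch assumes that processor numbers and time stamps are decimal integers and that the additional information never starts with a bare hex number.

```python
import re
from typing import NamedTuple, Optional

# PE <processor> [<time stamp>]: <event name> <remaining fields>
EVENT_RE = re.compile(
    r"^PE\s+(?P<pe>\d+)\s+\[(?P<time>\d+)\]:\s+(?P<event>\S+)(?P<rest>.*)$"
)
HEX_RE = re.compile(r"^(0x)?[0-9a-fA-F]+$")


class Event(NamedTuple):
    pe: int
    time: int
    name: str
    thread: Optional[int]   # thread id, parsed from hex
    node: Optional[str]     # hex string or "______" (Nil-closure)
    extra: str              # event-specific trailing information


def parse_event(line):
    """Parse one body line; return None if it does not look like an event."""
    m = EVENT_RE.match(line.strip())
    if m is None:
        return None
    tokens = m.group("rest").split()
    thread = node = None
    if tokens and HEX_RE.match(tokens[0].rstrip(",")):
        thread = int(tokens.pop(0).rstrip(","), 16)
    if tokens:
        candidate = tokens[0].rstrip(",")
        if HEX_RE.match(candidate) or set(candidate) == {"_"}:
            node = candidate
            tokens.pop(0)
    return Event(int(m.group("pe")), int(m.group("time")),
                 m.group("event"), thread, node, " ".join(tokens))


# Made-up sample lines that follow the field order described above.
blocked = parse_event("PE  1 [2500]: BLOCK 1a 3f00c (from 0)")
started = parse_event("PE 0 [100]: START 2 ______ [SN 3]")
```

END events do not follow the general structure, so a parser like this one only recovers their processor, time stamp and event name reliably.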

END Events

END events are an exception to this general structure. The reason for their special structure is that they summarise the most important information about the thread. Therefore, information about e.g. the granularity of the threads can be extracted out of END events alone without having to generate a full GranSim profile.

The structure of an END event is:

The keyword PE.
The processor number where the event happened.
The time stamp of the event (in square brackets).
The name of the event (END in this case).
The keyword SN followed by the spark name of the thread. This information makes it possible to associate a thread with its static spark site in the program (see section Parallelism Annotations on how to give names to spark sites).
The keyword ST followed by the start time of the thread.
The keyword EXP followed by a flag indicating whether this thread has been exported to another processor or has been evaluated locally (possible values are T and F).
The keyword BB followed by the number of basic blocks that have been executed by this thread.
The keyword HA followed by the number of heap allocations.
The keyword RT followed by the total runtime. This is the most important information in an END event. It is used by the visualisation tools for generating granularity profiles.
The keyword BT followed by the total block time and a block count (i.e. how often the thread has been blocked).
The keyword FT followed by the total fetch time and a fetch count (i.e. how often the thread fetched remote data).
The keyword LS followed by the number of local sparks (sparks that have to be executed on the local processor) generated by the thread.
The keyword GS followed by the number of global sparks generated by the thread.
The keyword EXP followed by a flag indicating whether this thread was mandatory or only advisable (in the current version this flag is not used; it would be important in a combination of GranSim with a concurrent set-up).
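
The keyword/value layout of an END event lends itself to a small extraction script. The following Python sketch is an illustration under assumptions: the names END_KEYWORDS, parse_end_fields and runtime are made up, the sample line is invented, and the exact punctuation between fields (commas, parentheses around counts) is guessed, which is why the sketch simply ignores it. Since the keyword EXP is listed twice above, the fields are kept as an ordered list rather than a dictionary.

```python
# Keywords as listed in the description of END events.
END_KEYWORDS = {"SN", "ST", "EXP", "BB", "HA", "RT", "BT", "FT", "LS", "GS"}


def parse_end_fields(line):
    """Return an ordered list of (keyword, [values]) pairs of an END event."""
    _, _, tail = line.partition(" END")
    for ch in ",()":
        tail = tail.replace(ch, " ")   # punctuation is an assumption; drop it
    fields = []
    for token in tail.split():
        if token in END_KEYWORDS:
            fields.append((token, []))
        elif fields:
            fields[-1][1].append(token)
        # Tokens before the first keyword are skipped.
    return fields


def runtime(line):
    """Total runtime (RT) of the thread, or None if the field is missing."""
    for key, values in parse_end_fields(line):
        if key == "RT" and values:
            return int(values[0])
    return None


# Invented sample line, for illustration only.
sample_end = ("PE 0 [9000]: END 1a, SN 2, ST 100, EXP F, BB 40, HA 12, "
              "RT 8900, BT 300 (2), FT 0 (0), LS 1, GS 3, EXP T")
```

Collecting runtime(line) over all END events of a profile gives the raw data behind a granularity profile.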

Basic Thread Events

The main events directly related to threads are:

START: Generated when starting a thread (after adding overhead for thread creation to the clock of the current processor). After the thread id it has two additional fields: one specifying the node to be evaluated (as a hex number) and one specifying the spark site that generated this thread (format: [SN n] where n is a dec number).
START(Q): Same as START but the new thread is put into the runnable queue rather than being executed (only if the current processor is busy at that point).
BLOCK: A thread is blocked on data that is under evaluation by another thread. It is descheduled and put into the blocking queue of that node. Two additional fields contain the node on which the thread is blocked and the processor on which this node is lying (format: (from n) where n is a processor (dec) number).
RESUME: Continue to execute the thread after it has been blocked or has been waiting for remote data. This event does not contain additional fields.
RESUME(Q): Same as RESUME but the resumed thread is put into the runnable queue rather than being executed (only if the current processor is busy at that point).
SCHEDULE: The thread is scheduled on the given processor (no additional fields). This event is usually emitted after terminating a thread on the processor. It may also occur after a FETCH (if asynchronous communication is turned on) or after a BLOCK event.
DESCHEDULE: The thread is descheduled on the given processor (no additional fields). After this event the thread is in the runnable queue. This event is not used for implicit descheduling that is performed after events like BLOCK or FETCH. DESCHEDULE events should only occur if fair scheduling is turned on.
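
One way to read these events is as transitions of a per-thread state. The Python sketch below replays a sequence of events and records the last state of each thread. The state names (running, runnable, blocked, finished) and the function name replay are chosen here for illustration and are not GranSim terminology. END is included so that terminated threads show up as finished.

```python
# State a thread is in after each event, following the descriptions above.
NEXT_STATE = {
    "START": "running",
    "START(Q)": "runnable",    # put into the runnable queue
    "BLOCK": "blocked",        # waiting in the blocking queue of a node
    "RESUME": "running",
    "RESUME(Q)": "runnable",
    "SCHEDULE": "running",
    "DESCHEDULE": "runnable",  # back in the runnable queue
    "END": "finished",
}


def replay(events):
    """events: iterable of (event_name, thread_id) pairs in profile order.

    Returns a dict mapping each thread id to its last known state.
    Events not listed in NEXT_STATE are ignored.
    """
    state = {}
    for name, thread in events:
        if name in NEXT_STATE:
            state[thread] = NEXT_STATE[name]
    return state


# Invented event sequence for two threads.
trace = [("START", 1), ("START(Q)", 2), ("BLOCK", 1), ("SCHEDULE", 2),
         ("RESUME(Q)", 1), ("END", 2), ("SCHEDULE", 1)]
final_states = replay(trace)
```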

Communication Events

Events that are issued when sending data between processors are:

FETCH: Send a fetch request from the given thread (on the given processor) to another processor. This event has two additional fields: The first field is the node (hex number) that should be fetched. The last field is the processor where this node is lying and from which the data has to be fetched (format: (from n) where n is a processor (dec) number).
REPLY: A reply for a fetch request of the given thread arrived at the given processor. The first additional field contains the node and the last field contains the processor from which it arrived (format: (from n) where n is a processor (dec) number). Note: This event only marks the arrival of the data. It is usually followed by a RESUME or RESUME(Q) event for the thread that asked for the data.
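
Pairing each FETCH with the matching REPLY gives the time a thread spent waiting for remote data. The Python sketch below does this on already parsed events. It is an illustration only: the function name fetch_latencies is made up, the input tuples are invented, and the sketch assumes that a thread has at most one outstanding fetch request at any time.

```python
def fetch_latencies(events):
    """events: iterable of (time, event_name, thread_id) in profile order.

    Returns a list of (thread_id, latency) pairs, where latency is the time
    between a FETCH of a thread and the next REPLY for the same thread.
    Assumption: at most one outstanding fetch per thread.
    """
    pending = {}      # thread id -> time stamp of its open FETCH
    latencies = []
    for time, name, thread in events:
        if name == "FETCH":
            pending[thread] = time
        elif name == "REPLY" and thread in pending:
            latencies.append((thread, time - pending.pop(thread)))
    return latencies


# Invented events: two threads fetching remote data.
comm = [(100, "FETCH", 1), (120, "FETCH", 2),
        (350, "REPLY", 1), (400, "REPLY", 2)]
waits = fetch_latencies(comm)
```

A REPLY without a preceding FETCH in the input (for instance after narrowing the profile) is skipped rather than treated as an error.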

Thread Migration Events

These events are only produced when thread migration is enabled (-bM):

STEALING: Indicates the stealing of a thread on the given processor. The thread which is being stolen appears in the thread field. One additional field (the last field) indicates which processor is stealing that thread (format: (by n) where n is a processor (dec) number).
STOLEN: Indicates the arrival of a stolen thread on the given processor. Two additional fields follow: the first shows the node which will be evaluated by this thread next, and the last shows from which processor the thread has been stolen (format: (from n) where n is a processor (dec) number). Note: This thread is executed immediately by the given processor (no RESUME event follows).
STOLEN(Q): Same as STOLEN but the new thread is put into the runnable queue rather than being executed (only if the current processor is busy at that point).
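
To get an impression of how much migration takes place, the STEALING events can be counted per pair of processors. The following Python sketch is an illustration: the name theft_counts is made up, the sample lines are invented, and the regular expression assumes decimal processor numbers and time stamps and the (by n) field at the end of the line.

```python
import re
from collections import Counter

# PE <victim> [<time>]: STEALING <thread> ... (by <thief>)
STEALING_RE = re.compile(
    r"^PE\s+(\d+)\s+\[\d+\]:\s+STEALING\b.*\(by\s+(\d+)\)\s*$"
)


def theft_counts(lines):
    """Count STEALING events per (victim PE, stealing PE) pair."""
    counts = Counter()
    for line in lines:
        m = STEALING_RE.match(line.strip())
        if m:
            counts[(int(m.group(1)), int(m.group(2)))] += 1
    return counts


# Invented sample lines.
migration = [
    "PE 0 [500]: STEALING 1a (by 1)",
    "PE 1 [600]: STOLEN 1a 3f00c (from 0)",
    "PE 0 [900]: STEALING 1b (by 1)",
    "PE 2 [950]: STEALING 2c (by 0)",
]
thefts = theft_counts(migration)
```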

Spark Events

When enabling spark profiling, events related to sparks will appear in the profile:

SPARK: Indicates the generation of a spark on the given processor for the given node. At that point it is added to this processor's spark pool. Two additional fields show the node to which this spark is pointing and the current size of the spark pool (format: [sparks n] where n is a dec number).
SPARKAT: Same as SPARK but with explicit placement of the spark on this processor. This is usually achieved in the program by using a parLocal or parAt rather than a parGlobal annotation (see section Parallelism Annotations).
USED: Indicates that this spark is turned into a thread on the given processor. A START or START(Q) event will follow soon afterwards.
PRUNED: A spark is removed from the spark pool of the given processor. This might occur when the spark points to a normal form (there is no work to do for that spark). This is checked when creating a spark and when searching the spark pool for new work.
EXPORTED: A spark is exported from a given processor. Two additional fields show the node to which this spark is pointing and the current size of the spark pool (format: [sparks n] where n is a dec number).
ACQUIRED: A spark that has been exported by another processor is acquired by the given processor. Two additional fields have the same meaning as for EXPORTED.
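
The [sparks n] field makes it easy to follow the size of the spark pool on each processor. The Python sketch below records the largest pool size seen per processor. It is an illustration only: the name max_spark_pool is made up, the sample lines are invented (including the fields between the event name and [sparks n]), and decimal processor numbers and time stamps are assumed.

```python
import re

# Any event line that carries a "[sparks n]" field.
SPARKS_RE = re.compile(r"^PE\s+(\d+)\s+\[\d+\]:.*\[sparks\s+(\d+)\]")


def max_spark_pool(lines):
    """Return a dict: processor number -> largest spark pool size seen."""
    peak = {}
    for line in lines:
        m = SPARKS_RE.match(line.strip())
        if m:
            pe, size = int(m.group(1)), int(m.group(2))
            peak[pe] = max(peak.get(pe, 0), size)
    return peak


# Invented sample lines from a spark profile.
spark_lines = [
    "PE 0 [10]: SPARK 1a 3f00c [sparks 1]",
    "PE 0 [20]: SPARK 1a 3f010 [sparks 2]",
    "PE 0 [30]: EXPORTED 1a 3f010 [sparks 1]",
    "PE 1 [45]: ACQUIRED 0 3f010 [sparks 1]",
]
peaks = max_spark_pool(spark_lines)
```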

Debugging Events

Certain debug options generate additional events that make it possible to monitor the internal behaviour of the simulator. This information shouldn't be of interest for the friendly user but might come in handy for those who dare hacking at the runtime-system:

SYSTEM_START: Indicates that the simulator is executing a "system" routine (a routine in the runtime-system that is not directly related to graph reduction). This makes it possible to show when exactly rescheduling is done in the simulator. It may be useful in GranSim-Light to check that the costs during system operations are attached to the right thread.
SYSTEM_END: See previous event. From this point on normal graph reduction is performed.

The Emacs GranSim Profile Mode

Looking up information directly in a GranSim profile is very tedious (believe me, I have done it quite often). To make this task easier the GranSim Toolbox contains a GNU Emacs mode for GranSim profiles: the GranSim Profile Mode.

The most useful features (IMNSHO) are highlighting of parts of a GranSim profile and narrowing of the profile to specific PEs, threads, events etc.

Installation

To use this mode just put the file `GrAnSim.el' somewhere on your Emacs load-path and load the file. I don't have autoload support at the moment, but the file is very short anyway, so directly loading it is quite fast. Currently, the mode requires the hilit19 package for highlighting parts of the profile. It also requires the `tf' script in the bin dir of your GranSim installation.

I use Emacs 19.31 with the default `hilit19.el' package, but the GranSim profile mode has been successfully tested with Emacs 19.27. However, if you have problems with the mode please report them to the address shown at the end of this document (see section Bug Reports).

Customisation

A few Emacs variables control the behaviour of the GranSim Profile mode:

Variable gransim-auto-hilit: This variable indicates whether highlighting is turned on by default. Note that you can customise `hilit19' such that it does not automatically highlight buffers that are bigger than a given size. Since GranSim profiles tend to be extremely large you might want to reduce the default value.

Variable grandir: The root of the GranSim installation. The mode searches for scripts of the GranSim Toolbox in the directory grandir/bin. By default this variable is set to the contents of the environment variable GRANDIR.

Variable hwl-hi-node-face: Face to be used for specific highlighting of a node.

Variable hwl-hi-thread-face: Face to be used for specific highlighting of a thread.

Here are the hilit19 variables that are of some interest for the GranSim Profile Mode:

Variable hilit-auto-highlight: T if we should highlight all buffers as we find 'em, nil to disable automatic highlighting by the find-file hook. Default value: t.

Variable hilit-auto-highlight-maxout: Auto-highlight is disabled in buffers larger than this. Default value: 60000.

Features

The main features of the GranSim profile mode are:

Highlighting of parts of the profile. Colour coding is used to distinguish between events that start a reduction, finish a reduction and block a reduction. Within END events the total runtime is specially highlighted.
Narrowing of the profile. This should not be confused with the narrowing mode in Emacs. The narrowing in GranSim profile mode is done by running a script (`tf') over the buffer and displaying the output in another buffer. Hence, narrowing can be further refined by improving the `tf' script, which is written in Perl. It is possible to narrow a GranSim profile to a specific
- processor (PE),
- event,
- thread,
- node,
- spark (only possible for spark profiles)
This feature is particularly useful, e.g., to follow a node that has been moved between processors or to concentrate on the reductions on one specific processor. Of course, for those pagans who don't believe in Emacs it is also possible to run the `tf' script directly on a `.gr' file.
A second form of highlighting, specialised for nodes and threads is available, too. With the commands hwl-hi-thread and hwl-hi-node every occurrence of the thread or node in the profile after the current point is highlighted. The function hwl-hi-clear undoes all such highlighting.
There is a menu item for calling most of the functions described here. It automatically appears in any GranSim profile (i.e. any file that has a `.gr' extension).
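
The idea behind narrowing can be sketched in a few lines of Python. This is not the `tf' script and does not reproduce its options; it only illustrates the filtering on processor, event and thread. The name narrow and the sample lines are made up, and the sketch assumes the general event structure with a decimal processor number and the thread id as the first field after the event name.

```python
import re

# PE <processor> [<time>]: <event> <thread id> ...
LINE_RE = re.compile(r"^PE\s+(\d+)\s+\[\d+\]:\s+(\S+)\s*(\S*)")


def narrow(lines, pe=None, event=None, thread=None):
    """Keep only event lines matching all given criteria.

    pe is a processor number, event an event name, thread a hex string.
    Criteria left as None are not checked; non-event lines are dropped.
    """
    kept = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        if pe is not None and int(m.group(1)) != pe:
            continue
        if event is not None and m.group(2) != event:
            continue
        if thread is not None and m.group(3).rstrip(",").lower() != thread.lower():
            continue
        kept.append(line)
    return kept


# Invented sample lines.
profile = [
    "PE 0 [100]: START 1a 3f00c [SN 0]",
    "PE 1 [150]: START 1b 3f010 [SN 1]",
    "PE 0 [300]: BLOCK 1a 3f010 (from 1)",
]
on_pe0 = narrow(profile, pe=0)
starts = narrow(profile, event="START")
```

Combining criteria, as in narrow(profile, pe=0, event="BLOCK"), corresponds to narrowing the profile repeatedly.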

Default key bindings in GranSim profile mode:

C-c t, M-x hwl-truncate: Truncate event lines such that exactly one line is shown for one event in the body of a profile.

C-c w, M-x hwl-wrap: Wrap lines to show them in full length.

C-c , M-x hwl-toggle-truncate-wrap: Toggle between the above two modes.

C-c h, M-x hilit-rehighlight-buffer: Rehighlight the whole buffer.

C-c p, M-x hwl-narrow-to-pe: Narrow the profile to a PE.

C-c t, M-x hwl-narrow-to-thread: Narrow the profile to a thread.

C-c e, M-x hwl-narrow-to-event: Narrow the profile to an event.

C-c C-e, (lambda () (hwl-narrow-to-event "END")): Narrow the profile to END events.

C-c N, M-x hwl-hi-node: Highlight a node in the profile.

C-c T, M-x hwl-hi-thread: Highlight a thread in the profile.

C-c C-c, M-x hwl-hi-clear: Remove highlighting of nodes and threads.