This chapter describes the contents of a GranSim profile (a
`.gr' file). In most cases the profiles generated by the
visualisation tools should provide sufficient information for tuning the
performance of the program. However, it is possible to extract more
information out of the generated GranSim profile. This chapter explains
how to do that.
Depending on the runtime-system options, different kinds of profiles are
generated:
-
A reduced GranSim profile contains only END events in its body.
This is sufficient to extract granularity profiles,
but it is not sufficient to generate activity profiles. This is the
default setting for GranSim profiles.
-
A full GranSim profile contains one line for every major event in
the system (see section Contents of a Granularity Profile). Generation of
such a profile is enabled by the RTS option -bP.
-
A spark profile additionally contains events related to sparks
(for creating, using, pruning, exporting, acquiring sparks). Such a
profile is generated when using the RTS option -bs.
-
A heap profile additionally contains events for allocating heap.
Such a profile is generated when using the RTS option -bh.
This section describes the syntactic structure of a GranSim profile.
The header contains general information about the execution. It is split
into several sections separated by lines consisting only of -
symbols. The end of the header is indicated by a line consisting only of
+ symbols.
The sections of the header are:
-
Name of the program, arguments and start time.
-
General parameters describing the parallel architecture. This
covers the number of processors, flags for thread migration,
asynchronous communication etc. Finally, this section describes basic
costs of the parallel machine like thread creation time, context switch
time etc.
-
Communication parameters describing basic costs for sending
messages like latency, message creation costs etc.
-
Instruction costs describing the costs of different classes of
machine operations.
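The separator rules above are enough to split a profile mechanically. The following is a minimal Python sketch, not part of the GranSim Toolbox: the function name `split_profile` and the sample lines are made up for illustration, and the only thing taken from the description is that sections end at lines consisting only of - symbols and the header ends at a line consisting only of + symbols.

```python
def split_profile(lines):
    """Split profile lines into (header_sections, body_lines)."""

    def only(ch, line):
        # True if the line is non-empty and consists solely of `ch`.
        s = line.strip()
        return len(s) > 0 and set(s) == {ch}

    sections, current, body = [], [], []
    in_header = True
    for line in lines:
        if not in_header:
            body.append(line)
        elif only("+", line):
            # A line of + symbols closes the last section and the header.
            sections.append(current)
            in_header = False
        elif only("-", line):
            # A line of - symbols closes the current header section.
            sections.append(current)
            current = []
        else:
            current.append(line)
    if in_header:
        # No + line was seen: treat everything as header.
        sections.append(current)
    return sections, body


# Hypothetical sample; real header lines look different.
sample = [
    "prog 10 20",
    "-----",
    "PEs 4",
    "-----",
    "Latency 1000",
    "-----",
    "Arith 1",
    "+++++",
    "PE  0 [0]: START 1 a0 [SN 0]",
]
sections, body = split_profile(sample)
```

With the sample above, `sections` holds four lists (one per header section) and `body` holds the single event line.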
The body of the GranSim profile contains events that are generated
during the execution of the program. The following subsections first
describe the general structure of the events and then go into details of
several classes of events.
Each line in the body of a GranSim profile represents one event during
the execution of the program. The general structure of one such line is:
-
The keyword PE.
-
The processor number where the event happened.
-
The time stamp of the event (in square brackets).
-
The name of the event.
-
The thread id of the affected thread (a hex number).
-
Optionally a node as an additional argument to the
event (e.g. the node to be reduced in case of a START event).
This is either a hex number or the special string ______ indicating
a Nil-closure.
-
Additional information depending on the event. This can be the processor
from which data is fetched or the length of the spark queue after
starting a new thread.
The fields are separated by whitespace. A : symbol must follow the
time stamp (which must be in square brackets).
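The field layout above can be captured in a regular expression. This is a rough sketch under stated assumptions: the time stamp is taken to be a decimal number, the thread id a bare hex number without prefix, and the sample lines are constructed from the description rather than copied from a real profile. The names `EVENT_RE` and `parse_event` are hypothetical.

```python
import re

# PE <pe> [<time>]: <event> <thread> [<node>] [<extra...>]
EVENT_RE = re.compile(
    r"^PE\s+(?P<pe>\d+)\s+"                   # keyword PE and processor number
    r"\[(?P<time>\d+)\]:\s*"                  # time stamp in brackets, then ':'
    r"(?P<event>\S+)\s+"                      # event name, e.g. START or START(Q)
    r"(?P<thread>[0-9a-fA-F]+)"               # thread id (hex, assumed unprefixed)
    r"(?:\s+(?P<node>[0-9a-fA-F]+|_+))?"      # optional node: hex or underscores
    r"(?:\s+(?P<extra>.*\S))?\s*$"            # optional event-specific remainder
)


def parse_event(line):
    """Return a dict describing one event line, or None if it does not fit."""
    m = EVENT_RE.match(line)
    if m is None:
        return None
    return {
        "pe": int(m.group("pe")),
        "time": int(m.group("time")),
        "event": m.group("event"),
        "thread": int(m.group("thread"), 16),
        "node": m.group("node"),    # hex string, underscores for Nil, or None
        "extra": m.group("extra"),  # e.g. "[SN 3]" or "(from 1)", or None
    }


# Hypothetical lines built from the field descriptions.
ev_start = parse_event("PE  0 [1200]: START 2a 4f0c8 [SN 3]")
ev_block = parse_event("PE  2 [1300]: BLOCK 2b ______ (from 1)")
ev_resume = parse_event("PE  2 [1900]: RESUME 2b")
```

Lines that do not follow this pattern make `parse_event` return None, so END events, which have their own layout, need separate handling.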
END events are an exception to this general structure. The reason
for their special structure is that they summarise the most important
information about the thread. Therefore, information about e.g. the
granularity of the threads can be extracted out of END events alone
without having to generate a full GranSim profile.
The structure of an END event is:
-
The keyword PE.
-
The processor number where the event happened.
-
The time stamp of the event (in square brackets).
-
The name of the event (END in this case).
-
The keyword SN followed by the spark name of the thread. This
information makes it possible to associate a thread with its static spark
site in the program (see section Parallelism Annotations for how to give
names to spark sites).
-
The keyword ST followed by the start time of the thread.
-
The keyword EXP followed by a flag indicating whether this thread has
been exported to another processor or has been evaluated locally
(possible values are T and F).
-
The keyword BB followed by the number of basic blocks that
have been executed by this thread.
-
The keyword HA followed by the number of heap allocations.
-
The keyword RT followed by the total runtime. This is the
most important information in an END event. It is used by the
visualisation tools for generating granularity profiles.
-
The keyword BT followed by the total block time and a block count
(i.e. how often the thread has been blocked).
-
The keyword FT followed by the total fetch time and a fetch count
(i.e. how often the thread fetched remote data).
-
The keyword LS followed by the number of local sparks
(sparks that have to be executed on the local processor) generated by the thread.
-
The keyword GS followed by the number of global sparks
generated by the thread.
-
The keyword EXP followed by a flag indicating whether this thread was
mandatory or only advisable (in the current version this flag is
not used; it would be important in a combination of GranSim with a
concurrent set-up).
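Since the RT field is what granularity profiles are built from, a reduced profile can be summarised with very little code. The sketch below is not the visualisation tools' method; it only assumes that RT is followed by a decimal number. The function names and the sample lines (including their comma separators) are assumptions made for illustration.

```python
import re
from collections import Counter

END_RE = re.compile(r"\]:\s*END\b")   # event name END right after the time stamp
RT_RE = re.compile(r"\bRT\s+(\d+)")   # keyword RT followed by the total runtime


def thread_runtimes(lines):
    """Collect the RT value of every END event."""
    runtimes = []
    for line in lines:
        if END_RE.search(line):
            m = RT_RE.search(line)
            if m:
                runtimes.append(int(m.group(1)))
    return runtimes


def granularity_histogram(runtimes, bucket_width):
    """Count threads per runtime bucket, keyed by the bucket's lower bound."""
    return Counter((rt // bucket_width) * bucket_width for rt in runtimes)


# Hypothetical END lines; separators and values are made up.
sample_end = [
    "PE  0 [1200]: START 2a 4f0c8 [SN 3]",
    "PE  0 [1450]: END SN 3, ST 1200, EXP F, BB 40, HA 12, RT 250, BT 0 (0), FT 0 (0), LS 1, GS 0",
    "PE  1 [2400]: END SN 3, ST 1500, EXP T, BB 90, HA 30, RT 900, BT 100 (1), FT 0 (0), LS 0, GS 2",
    "PE  0 [3100]: END SN 5, ST 1600, EXP F, BB 150, HA 44, RT 1500, BT 0 (0), FT 300 (1), LS 0, GS 0",
]
runtimes = thread_runtimes(sample_end)
histogram = granularity_histogram(runtimes, 1000)
```

Here two threads fall into the bucket starting at 0 and one into the bucket starting at 1000; the bucket width is a free parameter of the sketch.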
The main events directly related to threads are:
-
START: Generated when starting a thread (after adding overhead for
thread creation to the clock of the current processor). After the thread
id it has two additional fields: one specifying the node to be evaluated
(as a hex number) and the spark site that generated this thread (format:
[SN n] where n is a dec number).
-
START(Q): Same as START but the new thread is put into the
runnable queue rather than being executed (only if the current processor
is busy at that point).
-
BLOCK: A thread is blocked on data that is under evaluation by another
thread. It is descheduled and put into the blocking queue of that node.
Two additional fields contain the node on which the thread is blocked
and the processor on which this node resides (format: (from n) where
n is a processor (dec) number).
-
RESUME: Continue to execute the thread after it has been blocked or
has been waiting for remote data. This event does not contain additional
fields.
-
RESUME(Q): Same as RESUME but the resumed thread is put into the
runnable queue rather than being executed (only if the current processor
is busy at that point).
-
SCHEDULE: The thread is scheduled on the given processor (no
additional fields). This event is usually emitted after terminating a
thread on the processor. It may also occur after a FETCH (if
asynchronous communication is turned on) or after a BLOCK event.
-
DESCHEDULE: The thread is descheduled on the given processor (no
additional fields). After this event the thread is in the runnable
queue. This event is not used for implicit descheduling that is
performed after events like BLOCK or FETCH. DESCHEDULE
events should only occur if fair scheduling is turned on.
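The thread events above imply a simple state model, which can be sketched as a lookup table. The state names ("running", "runnable", "blocked", "fetching", "finished") are labels chosen for this sketch, not terms used by GranSim, and the mapping for FETCH rests on the remark that a thread is implicitly descheduled after it.

```python
# State a thread is in directly after each event (labels are this sketch's own).
STATE_AFTER = {
    "START": "running",       # thread starts executing
    "START(Q)": "runnable",   # thread is put into the runnable queue
    "BLOCK": "blocked",       # descheduled, waiting in a node's blocking queue
    "FETCH": "fetching",      # implicitly descheduled, waiting for remote data
    "RESUME": "running",      # continues executing
    "RESUME(Q)": "runnable",  # put into the runnable queue
    "SCHEDULE": "running",    # scheduled on the processor
    "DESCHEDULE": "runnable", # back in the runnable queue (fair scheduling)
    "END": "finished",
}


def thread_states(events):
    """events: iterable of (event_name, thread_id); returns {thread_id: state}."""
    states = {}
    for name, tid in events:
        if name in STATE_AFTER:
            states[tid] = STATE_AFTER[name]
    return states


# Hypothetical event sequence for two threads.
final_states = thread_states([
    ("START", 1),
    ("START(Q)", 2),
    ("BLOCK", 1),
    ("SCHEDULE", 2),
    ("RESUME(Q)", 1),
])
```

After this sequence thread 1 is runnable and thread 2 is running. Events not in the table, such as REPLY, leave the state unchanged.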
Events that are issued when sending data between processors are:
-
FETCH: Send a fetch request from the given thread (on the given
processor) to another processor. This event has two
additional fields: The first field is the node (hex number) that should
be fetched. The last field is the processor where this node resides and
from which the data has to be fetched (format: (from n) where
n is a processor (dec) number).
-
REPLY: A reply to a fetch request of the given thread has arrived at
the given processor. The first additional field contains the node and the
last field contains the processor from which it arrived (format: (from n)
where n is a processor (dec) number). Note: This event
only marks the arrival of the data. It is usually followed by a
RESUME or RESUME(Q) event for the thread that asked for the data.
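Pairing each FETCH with its REPLY gives the time a thread waited for remote data. The sketch below assumes that the REPLY carries the same thread id and node as the FETCH it answers, which the description suggests but does not state outright. The tuple layout and the name `fetch_latencies` are hypothetical.

```python
def fetch_latencies(events):
    """events: iterable of (time, event_name, thread_id, node).

    Returns a list of (thread_id, node, waiting_time) for every FETCH
    that is matched by a later REPLY for the same thread and node.
    """
    pending = {}     # (thread, node) -> time of the outstanding FETCH
    latencies = []
    for time, name, thread, node in events:
        if name == "FETCH":
            pending[(thread, node)] = time
        elif name == "REPLY":
            start = pending.pop((thread, node), None)
            if start is not None:
                latencies.append((thread, node, time - start))
    return latencies


# Hypothetical events, already reduced to the four fields needed here.
latencies = fetch_latencies([
    (100, "FETCH", 1, "a0"),
    (150, "FETCH", 2, "b0"),
    (400, "REPLY", 1, "a0"),
    (900, "REPLY", 2, "b0"),
])
```

In this example thread 1 waited 300 time units and thread 2 waited 750. A REPLY without a recorded FETCH is ignored rather than treated as an error.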
These events are only produced when thread migration is enabled
(-bM):
-
STEALING: Indicates the stealing of a thread on the given
processor. The thread which is being stolen appears in the thread
field. One additional field (the last field) indicates which processor
is stealing that thread (format: (by n) where n is a
processor (dec) number).
-
STOLEN: Indicates the arrival of a stolen thread on the given
processor. It has two additional fields: the first shows the node that
this thread will evaluate next, and the last shows the processor from
which the thread has been stolen (format: (from n) where n is a
processor (dec) number). Note: The thread is executed immediately
by the given processor (no RESUME event follows).
-
STOLEN(Q): Same as STOLEN but the new thread is put into the
runnable queue rather than being executed (only if the current processor
is busy at that point).
When enabling spark profiling, events related to sparks will appear in
the profile:
-
SPARK: Indicates the generation of a spark on the given processor
for the given node. At that point it is added to this processor's spark pool.
Two additional
fields show the node to which this spark is pointing and the current
size of the spark pool (format: [sparks n] where n is a
dec number).
-
SPARKAT: Same as SPARK but with explicit placement of the spark
on this processor. This is usually achieved in the program by using a
parLocal or parAt rather than a parGlobal annotation
(see section Parallelism Annotations).
-
USED: Indicates that this spark is turned into a thread on the given
processor. A START or START(Q) event will follow soon afterwards.
-
PRUNED: A spark is removed from the spark pool of the given
processor. This might occur when the spark points to a normal form
(there is no work to do for that spark). This is checked when
creating a spark and when searching the spark pool for new work.
-
EXPORTED: A spark is exported from a given processor. Two additional
fields show the node to which this spark is pointing and the current
size of the spark pool (format: [sparks n] where n is a
dec number).
-
ACQUIRED: A spark that has been exported by another processor is
acquired by the given processor. Two additional fields have the same
meaning as for EXPORTED.
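The [sparks n] field makes it easy to see how full each processor's spark pool gets. The following sketch records the largest pool size reported per processor. It assumes decimal processor numbers and time stamps; the name `max_spark_pool` and the sample lines are made up.

```python
import re

# Processor number, time stamp, and the trailing [sparks n] field.
SPARK_LINE_RE = re.compile(r"^PE\s+(\d+)\s+\[(\d+)\]:.*\[sparks\s+(\d+)\]")


def max_spark_pool(lines):
    """Return {processor: largest spark pool size reported in the profile}."""
    peak = {}
    for line in lines:
        m = SPARK_LINE_RE.match(line)
        if m:
            pe, size = int(m.group(1)), int(m.group(3))
            peak[pe] = max(peak.get(pe, 0), size)
    return peak


# Hypothetical spark-profile lines.
sample_sparks = [
    "PE  0 [100]: SPARK 1 a0 [sparks 1]",
    "PE  0 [120]: SPARK 1 b0 [sparks 2]",
    "PE  1 [300]: ACQUIRED 0 b0 [sparks 1]",
    "PE  0 [310]: USED 1 a0",
]
peaks = max_spark_pool(sample_sparks)
```

Lines without a [sparks n] field, such as the USED line, are skipped.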
Certain debug options generate additional events that make it possible to
monitor the internal behaviour of the simulator. This information shouldn't
be of interest to the friendly user but might come in handy for those who
dare to hack on the runtime-system:
-
SYSTEM_START: Indicates that the simulator is executing a "system"
routine (a routine in the runtime-system that is not directly related to
graph reduction). This makes it possible to show exactly when rescheduling
is done in the simulator. It may be useful in GranSim-Light to check that
the costs during system operations are attached to the right thread.
-
SYSTEM_END: See previous event. From this point on normal graph
reduction is performed.
Looking up information directly in a GranSim profile is very tedious
(believe me, I have done it quite often). To make this task easier the
GranSim Toolbox contains a GNU Emacs mode for GranSim profiles: the
GranSim Profile Mode.
The most useful features (IMNSHO) are highlighting of parts of a GranSim
profile and narrowing of the profile to specific PEs, threads, events etc.
To use this mode just put the file `GrAnSim.el' somewhere on your
Emacs load-path and load the file. I don't have autoload support
at the moment, but the file is very short anyway, so directly loading it
is quite fast. Currently, the mode requires the hilit19 package
for highlighting parts of the profile. It also requires the `tf'
script in the bin dir of your GranSim installation.
I use Emacs 19.31 with the default `hilit19.el' package, but the
GranSim profile mode has been successfully tested with Emacs
19.27. However, if you have problems with the mode, please report them to
the address shown at the end of this document (see section Bug Reports).
A few Emacs variables control the behaviour of the GranSim Profile mode:
- Variable: gransim-auto-hilit
This variable indicates whether highlighting is turned on by default.
Note that you can customise `hilit19' such that it does not
automatically highlight buffers that are bigger than a given size.
Since GranSim profiles tend to be extremely large you might want to
reduce the default value.
- Variable: grandir
The root of the GranSim installation. The mode searches for scripts of
the GranSim Toolbox in the directory grandir/bin.
By default this variable is set to the contents of the environment
variable GRANDIR.
- Variable: hwl-hi-node-face
Face to be used for specific highlighting of a node.
- Variable: hwl-hi-thread-face
Face to be used for specific highlighting of a thread.
Here are the hilit19 variables that are of some interest for the GranSim
Profile Mode:
- Variable: hilit-auto-highlight
T if we should highlight all buffers as we find 'em, nil to disable
automatic highlighting by the find-file hook.
Default value: t.
- Variable: hilit-auto-highlight-maxout
Auto-highlight is disabled in buffers larger than this.
Default value: 60000.
The main features of the GranSim profile mode are:
-
Highlighting of parts of the profile. Colour coding is used to
distinguish between events that start a reduction, finish a reduction
and block a reduction. Within END events the total runtime is
specially highlighted.
-
Narrowing of the profile. This should not be confused with the
narrowing mode in Emacs. The narrowing in GranSim profile mode is done
by running a script (`tf') over the buffer and displaying the
output in another buffer. Hence, narrowing can be further refined by
improving the `tf' script, which is written in Perl.
It is possible to narrow a GranSim profile to a specific
-
processor (PE),
-
event,
-
thread,
-
node,
-
spark (only possible for spark profiles)
This feature is particularly useful e.g. to follow a node that has
been moved between processors or to concentrate on the reductions on one
specific processor.
Of course, for those pagans who don't believe in Emacs, it is also
possible to run the `tf' script directly on a `.gr' file.
-
A second form of highlighting, specialised for nodes and threads,
is available, too. With the commands hwl-hi-thread and hwl-hi-node
every occurrence of the thread or node in the profile
after the current point is highlighted. The function hwl-hi-clear
undoes all such highlighting.
-
There is a menu item for calling most of the functions described
here. It automatically appears in any GranSim profile (i.e. any file
that has a `.gr' extension).
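To illustrate what narrowing a profile to a processor or an event amounts to, here is a small Python filter. It is only an analogue written for this text: it is not the `tf' script, does not reproduce its options, and the name `narrow` and the sample lines are hypothetical.

```python
import re

# Processor number and event name at the start of an event line.
HEAD_RE = re.compile(r"^PE\s+(\d+)\s+\[\d+\]:\s*(\S+)")


def narrow(lines, pe=None, event=None):
    """Keep only event lines matching the given processor and/or event name."""
    kept = []
    for line in lines:
        m = HEAD_RE.match(line)
        if m is None:
            continue                      # not an event line (e.g. header text)
        if pe is not None and int(m.group(1)) != pe:
            continue
        if event is not None and m.group(2) != event:
            continue
        kept.append(line)
    return kept


# Hypothetical body lines.
sample_body = [
    "PE  0 [100]: START 1 a0 [SN 0]",
    "PE  1 [150]: START 2 b0 [SN 1]",
    "PE  0 [200]: BLOCK 1 c0 (from 1)",
    "PE  1 [400]: RESUME 2",
]
on_pe0 = narrow(sample_body, pe=0)
starts = narrow(sample_body, event="START")
```

Combining both arguments, as in `narrow(sample_body, pe=1, event="START")`, keeps only the lines that satisfy both conditions.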
Default key bindings in GranSim profile mode:
- C-c t, M-x hwl-truncate
Truncate event lines such that exactly one line is shown for one event
in the body of a profile.
- C-c w, M-x hwl-wrap
Wrap lines to show them in full length.
- C-c , M-x hwl-toggle-truncate-wrap
Toggle between the above two modes.
- C-c h, M-x hilit-rehighlight-buffer
Rehighlight the whole buffer.
- C-c p, M-x hwl-narrow-to-pe
Narrow the profile to a PE.
- C-c t, M-x hwl-narrow-to-thread
Narrow the profile to a thread.
- C-c e, M-x hwl-narrow-to-event
Narrow the profile to an event.
- C-c C-e, (lambda () (hwl-narrow-to-event "END"))
Narrow the profile to an END event.
- C-c N, M-x hwl-hi-node
Highlight a node in the profile.
- C-c T, M-x hwl-hi-thread
Highlight a thread in the profile.
- C-c C-c, M-x hwl-hi-clear
Remove highlightings of nodes and threads.