Table: Summary of Experiments
| Experiment | Clues | Data Collected |
|---|---|---|
| fpe | High system time. Presence of floating-point operations. | All floating-point exceptions, with the exception type and the call stack at the time of each exception. |
| hwc | High user CPU time. | Counts of various hardware events at the source line, machine instruction, and function levels, including clock cycles, graduated instructions, primary instruction cache misses, secondary instruction cache misses, primary data cache misses, secondary data cache misses, translation lookaside buffer (TLB) misses, and graduated floating-point instructions. PC sampling is used. |
| hwcsamp | High user CPU time. | Similar to the pcsamp experiment, except that up to six hardware counter events are read in addition to the program counter. |
| hwctime | High user CPU time. | Similar to the hwc experiment, except that call stack sampling is used. |
| io | I/O-bound. | Times the following I/O system calls: read, readv, write, writev, open, close, dup, pipe, and creat. The time reported is wall-clock time. |
| iot | I/O-bound. | Traces and times the following I/O system calls: read, readv, write, writev, open, close, dup, pipe, and creat. The time reported is wall-clock time. Output can optionally be a time-sorted list with one line per I/O system call. |
| mpi | MPI performance is poor. | Times calls to various MPI routines. The time reported is wall-clock time. |
| mpiotf | MPI performance is poor. | Traces and times calls to various MPI routines and writes the results as Open Trace Format (OTF) files, which must be read by another tool, such as Vampir-NG. All calls are accounted for (no sampling). The time reported is wall-clock time. |
| mpit | MPI performance is poor. | Traces and times calls to various MPI routines. Output can optionally be one line of trace per MPI call. All calls are accounted for (no sampling). The time reported is wall-clock time. |
| pcsamp | High user CPU time. | Actual CPU time at the source line, machine instruction, and function levels, obtained by sampling the program counter at 10-millisecond or 1-millisecond intervals. |
| usertime | Slow program, nothing else known. Not CPU-bound. | Inclusive and exclusive CPU time for each function, obtained by sampling the call stack at 30-millisecond intervals. |
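The sampling experiments (pcsamp, usertime) estimate time from periodic samples rather than timing every call: a function's time is roughly its sample count multiplied by the sampling interval. The sketch below illustrates how call stack samples could yield the inclusive and exclusive times that usertime reports. It is a minimal illustration under stated assumptions, not the tool's implementation: the helper `attribute_samples`, the function names `main`, `solve`, and `dot`, and the representation of a sample as a list of function names ordered outermost to innermost are all hypothetical.

```python
from collections import Counter


def attribute_samples(stacks, interval_ms=30):
    """Estimate inclusive and exclusive time (ms) per function.

    Hypothetical sketch: each sample is a list of function names ordered
    outermost -> innermost; interval_ms defaults to the 30 ms usertime interval.
    """
    exclusive = Counter()
    inclusive = Counter()
    for stack in stacks:
        if not stack:
            continue  # ignore empty samples
        # Exclusive time: charge only the function executing when sampled.
        exclusive[stack[-1]] += 1
        # Inclusive time: charge every function active on the stack, once per
        # sample, so recursive frames are not double-counted.
        for name in set(stack):
            inclusive[name] += 1
    # Convert sample counts into estimated milliseconds.
    inclusive_ms = {name: n * interval_ms for name, n in inclusive.items()}
    exclusive_ms = {name: n * interval_ms for name, n in exclusive.items()}
    return inclusive_ms, exclusive_ms


if __name__ == "__main__":
    # Four hypothetical call stack samples taken 30 ms apart.
    samples = [
        ["main", "solve", "dot"],
        ["main", "solve", "dot"],
        ["main", "solve"],
        ["main"],
    ]
    incl, excl = attribute_samples(samples)
    print("inclusive:", incl)
    print("exclusive:", excl)
```

Here `main` is on the stack in all four samples (120 ms inclusive) but executing directly in only one (30 ms exclusive). A program counter sample (pcsamp) carries only the innermost location, so it supports only the exclusive view: for example, 250 samples at a 10 ms interval correspond to about 2.5 s of CPU time.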