Available Experiments

Table: Summary of Experiments

Experiment

Clues

Data Collected

fpe

High system time. Presence of floating-point operations.

All floating-point exceptions, with the exception type and the call stack at the time of the exception.

hwc

High user CPU time.

Counts of various hardware events at the source line, machine instruction, and function levels, including clock cycles, graduated instructions, primary instruction cache misses, secondary instruction cache misses, primary data cache misses, secondary data cache misses, translation lookaside buffer (TLB) misses, and graduated floating-point instructions. PC sampling is used.

hwcsamp

High user CPU time.

Similar to the pcsamp experiment, except that up to six (6) hardware counter events are read in addition to the program counter.

hwctime

High user CPU time.

Similar to the hwc experiment, except that call stack sampling is used.

io

I/O-bound.

Times the following I/O system calls: read, readv, write, writev, open, close, dup, pipe, creat. The time reported is wall clock time.

iot

I/O-bound.

Traces and times the following I/O system calls: read, readv, write, writev, open, close, dup, pipe, creat. The time reported is wall clock time. Output can optionally include one time-sorted line per I/O system call.

mpi

MPI performance is poor.

Times calls to various MPI routines. The time reported is wall clock time.

mpiotf

MPI performance is poor.

Traces and times calls to various MPI routines. Output is written as Open Trace Format (OTF) files, which must be read by another tool, such as Vampir-NG. All calls are accounted for; no sampling is used. The time reported is wall clock time.

mpit

MPI performance is poor.

Traces and times calls to various MPI routines. Output can optionally include one line of trace per MPI call. All calls are accounted for; no sampling is used. The time reported is wall clock time.

pcsamp

High user CPU time.

Actual CPU time at the source line, machine instruction, and function levels, obtained by sampling the program counter at 10-millisecond or 1-millisecond intervals.

usertime

Slow program, nothing else known. Not CPU-bound.

Inclusive and exclusive CPU time for each function, obtained by sampling the call stack at 30-millisecond intervals.
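
The difference between the inclusive and exclusive CPU time reported by the usertime experiment can be illustrated with a minimal sketch. This is not Open|SpeedShop's implementation: the function name attribute_time and the sample call stacks are hypothetical, and only the 30-millisecond interval is taken from the table. Each sample charges one interval of exclusive time to the function that was executing and one interval of inclusive time to every function on the stack.

```python
from collections import Counter

SAMPLE_INTERVAL_MS = 30  # interval quoted for the usertime experiment


def attribute_time(samples, interval_ms=SAMPLE_INTERVAL_MS):
    """Return (inclusive, exclusive) time per function in milliseconds.

    Hypothetical helper for illustration only. Each sample is a call
    stack listed outermost frame first, so the last entry is the
    function that was executing when the sample was taken.
    """
    inclusive = Counter()
    exclusive = Counter()
    for stack in samples:
        if not stack:
            continue
        # Exclusive time: charged only to the innermost (executing) frame.
        exclusive[stack[-1]] += interval_ms
        # Inclusive time: charged once to each distinct function on the
        # stack, so a recursive function is not counted twice per sample.
        for func in set(stack):
            inclusive[func] += interval_ms
    return inclusive, exclusive


# Invented samples, not output from a real run.
samples = [
    ["main", "solve", "dot"],
    ["main", "solve", "dot"],
    ["main", "solve"],
    ["main", "write_output"],
]

inclusive, exclusive = attribute_time(samples)
for func in sorted(inclusive, key=inclusive.get, reverse=True):
    print(f"{func:14s} inclusive={inclusive[func]:4d} ms  "
          f"exclusive={exclusive[func]:4d} ms")
```

In this sketch main accumulates inclusive time from every sample but no exclusive time, because it is never the innermost frame. A function with high inclusive time and low exclusive time is spending its time in the functions it calls.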

