PAPI
PAPI provides an interface to the hardware counters of CPUs. These counters hint at potential performance issues in running programs.
PAPI's library and companion utilities become available on Apocrita once its module is loaded.
Using PAPI
On Apocrita the PAPI library is generally used indirectly, as part of some other profiling or performance tool. However, the library is also available for user code, and its interface is documented here.
More useful for users of Apocrita are the companion utilities, which provide information about the available hardware counters. For example, the utility papi_avail lists the counters; with the option -a, only those counters which may be queried are listed. The counters which may be used vary by CPU. For example, the output of papi_avail -a on an NXV node looks like:
PAPI Preset Events
================================================================================
Name         Code        Deriv Description (Note)
PAPI_L1_DCM  0x80000000  No    Level 1 data cache misses
PAPI_L1_ICM  0x80000001  No    Level 1 instruction cache misses
PAPI_L2_DCM  0x80000002  Yes   Level 2 data cache misses
PAPI_L2_ICM  0x80000003  No    Level 2 instruction cache misses
PAPI_L1_TCM  0x80000006  Yes   Level 1 cache misses
PAPI_L2_TCM  0x80000007  No    Level 2 cache misses
PAPI_L3_TCM  0x80000008  No    Level 3 cache misses
PAPI_CA_SNP  0x80000009  No    Requests for a snoop
PAPI_CA_SHR  0x8000000a  No    Requests for exclusive access to shared cache line
PAPI_CA_CLN  0x8000000b  No    Requests for exclusive access to clean cache line
PAPI_CA_INV  0x8000000c  No    Requests for cache line invalidation
PAPI_CA_ITV  0x8000000d  No    Requests for cache line intervention
PAPI_L3_LDM  0x8000000e  No    Level 3 load misses
PAPI_TLB_DM  0x80000014  Yes   Data translation lookaside buffer misses
PAPI_TLB_IM  0x80000015  No    Instruction translation lookaside buffer misses
PAPI_L1_LDM  0x80000017  No    Level 1 load misses
PAPI_L1_STM  0x80000018  No    Level 1 store misses
PAPI_L2_LDM  0x80000019  No    Level 2 load misses
PAPI_L2_STM  0x8000001a  No    Level 2 store misses
...
Here the first column gives the name of the counter and the final column a brief description of it. The PAPI library accepts this name to refer to the counter, and third-party tools generally use it as well.
Hardware counters may be either measured directly or derived. Derived counters (marked Yes in the third column, Deriv, of the output above) are calculated from a number of directly measured counters.
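As an illustration of the column layout described above, the following is a minimal Python sketch that reads rows in that format and separates derived counters from directly measured ones. The function name parse_papi_avail and the SAMPLE text (a few rows copied from the listing above) are illustrative assumptions and not part of PAPI. The sketch also assumes every row follows the four-column preset layout shown here.

```python
import re

# Matches preset rows such as:
#   PAPI_L2_DCM 0x80000002 Yes Level 2 data cache misses
# The pattern is an assumption based on the listing above.
ROW = re.compile(r"^(PAPI_\w+)\s+(0x[0-9a-fA-F]+)\s+(Yes|No)\s+(.*)$")


def parse_papi_avail(text):
    """Return one dict per preset event row found in text (hypothetical helper)."""
    events = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            name, code, deriv, description = match.groups()
            events.append(
                {
                    "name": name,
                    "code": int(code, 16),
                    "derived": deriv == "Yes",
                    "description": description,
                }
            )
    return events


# A few rows copied from the listing above; in practice the text would be
# the captured output of `papi_avail -a`.
SAMPLE = """\
Name Code Deriv Description (Note)
PAPI_L1_DCM 0x80000000 No Level 1 data cache misses
PAPI_L2_DCM 0x80000002 Yes Level 2 data cache misses
PAPI_TLB_DM 0x80000014 Yes Data translation lookaside buffer misses
"""

events = parse_papi_avail(SAMPLE)
derived = [event["name"] for event in events if event["derived"]]
print(derived)  # ['PAPI_L2_DCM', 'PAPI_TLB_DM']
```

The header row is skipped because it does not start with PAPI_, so only counter rows end up in the result.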
Multiple counters may be recorded during a single profiling run of a program. However, it is not possible to watch all counters at once and some combinations of counters are not observable. The utility papi_command_line can be used to see whether a particular combination of counters may be selected. For example, on an NXV node the counters PAPI_L1_DCM, PAPI_L3_DCR and PAPI_L1_TCM are compatible:
$ papi_command_line PAPI_L1_DCM PAPI_L3_DCR PAPI_L1_TCM
This utility lets you add events from the command line interface to see if they work.
Successfully added: PAPI_L1_DCM
Successfully added: PAPI_L3_DCR
Successfully added: PAPI_L1_TCM
PAPI_L1_DCM : 2329
PAPI_L3_DCR : 52
PAPI_L1_TCM : 2486
----------------------------------
By contrast, the counters PAPI_L1_DCM, PAPI_L3_DCR, PAPI_L1_TCM, PAPI_SP_OPS and PAPI_DP_OPS cannot all be selected together:
$ papi_command_line PAPI_L1_DCM PAPI_L3_DCR PAPI_L1_TCM PAPI_SP_OPS PAPI_DP_OPS
This utility lets you add events from the command line interface to see if they work.
Successfully added: PAPI_L1_DCM
Successfully added: PAPI_L3_DCR
Successfully added: PAPI_L1_TCM
Successfully added: PAPI_SP_OPS
Failed adding: PAPI_DP_OPS
because: Invalid argument
PAPI_L1_DCM : 2369
PAPI_L3_DCR : 51
PAPI_L1_TCM : 2549
PAPI_SP_OPS : 0
----------------------------------
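When checking many combinations, it may help to read this output programmatically. The following is a minimal Python sketch that picks out the "Successfully added:" and "Failed adding:" lines seen in the output above. The function name parse_command_line_output is a hypothetical helper, and the sketch assumes the output keeps exactly the wording shown here.

```python
def parse_command_line_output(text):
    """Split papi_command_line output into (added, failed) counter names.

    Hypothetical helper: it relies on the "Successfully added:" and
    "Failed adding:" prefixes seen in the example output.
    """
    added, failed = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Successfully added:"):
            added.append(line.split(":", 1)[1].strip())
        elif line.startswith("Failed adding:"):
            failed.append(line.split(":", 1)[1].strip())
    return added, failed


# Lines copied from the example output above.
SAMPLE_OUTPUT = """\
Successfully added: PAPI_L1_DCM
Successfully added: PAPI_L3_DCR
Successfully added: PAPI_L1_TCM
Successfully added: PAPI_SP_OPS
Failed adding: PAPI_DP_OPS
because: Invalid argument
"""

added, failed = parse_command_line_output(SAMPLE_OUTPUT)
print("added: ", added)
print("failed:", failed)  # failed: ['PAPI_DP_OPS']
```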
Incompatible combinations of counters may be resolved by splitting collection across multiple profiling runs of the program, using papi_command_line or papi_event_chooser to help choose a compatible set of counters for each run.
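One way to plan such a split is a simple greedy grouping, sketched below in Python. The function split_into_runs and the compatibility check passed to it are hypothetical. In this sketch the check is a made-up stand-in (at most four counters per run) so that the example runs anywhere; it does not describe the real hardware constraint. On Apocrita the check could instead wrap a call to papi_command_line and look for "Failed adding" lines in its output.

```python
def split_into_runs(counters, compatible):
    """Greedily place each counter in the first run that still accepts it.

    `compatible` is a caller-supplied function that takes a list of counter
    names and returns True if they can be collected together.
    """
    runs = []
    for counter in counters:
        for run in runs:
            if compatible(run + [counter]):
                run.append(counter)
                break
        else:
            # No existing run accepts this counter, so start a new one.
            if not compatible([counter]):
                raise ValueError(f"{counter} cannot be collected on its own")
            runs.append([counter])
    return runs


def fake_compatible(group):
    # Made-up rule for demonstration only: pretend at most four counters
    # fit in one run. The real limit depends on the CPU and the counters.
    return len(group) <= 4


wanted = [
    "PAPI_L1_DCM",
    "PAPI_L3_DCR",
    "PAPI_L1_TCM",
    "PAPI_SP_OPS",
    "PAPI_DP_OPS",
]

runs = split_into_runs(wanted, fake_compatible)
for number, run in enumerate(runs, start=1):
    print(f"run {number}: {' '.join(run)}")
```

A greedy grouping is not guaranteed to use the fewest possible runs, but it is easy to follow and each resulting group can be confirmed with papi_command_line before profiling.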