PAPI provides an interface to the hardware counters of CPUs. These counters hint at potential performance issues in running programs.
PAPI's library and companion utilities are available on Apocrita after loading the corresponding module.
For users on Apocrita the PAPI libraries would generally be used indirectly, as part of some other profiling or performance tool. However, the library is available for user code and the library interface is documented here.
More useful for users of Apocrita are the companion utilities, which provide
information about the available hardware counters. For example, the utility
papi_avail lists the counters; with the option
-a, only those counters
which may be queried are listed. The counters which may be used vary by
CPU. For example, the output of
papi_avail -a on an NXV node looks like:
    PAPI Preset Events
    ================================================================================
    Name         Code        Deriv  Description (Note)
    PAPI_L1_DCM  0x80000000  No     Level 1 data cache misses
    PAPI_L1_ICM  0x80000001  No     Level 1 instruction cache misses
    PAPI_L2_DCM  0x80000002  Yes    Level 2 data cache misses
    PAPI_L2_ICM  0x80000003  No     Level 2 instruction cache misses
    PAPI_L1_TCM  0x80000006  Yes    Level 1 cache misses
    PAPI_L2_TCM  0x80000007  No     Level 2 cache misses
    PAPI_L3_TCM  0x80000008  No     Level 3 cache misses
    PAPI_CA_SNP  0x80000009  No     Requests for a snoop
    PAPI_CA_SHR  0x8000000a  No     Requests for exclusive access to shared cache line
    PAPI_CA_CLN  0x8000000b  No     Requests for exclusive access to clean cache line
    PAPI_CA_INV  0x8000000c  No     Requests for cache line invalidation
    PAPI_CA_ITV  0x8000000d  No     Requests for cache line intervention
    PAPI_L3_LDM  0x8000000e  No     Level 3 load misses
    PAPI_TLB_DM  0x80000014  Yes    Data translation lookaside buffer misses
    PAPI_TLB_IM  0x80000015  No     Instruction translation lookaside buffer misses
    PAPI_L1_LDM  0x80000017  No     Level 1 load misses
    PAPI_L1_STM  0x80000018  No     Level 1 store misses
    PAPI_L2_LDM  0x80000019  No     Level 2 load misses
    PAPI_L2_STM  0x8000001a  No     Level 2 store misses
    ...
Here the first column gives the name of the counter and the final column a brief description of the counter. This name will be accepted by the PAPI library to refer to the counter and will generally be used by third-party tools.
Hardware counters presented may be either measured directly or derived. Derived counters (marked Yes in the third column of the output above) are calculated from a number of directly measured counters.
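When the counter list is needed in a script, the table above can be read programmatically. The following is a minimal sketch in Python, assuming the four-column layout shown above (name, code, derived flag, description); the function name parse_papi_avail and the shortened sample text are illustrative only, and other PAPI versions may print a different layout.

```python
import re

# One table row: counter name, hex code, Yes/No derived flag, description.
# Assumes the column layout of the papi_avail -a output shown above.
ROW = re.compile(r"^(PAPI_\w+)\s+(0x[0-9a-fA-F]+)\s+(Yes|No)\s+(.*\S)\s*$")


def parse_papi_avail(text):
    """Return one dict per preset-event row found in the given text."""
    events = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            name, code, derived, description = match.groups()
            events.append({
                "name": name,
                "code": int(code, 16),
                "derived": derived == "Yes",
                "description": description,
            })
    return events


# Shortened sample copied from the output above (not a live query).
sample = """\
Name         Code        Deriv  Description (Note)
PAPI_L1_DCM  0x80000000  No     Level 1 data cache misses
PAPI_L2_DCM  0x80000002  Yes    Level 2 data cache misses
PAPI_L1_TCM  0x80000006  Yes    Level 1 cache misses
"""

events = parse_papi_avail(sample)
derived_names = [e["name"] for e in events if e["derived"]]
print(derived_names)  # ['PAPI_L2_DCM', 'PAPI_L1_TCM']
```

In practice the text would come from running papi_avail -a on the node of interest, since the set of counters differs between CPU types.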
Multiple counters may be recorded during a single profiling run of a program.
However, it is not possible to watch all counters at once and some combinations
of counters are not observable. The utility
papi_command_line can be used
to see whether a particular combination of counters may be selected. For
example, on an NXV node the counters
PAPI_L1_DCM, PAPI_L3_DCR and PAPI_L1_TCM are compatible:
    $ papi_command_line PAPI_L1_DCM PAPI_L3_DCR PAPI_L1_TCM
    This utility lets you add events from the command line interface to see if they work.
    Successfully added: PAPI_L1_DCM
    Successfully added: PAPI_L3_DCR
    Successfully added: PAPI_L1_TCM
    PAPI_L1_DCM : 2329
    PAPI_L3_DCR : 52
    PAPI_L1_TCM : 2486
    ----------------------------------
whereas the counters
PAPI_L1_DCM, PAPI_L3_DCR, PAPI_L1_TCM, PAPI_SP_OPS and PAPI_DP_OPS are not all compatible:
    $ papi_command_line PAPI_L1_DCM PAPI_L3_DCR PAPI_L1_TCM PAPI_SP_OPS PAPI_DP_OPS
    This utility lets you add events from the command line interface to see if they work.
    Successfully added: PAPI_L1_DCM
    Successfully added: PAPI_L3_DCR
    Successfully added: PAPI_L1_TCM
    Successfully added: PAPI_SP_OPS
    Failed adding: PAPI_DP_OPS
    because: Invalid argument
    PAPI_L1_DCM : 2369
    PAPI_L3_DCR : 51
    PAPI_L1_TCM : 2549
    PAPI_SP_OPS : 0
    ----------------------------------
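A report like the one above can be summarised in a script to see which counters were accepted and which were rejected. This is a rough sketch in Python that only matches the "Successfully added", "Failed adding ... because" and "NAME : value" patterns visible in the output above; the function name parse_command_line_report is hypothetical and the sample text is copied from this page rather than produced by a live run.

```python
import re


def parse_command_line_report(text):
    """Split a papi_command_line report into added, failed and measured parts."""
    added = re.findall(r"Successfully added:\s*(\S+)", text)
    # "because:" may follow on the same line or on the next one.
    failed = {
        name: reason.strip()
        for name, reason in re.findall(
            r"Failed adding:\s*(\S+)\s+because:\s*(.+)", text
        )
    }
    # Result lines have the form "PAPI_L1_DCM : 2369".
    values = {
        name: int(count)
        for name, count in re.findall(
            r"^\s*(PAPI_\w+)\s*:\s*(\d+)\s*$", text, flags=re.MULTILINE
        )
    }
    return added, failed, values


# Sample text taken from the incompatible example above.
report = """\
Successfully added: PAPI_L1_DCM
Successfully added: PAPI_L3_DCR
Successfully added: PAPI_L1_TCM
Successfully added: PAPI_SP_OPS
Failed adding: PAPI_DP_OPS
because: Invalid argument
PAPI_L1_DCM : 2369
PAPI_L3_DCR : 51
PAPI_L1_TCM : 2549
PAPI_SP_OPS : 0
"""

added, failed, values = parse_command_line_report(report)
print(added)
print(failed)
```

A non-empty failed dictionary indicates that the requested combination cannot be collected in a single run.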
Incompatible combinations of counters may be resolved by splitting collection
across multiple profiling runs of the program (again using
papi_command_line to check each combination).
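Splitting a list of counters across runs can be sketched as a simple greedy grouping. The example below is an illustration under stated assumptions, not a PAPI feature: group_counters and fake_check are hypothetical names, and fake_check is a stand-in predicate modelled only on the failure shown above (PAPI_SP_OPS and PAPI_DP_OPS together). On a real node the predicate would instead test the combination on the hardware, for example by running papi_command_line and looking for a "Failed adding" line.

```python
def group_counters(counters, is_compatible):
    """Greedily place each counter in the first group that still accepts it.

    is_compatible takes a list of counter names and returns True when they
    can all be collected in the same run.
    """
    groups = []
    for counter in counters:
        for group in groups:
            if is_compatible(group + [counter]):
                group.append(counter)
                break
        else:
            # No existing group accepts this counter: start a new run.
            groups.append([counter])
    return groups


def fake_check(names):
    """Stand-in for a real hardware check (assumption for illustration)."""
    return not ("PAPI_SP_OPS" in names and "PAPI_DP_OPS" in names)


wanted = [
    "PAPI_L1_DCM",
    "PAPI_L3_DCR",
    "PAPI_L1_TCM",
    "PAPI_SP_OPS",
    "PAPI_DP_OPS",
]

runs = group_counters(wanted, fake_check)
for number, run in enumerate(runs, start=1):
    print(f"run {number}: {' '.join(run)}")
```

Greedy grouping does not guarantee the smallest possible number of runs, but it keeps the number of compatibility checks small and is usually adequate for a handful of counters.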