Using Hardware Performance Monitor (HPM) Toolkit:
A primer
last update:
01/15/2009
The hpmcount command provides performance information for your entire program. However, if you want to check performance of specific sections of the code, rather than the entirety, you may instrument just those sections by inserting calls to the HPM Toolkit library, referred to as LIBHPM.
Code instrumented with LIBHPM subroutine calls collects performance information during runtime and summarizes it for a final report. You may instrument as many sections of your code as you like. (Caveat: placing calls within inner loops may incur excessive overhead during execution.) MPI and OpenMP applications are supported by LIBHPM. We recommend you always use the thread-safe version of the library libhpm_r. It is a 64-bit API.
LIBHPM uses the same set of hardware event counters used by hpmcount. You can select the event set to be used with the environment variable HPM_EVENT_SET.
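As a minimal sketch of selecting an event set before running an instrumented program, the lines below set HPM_EVENT_SET in a Bourne-type shell (sh, ksh, bash). The value 2 is only an illustrative choice, not a recommendation; in the C-shell the equivalent is "setenv HPM_EVENT_SET 2".

```shell
# Select an event set for programs started from this shell.
# The value 2 is an assumption chosen purely for illustration.
HPM_EVENT_SET=2
export HPM_EVENT_SET

# Confirm the value that instrumented programs will inherit.
echo "HPM_EVENT_SET=${HPM_EVENT_SET}"
```

Because the variable is exported, any program launched afterwards from the same shell inherits it.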
This section provides brief descriptions of six Fortran-callable subroutines in LIBHPM that are useful in providing performance information for your code.
- call F_HPMINIT(TASKID, PROGRAM_NAME)
- This subroutine call initializes the performance session. TASKID is an integer value that specifies the process ID. This is an arbitrary number; it is useful with a code that is either multi-threaded or multitasked with MPI. Any integer can be used for a serial code. PROGRAM_NAME is an arbitrary string used to identify your output; it is common practice to use your program name as this identifier.
- call F_HPMTERMINATE(TASKID)
- This subroutine terminates the performance session and generates the summary output. If the program exits without calling F_HPMTERMINATE, no performance information will be reported.
- call F_HPMSTART(INSTRU_ID,LABEL) and
call F_HPMSTOP(INSTRU_ID)
- These calls occur in pairs. They delineate the sections of code being instrumented. F_HPMSTART starts summing the requested information, and F_HPMSTOP stops the summation. You can use pairs of these calls throughout your code, and the report generated in your output will be identified by LABEL. By default, INSTRU_ID is a user-chosen integer between 1 and 100; it is used by the library to internally identify the summation data.
- call F_HPMTSTART(INSTRU_ID,LABEL) and
call F_HPMTSTOP(INSTRU_ID)
- Similar to the above pair of start-stop routines, but for instrumenting threads, as the added letter T in the routine names suggests. You place calls to F_HPMTSTART and F_HPMTSTOP to instrument the individual threads in a parallel region, by setting INSTRU_ID=omp_get_thread_num. In nonparallel regions, do not use thread number values for INSTRU_ID.
Using environment variable HPM_EVENT_SET on POWER6 systems:
The HPM_EVENT_SET environment variable specifies the performance events to be measured and summed for the report. On POWER6 systems, HPM_EVENT_SET can be set to any integer value between 0 and 12. Its default value is 1. In the C-shell, you set this variable with command "setenv HPM_EVENT_SET n", where n may be any value between 0 and 12 (in other shells, use the export command). Setting HPM_EVENT_SET to 0 will show performance for all 12 event sets. You may list the twelve sets by executing command pmlist -S -1 on the IBM POWER6 computer. We show the first three sets below.

    Set #1
    Mapped Group #186: pm_hpm1
    Mapped Group name: HPM group
    Mapped Group description: HPM group
    Mapped Group status: Verified
    Mapped Group members:
    Counter 1, event 101: PM_FPU_1FLOP : FPU executed one flop instruction
    Counter 2, event 111: PM_FPU_FMA : FPU executed multiply-add instruction
    Counter 3, event 102: PM_FPU_FSQRT_FDIV : FPU executed FSQRT or FDIV instruction
    Counter 4, event 12: PM_CYC [shared chip] : Processor cycles
    Counter 5, event 0: PM_RUN_INST_CMPL : Run instructions completed
    Counter 6, event 0: PM_RUN_CYC : Run cycles

    Set #2
    Mapped Group #187: pm_hpm2
    Mapped Group name: HPM group
    Mapped Group description: HPM group
    Mapped Group status: Verified
    Mapped Group members:
    Counter 1, event 139: PM_INST_CMPL : Instructions completed
    Counter 2, event 246: PM_LSU_LDF : LSU executed Floating Point load instruction
    Counter 3, event 113: PM_FPU_STF : FPU executed store instruction
    Counter 4, event 12: PM_CYC [shared chip] : Processor cycles
    Counter 5, event 0: PM_RUN_INST_CMPL : Run instructions completed
    Counter 6, event 0: PM_RUN_CYC : Run cycles

    Set #3
    Mapped Group #188: pm_hpm3
    Mapped Group name: HPM group
    Mapped Group description: HPM group
    Mapped Group status: Verified
    Mapped Group members:
    Counter 1, event 12: PM_CYC [shared chip] : Processor cycles
    Counter 2, event 210: PM_LD_MISS_L1 : L1 D cache load misses
    Counter 3, event 299: PM_ST_MISS_L1 : L1 D cache store misses
    Counter 4, event 145: PM_INST_CMPL : Instructions completed
    Counter 5, event 0: PM_RUN_INST_CMPL : Run instructions completed
    Counter 6, event 0: PM_RUN_CYC : Run cycles

What's next
The following sections of this document show examples of instrumenting code, compiling with the required libraries, running the job, and getting performance output.
If you have questions about this document, please contact us by any of the methods shown on this page: CISL Customer Support.
© Copyright 2003-2009. University Corporation for Atmospheric Research (UCAR). All Rights Reserved.
Address of this page: http://www.cisl.ucar.edu/docs/ibm/hpm.toolkit/library.html