Using Hardware Performance Monitor (HPM) Toolkit: A primer
Last update: 01/15/2009
HPMCOUNT is an HPM Toolkit executable that runs your program under its control and reports resource statistics for the entire execution. You invoke HPMCOUNT on your unmodified executable (no source code changes, recompilation, or relinking required) to measure several types of hardware performance events for your application as a whole.
The simple default method for invoking HPMCOUNT is:
    hpmcount your_executable
HPMCOUNT has options to control what is monitored and how to direct the output report.
HPMCOUNT was developed by IBM to measure the performance of applications running on IBM POWER systems, including POWER6. HPMCOUNT starts your application, and when execution ends it produces a report that summarizes:
- Wall-clock time
- Statistics on resource utilization
- Information from hardware performance counters
- Derived hardware metrics
Here is a simple runscript that provides an example of how to use HPMCOUNT and how its output appears.
Runscript (with here-script of program):
    #! /bin/csh
    # Very simple serial code set up to execute under HPMCOUNT control.
    cat << 'EOF' > ./it.f
          program main
          implicit none
          integer i
          real sum
          common sum
          sum=0.0
          do i=1,1000000
            sum=sum+exp(.00000001*i)
          end do
          print*,'sum=',sum
          stop
          end
    'EOF'
    # Compile and build program "it" from it.f with -O4 optimization,
    # tuned for the build host (-qarch=auto), with 8-byte default
    # reals (-qrealsize=8):
    xlf_r -O4 -qarch=auto -qrealsize=8 -o it it.f
    # Execute program "it" with HPMCOUNT:
    /usr/bin/hpmcount ./it
    # Clean up working directory:
    rm it*

HPMCOUNT output:

    Execution time (wall clock time): 0.057595 seconds

    ######## Resource Usage Statistics ########

    Total amount of time in user mode            : 0.015934 seconds
    Total amount of time in system mode          : 0.003379 seconds
    Maximum resident set size                    : 8532 Kbytes
    Average shared memory use in text segment    : 0 Kbytes*sec
    Average unshared memory use in data segment  : 77 Kbytes*sec
    Number of page faults without I/O activity   : 2073
    Number of page faults with I/O activity      : 2
    Number of times process was swapped out      : 0
    Number of times file system performed INPUT  : 0
    Number of times file system performed OUTPUT : 0
    Number of IPC messages sent                  : 0
    Number of IPC messages received              : 0
    Number of signals delivered                  : 0
    Number of voluntary context switches         : 13
    Number of involuntary context switches       : 3

    ####### End of Resource Statistics ########

    Set: 1
    Counting duration: 0.019886103 seconds
    PM_FPU_1FLOP (FPU executed one flop instruction )           : 4000225
    PM_FPU_FMA (FPU executed multiply-add instruction)          : 11000076
    PM_FPU_FSQRT_FDIV (FPU executed FSQRT or FDIV instruction)  : 0
    PM_CYC (Processor cycles)                                   : 26428653
    PM_RUN_INST_CMPL (Run instructions completed)               : 47657875
    PM_RUN_CYC (Run cycles)                                     : 93529315

    Utilization rate        : 9.755 %
    Flop                    : 26.000 Mflop
    Flop rate (flops / WCT) : 451.435 Mflop/s
    Flops / user time       : 4627.772 Mflop/s
    FMA percentage          : 146.665 %
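To see how some of the derived metrics relate to the raw counters, the short Python sketch below parses a few lines of the sample report and recomputes three of them. It is an illustration only, not part of the HPM Toolkit: the helper names (parse_counters, parse_wall_clock, flop_count, flop_rate_mflops, fma_percentage) are hypothetical, and the formulas are assumptions inferred from the numbers in this one report rather than taken from IBM documentation. Under those assumptions the results agree with the reported Flop, Flop rate, and FMA percentage values to three decimal places.

```python
import re

# Lines copied from the sample report above; only the fields used here.
REPORT = """\
Execution time (wall clock time): 0.057595 seconds
PM_FPU_1FLOP (FPU executed one flop instruction ) : 4000225
PM_FPU_FMA (FPU executed multiply-add instruction) : 11000076
PM_FPU_FSQRT_FDIV (FPU executed FSQRT or FDIV instruction) : 0
PM_CYC (Processor cycles) : 26428653
"""


def parse_counters(text):
    """Return {event_name: count} for lines shaped like 'PM_X (desc) : N'."""
    pattern = re.compile(r"^\s*(PM_\w+)\s*\(.*\)\s*:\s*(\d+)\s*$", re.MULTILINE)
    return {name: int(value) for name, value in pattern.findall(text)}


def parse_wall_clock(text):
    """Return the wall-clock time in seconds from the first report line."""
    match = re.search(r"wall clock time\)\s*:\s*([\d.]+)\s*seconds", text)
    return float(match.group(1))


def flop_count(counters):
    # Assumption: one flop per PM_FPU_1FLOP event, two per multiply-add.
    return counters["PM_FPU_1FLOP"] + 2 * counters["PM_FPU_FMA"]


def flop_rate_mflops(counters, wall_clock_seconds):
    # Flop rate as labeled in the report: flops divided by wall-clock time.
    return flop_count(counters) / 1.0e6 / wall_clock_seconds


def fma_percentage(counters):
    # Assumption: flops from multiply-adds divided by the number of
    # floating-point instructions counted (inferred from the sample report).
    fp_instructions = counters["PM_FPU_1FLOP"] + counters["PM_FPU_FMA"]
    return 100.0 * 2 * counters["PM_FPU_FMA"] / fp_instructions


counters = parse_counters(REPORT)
wall_clock = parse_wall_clock(REPORT)

print(f"Flop           : {flop_count(counters) / 1.0e6:.3f} Mflop")
print(f"Flop rate      : {flop_rate_mflops(counters, wall_clock):.3f} Mflop/s")
print(f"FMA percentage : {fma_percentage(counters):.3f} %")
```

If the inferred FMA formula is right, it also explains why that metric can exceed 100%: each multiply-add contributes two flops to the numerator but only one instruction to the denominator. The utilization rate and "Flops / user time" lines are left out of the sketch because reproducing them appears to require the processor clock frequency, which the report does not print.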
If you have questions about this document, please contact us via any of the methods shown on this web page: CISL Customer Support.
© Copyright 2003-2009. University Corporation for Atmospheric Research (UCAR). All Rights Reserved.
Address of this page: http://www.cisl.ucar.edu/docs/ibm/hpm.toolkit/hpmcount.html