Using Hardware Performance Monitor (HPM) Toolkit: A primer
Last update: 01/07/2009
This example uses the same exponential sum loop as the simple serial example, but the sum is computed by four OpenMP threads. In the Fortran code below, each thread stores its thread number plus one in its private copy of the variable thdID, and the program calls the libhpm subroutines F_HPMTSTART and F_HPMTSTOP, passing thdID as the instrumentation ID, instead of the F_HPMSTART and F_HPMSTOP calls used in the simple serial example.
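As a language-neutral illustration of how the loop is shared, the following Python sketch mimics the effect of the "!$omp do reduction(+:sum)" construct: each thread sums its own range of iterations and the partial sums are then added together. It is only a sketch under stated assumptions. It assumes a static schedule with contiguous, nearly equal chunks, which this primer does not state; the helper names chunk_bounds, partial_sum_exp, and threaded_sum are hypothetical; and the iteration count in the demo is reduced so that it runs quickly.

```python
import math


def chunk_bounds(n, num_threads):
    """Split iterations 1..n into contiguous (first, last) ranges, one per thread.

    Assumption: static, block-wise scheduling. The OpenMP runtime is free to
    use a different schedule.
    """
    base, extra = divmod(n, num_threads)
    bounds = []
    first = 1
    for t in range(num_threads):
        count = base + (1 if t < extra else 0)
        bounds.append((first, first + count - 1))
        first += count
    return bounds


def partial_sum_exp(first, last, scale=1.0e-8):
    """Partial sum of exp(scale*i) for i in first..last (one thread's share)."""
    return math.fsum(math.exp(scale * i) for i in range(first, last + 1))


def threaded_sum(n, num_threads=4):
    """Return {thdID: partial sum}, with thdID = 1 + thread number as in it.F."""
    return {
        1 + t: partial_sum_exp(first, last)
        for t, (first, last) in enumerate(chunk_bounds(n, num_threads))
    }


if __name__ == "__main__":
    n = 100_000  # reduced from the 10,000,000 iterations used in it.F
    partials = threaded_sum(n)
    for thd_id, value in sorted(partials.items()):
        print(f"thdID {thd_id}: partial sum = {value:.6f}")
    # The reduction(+:sum) clause adds the per-thread partial sums together.
    print(f"total = {sum(partials.values()):.6f}")
```

Under this even-split assumption, each of the four threads would execute 2,500,000 of the 10,000,000 iterations.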
The output of this run contains four instrumented sections instead of the single section produced by the simple serial code. The sections are numbered 1 through 4, matching the thdID value of the thread that computed each partial sum.
Script for compiling source code, creating batch job, and running:
#! /bin/csh

# Part1: put the OpenMP Fortran source code in file it.F
cat << 'EOF1' > it.F
      program main
      implicit none
#include "/usr/include/f_hpm.h"
      integer thdID, omp_get_thread_num
      integer i
      real sum
! Initialize hpmtoolkit:
      call f_hpminit(100,"main")
      sum=0.0
!$omp parallel private (thdID)
! Start instrumentation around compute loop:
      thdID = 1+omp_get_thread_num()
      call f_hpmtstart(thdID, "Partial Sum EXP")
!$omp do reduction(+:sum)
      do i=1,10000000
         sum=sum+exp(.00000001*i)
      end do
! Stop instrumentation after compute loop:
      call f_hpmtstop(thdID)
!$omp end parallel
! Generate hardware analysis output file:
      call f_hpmterminate(100)
      stop
      end
'EOF1'

# Part2: compile the it.F source code with hpm and pmapi libraries
xlf95_r -I/usr/include -qarch=auto -qsmp=omp -O3 -qstrict -o it it.F \
        -L/usr/lib -lhpm_r -lpmapi -lm

# Part3: create the batch job lsf.ompjob
cat << 'EOF2' > lsf.ompjob
#!/bin/csh
#
# LSF script to run an OMP code
#
#BSUB -x                       # exclusive use of node
#BSUB -n 1                     # placeholder; see OMP_NUM_THREADS below
#BSUB -R "span[ptile=1]"       # run 1 task per host
#BSUB -o omplsf.%J.out         # output filename
#BSUB -e omplsf.%J.err         # error filename
#BSUB -J omplsf.test           # job name
#BSUB -P xxxxxxxx              # your valid 8-digit project number
#BSUB -W 0:10                  # hh:mm wall clock time
#BSUB -q regular               # queue

setenv OMP_NUM_THREADS 4
mpirun.lsf ./it
exit
'EOF2'

# Part4: submit lsf.ompjob to the batch queue specified by #BSUB -q
bsub < lsf.ompjob

# Part5: cleanup (bsub returns once the job is queued, so keep the
# executable ./it until the job has finished, then remove it by hand)
rm -f it.F lsf.ompjob
exit
Output on bluefire POWER6:
be1105en.ucar.edu:/blhome/valent/hpmtoolkit/omp>cat main_311962_0100.hpm

Total execution time of instrumented code (wall time): 0.152229 seconds

########  Resource Usage Statistics  ########

Total amount of time in user mode            : 0.599284 seconds
Total amount of time in system mode          : 0.006747 seconds
Maximum resident set size                    : 2340 Kbytes
Average shared memory use in text segment    : 10 Kbytes*sec
Average unshared memory use in data segment  : 1528 Kbytes*sec
Number of page faults without I/O activity   : 571
Number of page faults with I/O activity      : 26
Number of times process was swapped out      : 0
Number of times file system performed INPUT  : 0
Number of times file system performed OUTPUT : 0
Number of IPC messages sent                  : 0
Number of IPC messages received              : 0
Number of signals delivered                  : 0
Number of voluntary context switches         : 25
Number of involuntary context switches       : 3

#######  End of Resource Statistics  ########

Instrumented section: 1 - Label: Partial Sum EXP - process: 100
file: it.F, lines: 13 <--> 19
Count: 1
Wall Clock Time: 0.151727 seconds
Total time in user mode: 0.151429463435374 seconds

Set: 1
Counting duration: 0.151624695 seconds
PM_FPU_1FLOP (FPU executed one flop instruction )            : 30000005
PM_FPU_FMA (FPU executed multiply-add instruction)           : 27500000
PM_FPU_FSQRT_FDIV (FPU executed FSQRT or FDIV instruction)   : 0
PM_CYC (Processor cycles)                                    : 712324196
PM_RUN_INST_CMPL (Run instructions completed)                : 195545509
PM_RUN_CYC (Run cycles)                                      : 713189616

Utilization rate                                             : 99.804 %
Flop                                                         : 85.000 Mflop
Flop rate (flops / WCT)                                      : 560.217 Mflop/s
Flops / user time                                            : 561.317 Mflop/s
FMA percentage                                               : 95.652 %

Instrumented section: 2 - Label: Partial Sum EXP - process: 100
file: it.F, lines: 13 <--> 19
Count: 1
Wall Clock Time: 0.151707 seconds
Total time in user mode: 0.151499265943878 seconds

Set: 1
Counting duration: 0.151595207 seconds
PM_FPU_1FLOP (FPU executed one flop instruction )            : 30000001
PM_FPU_FMA (FPU executed multiply-add instruction)           : 27500000
PM_FPU_FSQRT_FDIV (FPU executed FSQRT or FDIV instruction)   : 0
PM_CYC (Processor cycles)                                    : 712652547
PM_RUN_INST_CMPL (Run instructions completed)                : 196413574
PM_RUN_CYC (Run cycles)                                      : 713067052

Utilization rate                                             : 99.863 %
Flop                                                         : 85.000 Mflop
Flop rate (flops / WCT)                                      : 560.291 Mflop/s
Flops / user time                                            : 561.059 Mflop/s
FMA percentage                                               : 95.652 %

Instrumented section: 3 - Label: Partial Sum EXP - process: 100
file: it.F, lines: 13 <--> 19
Count: 1
Wall Clock Time: 0.151684 seconds
Total time in user mode: 0.151189739158163 seconds

Set: 1
Counting duration: 0.151348445 seconds
PM_FPU_1FLOP (FPU executed one flop instruction )            : 30000001
PM_FPU_FMA (FPU executed multiply-add instruction)           : 27500000
PM_FPU_FSQRT_FDIV (FPU executed FSQRT or FDIV instruction)   : 0
PM_CYC (Processor cycles)                                    : 711196533
PM_RUN_INST_CMPL (Run instructions completed)                : 195208782
PM_RUN_CYC (Run cycles)                                      : 711893378

Utilization rate                                             : 99.674 %
Flop                                                         : 85.000 Mflop
Flop rate (flops / WCT)                                      : 560.376 Mflop/s
Flops / user time                                            : 562.207 Mflop/s
FMA percentage                                               : 95.652 %

Instrumented section: 4 - Label: Partial Sum EXP - process: 100
file: it.F, lines: 13 <--> 19
Count: 1
Wall Clock Time: 0.151675 seconds
Total time in user mode: 0.151419299319728 seconds

Set: 1
Counting duration: 0.151589205 seconds
PM_FPU_1FLOP (FPU executed one flop instruction )            : 30000001
PM_FPU_FMA (FPU executed multiply-add instruction)           : 27500000
PM_FPU_FSQRT_FDIV (FPU executed FSQRT or FDIV instruction)   : 0
PM_CYC (Processor cycles)                                    : 712276384
PM_RUN_INST_CMPL (Run instructions completed)                : 196614451
PM_RUN_CYC (Run cycles)                                      : 713002604

Utilization rate                                             : 99.831 %
Flop                                                         : 85.000 Mflop
Flop rate (flops / WCT)                                      : 560.409 Mflop/s
Flops / user time                                            : 561.355 Mflop/s
FMA percentage                                               : 95.652 %
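The derived lines at the end of each instrumented section can be reproduced from the raw counters and timings. The Python sketch below is a consistency check on the numbers shown above, not the toolkit's documented definitions: the formulas are inferred from this sample output (they reproduce the printed values to three decimals), and the names SECTIONS, derived_metrics, and aggregate_mflops are hypothetical.

```python
# Raw values copied from the four instrumented sections of the sample output.
SECTIONS = {
    1: {"fpu_1flop": 30000005, "fpu_fma": 27500000,
        "wall_s": 0.151727, "user_s": 0.151429463435374},
    2: {"fpu_1flop": 30000001, "fpu_fma": 27500000,
        "wall_s": 0.151707, "user_s": 0.151499265943878},
    3: {"fpu_1flop": 30000001, "fpu_fma": 27500000,
        "wall_s": 0.151684, "user_s": 0.151189739158163},
    4: {"fpu_1flop": 30000001, "fpu_fma": 27500000,
        "wall_s": 0.151675, "user_s": 0.151419299319728},
}


def derived_metrics(fpu_1flop, fpu_fma, wall_s, user_s):
    """Recompute the derived lines of one instrumented section.

    All formulas are inferred from the sample output, not taken from the
    HPM documentation.
    """
    flops = fpu_1flop + 2 * fpu_fma  # assumption: one FMA counts as two flops
    return {
        "utilization_pct": 100.0 * user_s / wall_s,
        "mflop": flops / 1.0e6,
        "mflops_wct": flops / wall_s / 1.0e6,
        "mflops_user": flops / user_s / 1.0e6,
        # Inferred: this ratio matches the printed "FMA percentage" value.
        "fma_pct": 100.0 * 2 * fpu_fma / (fpu_1flop + fpu_fma),
    }


def aggregate_mflops(sections):
    """Total flops of all sections divided by the longest section wall time."""
    total = sum(s["fpu_1flop"] + 2 * s["fpu_fma"] for s in sections.values())
    longest = max(s["wall_s"] for s in sections.values())
    return total / longest / 1.0e6


if __name__ == "__main__":
    for thd_id, raw in sorted(SECTIONS.items()):
        m = derived_metrics(**raw)
        print(f"section {thd_id}: "
              f"utilization {m['utilization_pct']:.3f} %, "
              f"{m['mflop']:.3f} Mflop, "
              f"{m['mflops_wct']:.3f} Mflop/s (WCT), "
              f"{m['mflops_user']:.3f} Mflop/s (user), "
              f"FMA {m['fma_pct']:.3f} %")
    print(f"all threads combined: {aggregate_mflops(SECTIONS):.1f} Mflop/s")
```

If the 10,000,000 iterations are split evenly, so that each thread executes 2,500,000 of them, the 85.000 Mflop reported per section corresponds to about 34 flops per loop iteration: roughly 12 single-flop instructions plus 11 multiply-add instructions counted as two flops each.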
If you have questions about this document, please contact us via any of the methods shown on this page: CISL Customer Support.
© Copyright 2003-2009. University Corporation for Atmospheric Research (UCAR). All Rights Reserved.
Address of this page: http://www.cisl.ucar.edu/docs/ibm/hpm.toolkit/ex.omp.html