Last updated: Aug 19th 2009
Performance analysis and optimization is a key area of high-performance computing. On bluefire we have installed several applications for this purpose, including hpmcount, Scalasca, TAU, and IPM.
HPMCOUNT
Use the AIX command hpmcount to analyze your code performance. The simplest way to use hpmcount is to invoke it directly:
    hpmcount my_executable
When invoked in this way, hpmcount starts your application and, when execution ends, produces a report summarizing:
- Wall-clock time
- Statistics on resource utilization
- Information from hardware performance counters
- Derived hardware metrics
For more information on how to use hpmcount, see "Using HPM Toolkit: A primer".
TAU Performance System is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java, and Python. TAU (Tuning and Analysis Utilities) can gather performance information through instrumentation of functions, methods, basic blocks, and statements. All C++ language features are supported, including templates and namespaces. The API also lets you select profiling groups for organizing and controlling instrumentation. Instrumentation can be inserted in the source code automatically using an instrumentor tool based on the Program Database Toolkit (PDT), dynamically using DyninstAPI, at runtime in the Java virtual machine, or manually using the instrumentation API.
More details are available on the TAU Performance System website.
TAU's profile visualization tool, paraprof, provides graphical displays of all the performance analysis results, in aggregate and single node/context/thread forms. Using its graphical interface, you can quickly identify sources of performance bottlenecks in the application. In addition, TAU can generate event traces that can be displayed with the Vampir, Paraver, or Jumpshot trace visualization tools. Paraprof is available on the DASG machines (Data Analysis and Visualization cluster), but we recommend running it on the Visualization Cluster (i.e., storm) because paraprof uses OpenGL.
Running TAU on bluefire
Instrumenting a code with TAU and running it on bluefire is no different from doing so on other platforms. For convenience, here are the steps to follow on bluefire:
- Have at least a basic understanding of TAU, as described in its documentation.
- Copy the example directory from /contrib/tau-2.18-2p4/examples/ to your /ptmp/$LOGNAME directory.
- This directory contains the original TAU examples.
- Set the required environment variables in your compile and submit scripts, as in the following example. Note that PATH must always be set as below, but you must choose the TAU_MAKEFILE and TAU_OPTIONS appropriate for your particular example or real-life code (see the TAU documentation for details):
    # in ksh or bash
    export PATH=$PATH:/contrib/tau-2.18-2p4/ibm64/bin:/contrib/papi-3.6.2/bin
    export TAU_MAKEFILE=/contrib/tau-2.18-2p4/ibm64/lib/Makefile.tau-mpi-pdt
    export TAU_OPTIONS='-optKeepFiles -optVerbose -optTauSelectFile="select.tau"'

    # in csh
    setenv PATH $PATH:/contrib/tau-2.18-2p4/ibm64/bin:/contrib/papi-3.6.2/bin
    setenv TAU_MAKEFILE /contrib/tau-2.18-2p4/ibm64/lib/Makefile.tau-mpi-pdt
    setenv TAU_OPTIONS '-optKeepFiles -optVerbose -optTauSelectFile="select.tau"'
- Compile the code, instructing the compiler to instrument the executable. This is done using the tau_cc.sh and tau_f90.sh scripts (either by setting the CC and F90 environment variables or in your makefile). Don't forget to set the environment variables above before compiling (if you use a compile script, it is a good idea to set them in the script).
- Create a script to submit a job via LSF, making sure that the environment variables above are set before the executable is called.
- Submit the job to LSF as usual and wait for the job to run.
- If the run is successful, profile or trace files will be generated. Profile data files are stored in the directory given by the environment variable PROFILEDIR, and trace data files in the directory given by TRACEDIR (in both cases, the current directory if the variable is not set). We recommend storing these files on the /ptmp disk: trace files take up a lot of space, especially with many MPI tasks, and /ptmp is cross-mounted on other systems, so no data transfer is required (see below).
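The submit and output steps above can be sketched as a small bash script that writes an LSF job file and submits it. This is a minimal sketch under assumptions: the executable name my_app, the job name, the queue name regular, the 32 tasks, the 10-minute wall-clock limit, and the tau_profiles/tau_traces subdirectory names are hypothetical placeholders, and the code is assumed to be already compiled with tau_cc.sh or tau_f90.sh. The PATH, TAU_MAKEFILE, TAU_OPTIONS, PROFILEDIR, and TRACEDIR settings come from the steps above.

```bash
#!/bin/bash
# Sketch: write an LSF job file for a TAU-instrumented MPI code and submit it.
# Job name, task count, wall time, queue, and "my_app" are hypothetical.
job_file=tau_job.lsf

# Quoted 'EOF': $PATH and $LOGNAME are expanded when the job runs, not now.
cat > "$job_file" <<'EOF'
#!/bin/ksh
#BSUB -J tau_test
#BSUB -n 32
#BSUB -W 0:10
#BSUB -q regular
#BSUB -o tau_test.%J.out
#BSUB -e tau_test.%J.err

# Same TAU settings that were used at compile time
export PATH=$PATH:/contrib/tau-2.18-2p4/ibm64/bin:/contrib/papi-3.6.2/bin
export TAU_MAKEFILE=/contrib/tau-2.18-2p4/ibm64/lib/Makefile.tau-mpi-pdt
export TAU_OPTIONS='-optKeepFiles -optVerbose -optTauSelectFile="select.tau"'

# Keep profile and trace output on /ptmp (large and cross-mounted)
export PROFILEDIR=/ptmp/$LOGNAME/tau_profiles
export TRACEDIR=/ptmp/$LOGNAME/tau_traces
mkdir -p "$PROFILEDIR" "$TRACEDIR"

mpirun.lsf ./my_app
EOF

# Submit only where LSF is available (e.g. on bluefire)
if command -v bsub >/dev/null 2>&1; then
  bsub < "$job_file"
else
  echo "bsub not found; job script written to $job_file"
fi
```

Writing the job file from a script keeps the environment settings in one place, so the variables used at compile time and at run time cannot drift apart.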
Performance data visualization
There are several options to visualize data collected with TAU. Here we describe only Jumpshot and paraprof.
Tracing using TAU with Jumpshot
If you are using TAU to prepare trace files for viewing with Jumpshot, use
    export TAU_MAKEFILE=/contrib/tau-2.18-2p4/ibm64/lib/Makefile.tau-mpi-pdt-trace
Run the job, and then create the Jumpshot file app.slog2 using
    tau_treemerge.pl
followed by
    tau2slog2 tau.trc tau.edf -o app.slog2
You can then use the jumpshot program to view the app.slog2 file on either bluefire or one of the DASG machines. See the Jumpshot-4 Users Guide.
Note that creating trace files for many MPI tasks can take up a lot of space, so it is recommended to do this under /ptmp.
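The two conversion commands can be wrapped in a small bash function. A sketch, assuming the job was built with Makefile.tau-mpi-pdt-trace and that the per-task trace files are in TRACEDIR (or in the current directory if TRACEDIR is unset); the function name make_slog2 is hypothetical.

```bash
#!/bin/bash
# Sketch: merge per-task TAU traces and convert them for Jumpshot.
# make_slog2 is a hypothetical helper name; the optional argument is the
# output file name (default app.slog2, as in the text above).
make_slog2() {
  local trace_dir=${TRACEDIR:-.}
  local out=${1:-app.slog2}

  # Run in a subshell so the caller's working directory is unchanged
  (
    cd "$trace_dir" &&
    tau_treemerge.pl &&                      # produces tau.trc and tau.edf
    tau2slog2 tau.trc tau.edf -o "$out"      # converts to SLOG-2 for Jumpshot
  ) || { echo "trace conversion failed in $trace_dir" >&2; return 1; }

  echo "Created $trace_dir/$out; view it with: jumpshot $trace_dir/$out"
}
```

For example, running TRACEDIR=/ptmp/$LOGNAME/tau_traces make_slog2 would leave app.slog2 next to the trace files under /ptmp.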
paraprof tool
Paraprof is available on DASG machines. To use it:
- log in to storm
- change to the directory where the code was run (note that bluefire's /ptmp is mounted under /biptmp, so no file transfer is required)
- run
    /contrib/java/tau/paraprof
You might want to watch the "how to use paraprof" movie on the TAU website.
Scalasca is an open-source toolset that can be used to analyze the performance behavior of parallel applications and to identify opportunities for optimization. It has been specifically designed for use on large-scale systems, but is also well-suited for small- and medium-scale HPC platforms. Scalasca supports an incremental performance-analysis procedure that integrates runtime summaries with in-depth studies of concurrent behavior via event tracing, adopting a strategy of successively refined measurement configurations. A distinctive feature is the ability to identify wait states that occur, for example, as a result of unevenly distributed workloads. Especially when trying to scale communication-intensive applications to large processor counts, such wait states can present severe challenges to achieving good performance.
More details are available on the Scalasca website.
Running Scalasca on bluefire
Instrumenting a code with Scalasca and running it on bluefire is no different from doing so on other platforms. For convenience, here are the steps to follow on bluefire:
- Have at least a basic understanding of Scalasca, as described in its documentation.
- Copy the example directory from /contrib/scalasca-1.2/example/ to your /ptmp/$LOGNAME directory
- The previous directory contains the original scalasca examples, plus a directory called README.bluefire with additional details on how to use scalasca on bluefire
- Set the required environment variables in your compile and submit scripts, as is done in the README.bluefire scripts:
    # in ksh or bash
    export LD_LIBRARY_PATH=/usr/local/lsf/7.0/aix5-64/lib:/contrib/papi-3.6.2/
    export SCALASCA_DIR=/contrib/scalasca-1.2/
    export PATH=$PATH:$SCALASCA_DIR/bin:/contrib/papi-3.6.2/bin
    export SCAN_MPI_LAUNCHER=mpirun.lsf

    # in csh
    setenv LD_LIBRARY_PATH /usr/local/lsf/7.0/aix5-64/lib:/contrib/papi-3.6.2/
    setenv SCALASCA_DIR /contrib/scalasca-1.2/
    setenv PATH $PATH:$SCALASCA_DIR/bin:/contrib/papi-3.6.2/bin
    setenv SCAN_MPI_LAUNCHER mpirun.lsf
- Compile the code, instructing the compiler to instrument the executable. For this example, these compiler instructions are already in the Makefile. For your real-world program, prefix your compiler name with scalasca -instrument in your Makefile, as described in the Scalasca documentation. Don't forget to set the environment variables above! For this particular example, you can run the compile.sh script after copying it from the README.bluefire directory to its parent (namely ..).
- Create a script to submit a job via LSF, prefixing mpirun.lsf with scalasca -analyze and setting the environment variables as described above. For this particular example, you can use the submit.lsf script after copying it from the README.bluefire directory to its parent (namely ..).
- Submit the job to LSF as usual and wait for the job to run.
- If the run is successful, a directory called epik_ctest-mpi_32_sum will be created and filled with several Scalasca files. The directory name will be different if you change anything in the example, and for your real-world runs.
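For your own code, the submit step can be sketched in the same way as a bash script that writes and submits an LSF job file. This minimal sketch assumes an executable named ctest-mpi (inferred from the epik_ctest-mpi_32_sum directory name), 32 tasks, and a hypothetical job name, queue name, and wall-clock limit; the environment variables are those listed above. For the example itself, the submit.lsf script in README.bluefire remains the reference.

```bash
#!/bin/bash
# Sketch: write an LSF job file for a Scalasca summary run and submit it.
# Executable name, task count, wall time, and queue are assumptions.
job_file=scalasca_job.lsf

cat > "$job_file" <<'EOF'
#!/bin/ksh
#BSUB -J scalasca_test
#BSUB -n 32
#BSUB -W 0:10
#BSUB -q regular
#BSUB -o scalasca_test.%J.out
#BSUB -e scalasca_test.%J.err

# Environment as in the README.bluefire scripts
export LD_LIBRARY_PATH=/usr/local/lsf/7.0/aix5-64/lib:/contrib/papi-3.6.2/
export SCALASCA_DIR=/contrib/scalasca-1.2/
export PATH=$PATH:$SCALASCA_DIR/bin:/contrib/papi-3.6.2/bin
export SCAN_MPI_LAUNCHER=mpirun.lsf

# The usual launch command, prefixed with "scalasca -analyze"
scalasca -analyze mpirun.lsf ./ctest-mpi
EOF

# Submit only where LSF is available (e.g. on bluefire)
if command -v bsub >/dev/null 2>&1; then
  bsub < "$job_file"
else
  echo "bsub not found; job script written to $job_file"
fi
```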
Performance data visualization
The best way to visualize Scalasca output is the CUBE visualization component. Unfortunately, CUBE requires Qt 4.2, which is not currently available on CISL machines. We may consider installing it if enough users request it.
In the meantime, users can install CUBE on their own machines. It generally compiles without trouble on Linux systems, and on Debian and Debian-derived distributions such as Ubuntu its dependencies can be satisfied with just
    sudo apt-get install libqt4-dev
IMPORTANT NOTE
Scalasca documentation often uses scan instead of scalasca -analyze. This is fine if you use the fully qualified path /contrib/scalasca-1.2/bin/scan. Otherwise, you will run a different binary, /usr/bin/scan, which is in a higher-priority directory in the PATH environment variable. If you see errors like
    scan: unable to change directory to /blhome/ddvento/Mail/inbox: No such file or directory
you are using the wrong binary. We therefore recommend always using scalasca -analyze instead of scan on bluefire.
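To check which scan a bare command name would run, a small bash function can compare the result of command -v scan with the Scalasca binary. A sketch; the function name scan_is_scalasca is hypothetical, and the expected path defaults to the one given above.

```bash
#!/bin/bash
# Sketch: succeed (status 0) only if a bare "scan" resolves to the
# expected Scalasca binary. scan_is_scalasca is a hypothetical name.
scan_is_scalasca() {
  local expected=${1:-/contrib/scalasca-1.2/bin/scan}
  local found
  found=$(command -v scan) || found=""   # empty if no scan is on PATH

  if [ "$found" = "$expected" ]; then
    echo "scan -> $found (Scalasca)"
    return 0
  fi
  echo "scan -> ${found:-not found}; use 'scalasca -analyze' instead" >&2
  return 1
}
```

Running type -a scan additionally lists every scan on your PATH, in the order the shell searches them.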
IPM instructions:
- Add these three lines (csh syntax) to your batch run script before the mpirun.lsf command:
    setenv MP_EUILIBPATH /contrib/ipm/lib
    setenv IPM_REPORT full
    setenv MP_SINGLE_THREAD NO
  MP_EUILIBPATH points to the instrumented MPI library.
- Submit the batch run.
- A file containing IPM profiling information is created in the run directory. The file naming convention is shown by this example listing:
    -rw-r--r-- 1 mpage ncar 195801 Jul 10 09:13 mpage.1247238769.558263.0
- At the command line, enter:
    ipm_parse -full mpage.1247238769.558263.0
  This produces a text profile for your review.
- At the command line, enter:
    ipm_parse -html mpage.1247238769.558263.0
  This produces a directory named hd3D_16_mpage.1247238769.558263.0_ipm_662332.
- Tar up the directory and transfer it to your desktop.
- Untar it on your desktop.
- Open the file index.html in that directory with a web browser.
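The two ipm_parse steps and the tar step can be combined in a bash function. This is a sketch under assumptions: ipm_parse -full writes its report to standard output, the HTML directory name contains the profile file name followed by _ipm_ (as in the example above), and the function is run in the directory holding the profile file. The function name ipm_post and the .full.txt output name are hypothetical.

```bash
#!/bin/bash
# Sketch: post-process one IPM profile file.
# Usage: ipm_post <ipm_file>, run in the directory containing the file.
# ipm_post and the ".full.txt" output name are hypothetical.
ipm_post() {
  local ipm_file=$1
  [ -f "$ipm_file" ] || { echo "no such IPM file: $ipm_file" >&2; return 1; }

  # Text profile (assumes ipm_parse -full prints to standard output)
  ipm_parse -full "$ipm_file" > "${ipm_file}.full.txt" || return 1

  # HTML profile: creates a directory like hd3D_16_<file>_ipm_<id>
  ipm_parse -html "$ipm_file" || return 1

  # Tar each matching HTML directory for transfer to the desktop
  local html_dir
  for html_dir in *"${ipm_file}"_ipm_*; do
    [ -d "$html_dir" ] || continue   # skip non-matches and earlier tar files
    tar -cf "${html_dir}.tar" "$html_dir" && echo "created ${html_dir}.tar"
  done
}
```

For example, ipm_post mpage.1247238769.558263.0 would leave mpage.1247238769.558263.0.full.txt and a tar file of the HTML directory, ready to transfer to your desktop.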