The frost compilers we focus on here are the cross-compilers that IBM provides
in its BlueGene/L software stack for Fortran, C, and C++. They are known as
the IBM XL Fortran, C, and C++ compilers. They are cross-compilers
because you use them to compile code on frost's interactive nodes, then run the
resulting executables on the batch nodes. The compilers are not available on
frost's batch nodes.
frost cross-compilers
| Language | Compiler name |
|---|---|
| C | blrts_xlc |
| C++ | blrts_xlC |
| F77 | blrts_xlf |
| F90 | blrts_xlf90 |
| F95 | blrts_xlf95 |
- The system include files are in /bgl/BlueLight/ppcfloor/bglsys/include
- The mpi and system driver libraries are under /bgl/BlueLight/ppcfloor/bglsys/lib
- The following libraries must be linked in this order: -lmpich.rts -lmsglayer.rts -lrts.rts -ldevices.rts
- You may also use these scripts in /contrib/bgl/bin, which set the include path and MPI libraries for you:
  - mpxlc
  - mpxlC
  - mpxlf
  - mpxlf90
- mpicc and mpif77 are also available, but they use the GNU gcc compilers, which do not offer the level of
performance of the XL compilers. Please use the XL compilers when compiling your codes.
- OpenMP directives are not available under any compilers on frost.
- TeraGrid users may wish to use the softenv command to provide uniformity on frost with other systems
where they have used this command.
To enable the softenv command and other TeraGrid commands on frost, you must
remove the file .nosoft from your home directory on frost, then log out. When you
log back in, you will find the commands in your path.
General information on softenv, compilers, and environment variables common to TeraGrid systems is
available under the Documentation tab on the User Portal, located on the
TeraGrid Main Page.
Compiler flags
- Optimization levels:
  - -O: a good place to start; use with -qmaxmem=64000
  - -O2: same as -O
  - -O3 -qstrict: tries more aggressive optimization while strictly obeying program semantics
  - -O3: aggressive; allows re-association and replaces division with multiplication by the inverse
  - -qhot: turns on the high-order transformation module, which adds vector routines unless -qhot=novector is specified
    - check the listing with -qreport=hotlist
- Start with -g -O -qarch=440 -qmaxmem=64000
Example
- Here's an example of compiling and linking a simple program:
- bash$ blrts_xlc -c example.c -I/bgl/BlueLight/ppcfloor/bglsys/include
- bash$ blrts_xlc -o example example.o -I/bgl/BlueLight/ppcfloor/bglsys/include -L/bgl/BlueLight/ppcfloor/bglsys/lib -lmpich.rts -lmsglayer.rts -lrts.rts -ldevices.rts
- The same compile using mpxlc from /contrib/bgl/bin:
- bash$ mpxlc -c example.c
- bash$ mpxlc -o example example.o
Note: The compile and link flags are the same for Fortran 77/90/95 and C/C++.
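If you drive builds from a script, the two command lines above can be assembled programmatically so that the library order is never shuffled by accident. The following is a minimal Python sketch, not part of the frost software stack: the function names compile_command and link_command are hypothetical, and only the paths, flags, and library order are taken from this page.

```python
# Sketch: build the blrts_xlc compile and link command lines shown above.
# Helper names are hypothetical; paths and library order come from this page.

BGL_INCLUDE = "/bgl/BlueLight/ppcfloor/bglsys/include"
BGL_LIB = "/bgl/BlueLight/ppcfloor/bglsys/lib"
# The libraries must appear in exactly this order on the link line.
BGL_LIBS = ["mpich.rts", "msglayer.rts", "rts.rts", "devices.rts"]


def compile_command(source, compiler="blrts_xlc", flags=()):
    """Return the argument list that compiles one source file to an object."""
    return [compiler, *flags, "-c", source, f"-I{BGL_INCLUDE}"]


def link_command(output, objects, compiler="blrts_xlc"):
    """Return the argument list that links objects against the MPI libraries."""
    libs = [f"-l{name}" for name in BGL_LIBS]
    return [compiler, "-o", output, *objects,
            f"-I{BGL_INCLUDE}", f"-L{BGL_LIB}", *libs]


if __name__ == "__main__":
    print(" ".join(compile_command("example.c")))
    print(" ".join(link_command("example", ["example.o"])))
```

Passing compiler="blrts_xlf90" builds the equivalent Fortran 90 lines, since the compile and link flags are the same across languages.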
Further information
To get further information on the IBM XL compilers for BlueGene/L, see these web pages:
Of particular interest on the above XL C/C++/Fortran websites is the following IBM
BlueGene/L document:
- Using the XL Compilers for BlueGene
In browsing it, please see the "Related information" chapter for information
on man pages for the xlf, xlc, and xlC compilers, and further documentation
references.
Note: We do not discuss GNU compilers here, other than to mention that the gcc compiler
is available on frost as executable /usr/local/bin/gcc. It is mainly used to compile
and build products (e.g. gmake) whose toolchain requires it. Otherwise, we recommend
using the IBM XL compilers, for performance reasons.
cobalt is the batch system installed on frost.
It handles partition sizes of 1024, 512, 256, 128, 64, and 32 nodes. The job queuing
commands are located under /usr/bin on the login node; below we provide a brief overview of the cobalt
commands cqsub (job submission), cqstat (job status), and cqdel (job deletion).
TeraGrid users may wish to log on to the User Portal from the
TeraGrid Home Page
to check the system load of the various computers (click the Resources tab).
Charging differs, depending on whether usage is on an NCAR account or a TeraGrid
account:
- NCAR users: No formal charging has been established, but per-user accounting statistics are gathered and monitored,
and the NCAR administrators contact users when unusual patterns are seen.
- TeraGrid users can see their per-machine charges via their User Portal logon.
TeraGrid users may check the User Portal's
Batch Queue Prediction Form
to see when their intended job may run on a given system. At this time (October 2007) the User Portal does
not have job-submission capability.
Submitting a job with Cobalt
Use the cqsub command to submit a job to the queue.
cqsub executable
Required flags:
- -n NP, where NP is the number of nodes
- -t TIME, where TIME is how much time your job will take to run, in hours:minutes:seconds format (though the seconds field is ignored.)
Example:
$ cqsub -n 32 -t 00:10:00 example.rts
submitting walltime=10.0 minutes
162
In this example STDOUT is stored in 162.output, and STDERR is stored in 162.error in the current directory.
Some optional flags (see man cqsub for the complete list):
- -O OUTPUT_PREFIX, where OUTPUT_PREFIX is the prefix for the output file names: STDOUT goes to OUTPUT_PREFIX.output and STDERR goes to OUTPUT_PREFIX.error. If OUTPUT_PREFIX is not specified, the job ID is used as the prefix, as in the 162.output and 162.error files above.
- -C CWD, where CWD is the working directory for the code to run in (not necessarily where the executable resides). The output files and any other files that are opened without specifying a path will be stored in the directory specified by CWD.
- -m MODE, where MODE is co (coprocessor mode) or vn (virtual-node mode)
- -c COUNT, where COUNT is the number of processors to use. By default this is equal to the number of nodes in coprocessor mode, and twice the number of nodes in virtual-node mode. This option is generally used in conjunction with -m vn to specify an odd number of processes in virtual-node mode.
- Example: To specify 55 processes in virtual-node mode:
$ cqsub -n 28 -c 55 -m vn -t 00:10:00 example.rts
- -N email address, sends an email message at the start and stop of the job to the specified email address. Multiple email addresses, separated by colons, can be specified.
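The relationship between -n, -c, and -m can be sketched in a few lines of Python. This is an illustrative helper only: the function names are hypothetical, it uses only the flags described above, and it does not round the node count up to one of the partition sizes (how Cobalt treats a node count that is not a partition size is not covered here).

```python
# Sketch: derive cqsub arguments from a desired process count.
# Function names are hypothetical; the flags are the ones described above.
import math


def nodes_needed(procs, mode="co"):
    """Smallest node count that can hold `procs` processes in the given mode."""
    if mode not in ("co", "vn"):
        raise ValueError("mode must be 'co' or 'vn'")
    per_node = 2 if mode == "vn" else 1  # vn mode runs two processes per node
    return math.ceil(procs / per_node)


def walltime(minutes):
    """Format minutes as hours:minutes:seconds (cqsub ignores the seconds)."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}:00"


def cqsub_args(executable, procs, minutes, mode="co"):
    """Return the cqsub argument list for `procs` processes."""
    nodes = nodes_needed(procs, mode)
    args = ["cqsub", "-n", str(nodes), "-t", walltime(minutes), "-m", mode]
    # -c is only needed when procs differs from the default for this mode.
    default_procs = nodes * (2 if mode == "vn" else 1)
    if procs != default_procs:
        args += ["-c", str(procs)]
    args.append(executable)
    return args


if __name__ == "__main__":
    print(" ".join(cqsub_args("example.rts", 55, 10, mode="vn")))
```

For 55 processes in virtual-node mode this yields -n 28 and -c 55, matching the example above.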
Job status with Cobalt
Use the cqstat command to see what jobs are queued or running. WallTime is in hours:minutes:seconds.
$ cqstat
JobID User WallTime Nodes State Location
=====================================================================
18453 lando 07:30:00 256 running 256_R000_J102_N0
18454 hsolo 06:30:00 256 running 256_R000_J203_N8
18455 hsolo 06:30:00 256 running 256_R001_J102_N0
18456 luke 06:30:00 256 running 256_R001_J203_N8
18464 yoda 00:30:00 1024 queued N/A
18465 yoda 00:30:00 1024 queued N/A
18466 luke 00:30:00 900 queued N/A
18467 lando 00:30:00 1024 queued N/A
18468 lando 00:30:00 1024 queued N/A
The -f flag gives more info:
$ cqstat -f
JobID JobName User WallTime RunTime Nodes State Location Mode Procs Queue StartTime
=========================================================================================================================
18453 - lando 07:30:00 02:48:40 256 running 256_R000_J102_N0 vn 512 default 04/20/06 12:08:57
18454 - hsolo 06:30:00 02:45:04 256 running 256_R000_J203_N8 vn 512 default 04/20/06 12:12:32
18455 - hsolo 06:30:00 01:01:25 256 running 256_R001_J102_N0 vn 512 default 04/20/06 13:56:12
18456 - luke 06:30:00 00:56:00 256 running 256_R001_J203_N8 vn 512 default 04/20/06 14:04:34
18464 - yoda 00:30:00 N/A 1024 queued N/A vn 1089 default N/A
18465 - yoda 00:30:00 N/A 1024 queued N/A vn 1296 default N/A
18466 - luke 00:30:00 N/A 900 queued N/A co 900 default N/A
18467 - lando 00:30:00 N/A 1024 queued N/A vn 1936 default N/A
18468 - lando 00:30:00 N/A 1024 queued N/A vn 2025 default N/A
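If you want to post-process the job list in a script, the default cqstat output is easy to split into columns. The following Python sketch is an assumption-laden illustration: parse_cqstat is a hypothetical helper, it assumes the header, rule, and row layout shown above, and it handles only the default output (the StartTime column of cqstat -f contains a space and would need extra handling).

```python
# Sketch: parse default `cqstat` output into one dict per job.
# parse_cqstat is a hypothetical helper; it assumes the layout shown above.


def parse_cqstat(text):
    """Return a list of dicts keyed by the column names in the header row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Find the "=====" rule; the header is the line just above it.
    rule = next(i for i, ln in enumerate(lines) if set(ln.strip()) == {"="})
    header = lines[rule - 1].split()
    jobs = []
    for ln in lines[rule + 1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue  # skip anything that does not look like a job row
        jobs.append(dict(zip(header, fields)))
    return jobs


SAMPLE = """\
JobID User WallTime Nodes State Location
=====================================================================
18453 lando 07:30:00 256 running 256_R000_J102_N0
18464 yoda 00:30:00 1024 queued N/A
18466 luke 00:30:00 900 queued N/A
"""

if __name__ == "__main__":
    queued = [j["JobID"] for j in parse_cqstat(SAMPLE) if j["State"] == "queued"]
    print(queued)
```

In practice the text would come from running cqstat on the login node, for example through subprocess.run, rather than from a hard-coded sample.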
Cancelling a job with Cobalt
Use the cqdel command to cancel a job that has been submitted to the queue.
$ cqdel 162
Deleted Jobs
JobID User
==============
162 joeuser
Note: It may take some time for the job to be deleted if it is running, but you can check the `.error` file to see if the job is being deleted.
System availability commands
Use the partlist command to see what partitions are available and which are in use.
$ partlist
Name Queue State
==================================
NCAR_R00 default busy*
NCAR_R000 default busy*
256_R000_J102_N0 default busy*
128_R000_J102_N0 default busy*
64_R000_J102_N0 default busy
32_R000_J102_N0 default busy*
32_R000_J104_N1 default busy*
64_R000_J106_N2 default idle
32_R000_J106_N2 default idle
32_R000_J108_N3 default idle
...
This listing is what you would see when a job is running on 64_R000_J102_N0 (shown as busy). NCAR_R00, NCAR_R000, 256_R000_J102_N0, and the other partitions that overlap with 64_R000_J102_N0 are shown as busy*, while 64_R000_J106_N2 is still available.
You may also use the nodes helper script, which displays only the currently free partitions. nodes -v lists all of the partitions (from the partlist -l output) and indents the output to show the partition hierarchy.
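For scripting, the idle partitions can also be picked out of the partlist output directly. The Python sketch below is a hypothetical helper, not the nodes script itself; it assumes the three-column Name, Queue, State layout shown above and treats only an exact State of idle as free.

```python
# Sketch: list idle partitions from `partlist` output.
# idle_partitions is a hypothetical helper; it assumes the layout shown above.


def idle_partitions(text):
    """Return the names of partitions whose State column is exactly 'idle'."""
    idle = []
    for ln in text.splitlines():
        fields = ln.split()
        # Data rows have three columns: Name, Queue, State.
        if len(fields) == 3 and fields[2] == "idle":
            idle.append(fields[0])
    return idle


PARTLIST_SAMPLE = """\
Name Queue State
==================================
NCAR_R00 default busy*
64_R000_J102_N0 default busy
32_R000_J104_N1 default busy*
64_R000_J106_N2 default idle
32_R000_J106_N2 default idle
"""

if __name__ == "__main__":
    print(idle_partitions(PARTLIST_SAMPLE))
```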
Use the showres -s command to see what system reservations are in place.
When your job isn't running...
If your job is 'queued' and it seems like it should be running, please take the following actions, in the order shown:
- Check the output from partlist or nodes. It tells you which partitions are idle (available for jobs to run).
- Check the output from showres -s. If a reservation starts before your job's walltime would end, your job will not run.
- Check that the queues are running using cqstat -q. The queues may be stopped for system maintenance.
- Your job may be limited by queue restrictions. Check the queue restrictions with cqstat -q.
- If your job is listed in the hold state, then it has been held by an administrator. Send email to the appropriate
address in the next item to ask why.
- If still stymied, send a message describing the situation to the appropriate one of these addresses:
- NCAR: frost-help@ucar.edu
- TeraGrid: help@teragrid.org
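The reservation check in the list above is simple time arithmetic: a job cannot start if it would still be running when a reservation begins. The Python sketch below illustrates that rule only. The function name is hypothetical, the times are made-up examples, and the sketch ignores reservations that have already ended and any partition-level detail.

```python
# Sketch: would a job started now run into an upcoming reservation?
# blocked_by_reservation is a hypothetical helper illustrating the rule above.
from datetime import datetime, timedelta


def blocked_by_reservation(start, walltime, reservation_start):
    """True if the job would still be running when the reservation begins."""
    return reservation_start < start + walltime


if __name__ == "__main__":
    now = datetime(2007, 10, 5, 7, 0)          # example: 7:00am
    reservation = datetime(2007, 10, 5, 8, 0)  # example: reservation at 8:00am
    print(blocked_by_reservation(now, timedelta(minutes=30), reservation))
    print(blocked_by_reservation(now, timedelta(hours=2), reservation))
```

With these example times, the 30-minute job finishes before the reservation and is not blocked, while the 2-hour job is.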
Reservations and JumboFridays:
- Both NCAR and TeraGrid: JumboFridays are for half-rack or full-rack jobs during the big run window
(usually 8-10am on Fridays).
- NCAR reservations: please email frost-help@ucar.edu with your request as far in advance as possible.
- TeraGrid reservations: please use the
TeraGrid Resource Advance Reservation Form.
Order of assignment
Example: The BGLMPI_MAPPING variable can control the order in which tasks are assigned to nodes.
- cqsub -e BGLMPI_MAPPING=TXYZ -m vn -n 32 ...
The T coordinate selects the first or second processor in a node. Because the above example uses virtual-node mode with TXYZ ordering, processes are assigned first to the two processors on a node, then along the x dimension, then the y dimension, then the z dimension. The coordinates below are listed as <x,y,z,t>:
Process Coordinates
0 <0,0,0,0>
1 <0,0,0,1>
2 <1,0,0,0>
3 <1,0,0,1>
4 <2,0,0,0>
5 <2,0,0,1>
6 <3,0,0,0>
7 <3,0,0,1>
8 <0,1,0,0>
9 <0,1,0,1>
The dimensions for partitions of various sizes are:
32 4,4,2,1
64 8,4,2,1
128 8,4,4,1
256 8,4,8,1
512 8,8,8,1
1024 8,8,16,1
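The TXYZ ordering can be reproduced with integer division: T varies fastest, then X, then Y, then Z. The Python sketch below is an illustration rather than the system's own implementation. The function name txyz_coords is hypothetical, and it assumes a 32-node partition with x, y, z dimensions 4, 4, 2 and two processes per node in virtual-node mode.

```python
# Sketch: map an MPI rank to <x,y,z,t> under BGLMPI_MAPPING=TXYZ.
# txyz_coords is a hypothetical helper; dims are the x, y, z sizes of the partition.


def txyz_coords(rank, dims, procs_per_node=2):
    """Return (x, y, z, t) for `rank` when T varies fastest, then X, Y, Z."""
    nx, ny, nz = dims
    total = nx * ny * nz * procs_per_node
    if not 0 <= rank < total:
        raise ValueError(f"rank must be in [0, {total})")
    rest, t = divmod(rank, procs_per_node)  # processor within the node
    rest, x = divmod(rest, nx)              # then along x
    z, y = divmod(rest, ny)                 # then y, and finally z
    return (x, y, z, t)


if __name__ == "__main__":
    for rank in range(10):
        x, y, z, t = txyz_coords(rank, (4, 4, 2))
        print(f"{rank} <{x},{y},{z},{t}>")
```

Run as a script, this prints the same ten rows as the Process/Coordinates listing above.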
Mapfile
Example: You can also use a mapfile that defines the coordinates of the torus to which each process is assigned:
- cqsub -e BGLMPI_MAPPING=/home/joeuser/sample.map -n 32 ...
The mapfile is a text file in which each line specifies the x, y, z, t coordinates of one process (t is processor 0 or 1 of each node in virtual-node mode). For example:
0 0 0 0
0 0 2 0
0 2 0 0
0 2 2 0
...
So with this map MPI process 0 is placed on the node at 0,0,0,0, process 1 is at 0,0,2,0, etc.
Note that the mapfile must define the coordinates for the full partition where your job is running.
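Because the mapfile has to cover the full partition, it is usually easier to generate it than to type it. The Python sketch below writes one "x y z t" line per process. The function names are hypothetical, the x-fastest ordering is an arbitrary choice for illustration (substitute whatever ordering suits your code), and the dimensions are assumed to be the x, y, z sizes of the partition you request.

```python
# Sketch: generate a mapfile that covers every process in a partition.
# Helper names are hypothetical; the ordering (x fastest, t slowest) is arbitrary.
import itertools


def mapfile_lines(dims, procs_per_node=1):
    """Yield one 'x y z t' line per process, covering the full partition."""
    nx, ny, nz = dims
    # itertools.product varies its last argument fastest, so x changes first.
    for t, z, y, x in itertools.product(
        range(procs_per_node), range(nz), range(ny), range(nx)
    ):
        yield f"{x} {y} {z} {t}"


def write_mapfile(path, dims, procs_per_node=1):
    """Write the mapfile to `path` and return the number of lines written."""
    lines = list(mapfile_lines(dims, procs_per_node))
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    return len(lines)


if __name__ == "__main__":
    # 32-node partition (4 x 4 x 2), coprocessor mode: 32 lines.
    for line in itertools.islice(mapfile_lines((4, 4, 2)), 5):
        print(line)
```

For virtual-node mode, pass procs_per_node=2 so that both processors of every node receive a line.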
Check Placement
You can check where your processes are being placed by examining the string returned by MPI_Get_processor_name. It will look something like this:
- Processor <0,0,0,0> in a <4, 4, 2, 2> mesh
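If each rank prints its processor name to the job's output file, the coordinates can be pulled back out for checking. The Python sketch below is a hypothetical helper that assumes the string has exactly the form shown above; since the text only says the string looks "something like this", adjust the pattern if your output differs.

```python
# Sketch: extract coordinates and mesh size from a processor-name string.
# parse_processor_name is a hypothetical helper; the format is assumed from above.
import re

_PATTERN = re.compile(r"Processor\s*<([^>]*)>\s*in a\s*<([^>]*)>\s*mesh")


def parse_processor_name(name):
    """Return ((x, y, z, t), (mesh sizes)) parsed from the name string."""
    match = _PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognized processor name: {name!r}")
    coords = tuple(int(v) for v in match.group(1).split(","))
    mesh = tuple(int(v) for v in match.group(2).split(","))
    return coords, mesh


if __name__ == "__main__":
    print(parse_processor_name("Processor <0,0,0,0> in a <4, 4, 2, 2> mesh"))
```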
© Copyright 2004-2007. University Corporation for Atmospheric
Research (UCAR). All Rights Reserved.