This document is under construction!
Last updated: 8/22/08
Overview
Bluefire is an IBM clustered Symmetric MultiProcessing (SMP) system based on the Power6 chip. Its taxonomy is similar to that of blueice, NCAR's former Power5+ system. The software stack is nearly identical on the two systems. CISL completed acceptance testing and replaced blueice with bluefire on July 1, 2008.
For the NCAR and University user community, its purpose is to provide high-performance supercomputing for numerically intensive models and applications. It is best suited to running codes that can be run efficiently in parallel.
The programming models are the familiar ones: OpenMP, Message Passing Interface (MPI), and hybrid jobs that use both OpenMP and MPI.
See Appendix A for a comparison of blueice and bluefire processors.
Bluefire software is the same as that of blueice, although versions may differ, and some product names may differ because the two systems use different switches.
The Community Computing users and Climate Simulation Lab users share the system. The number of nodes available to each group is flexible, to maximize the system's productivity: if one group is not using all of its available compute nodes, members of the other group may use those nodes. LSF determines these splits based on runtime usage. Platform Computing Inc. provides the LSF batch system; the manuals are available for browsing at the Platform Knowledge Centre.
Users active on blueice in the past 6 months already have a login. Other users with valid projects may request a bluefire login by sending e-mail to consult1@ucar.edu or contacting CISL Customer Support. Please include the following information with your login request:
To log in, type
(Mac users may need to use the -Y option with ssh.) You will be prompted for your one-time password; use your Cryptocard to obtain it. Note: The roy gateway computer is not used for accessing bluefire.
Bluefire node names use the prefix "be". However, because your home directory has been transferred over from blueice, your home directory path still shows the "bl" prefix, for example: /blhome/
You may use scp (secure copy) or sftp (secure FTP) to transfer files between bluefire and remote platforms. The transfer may be initiated in either of two ways:
The former security model used for the supercomputers at NCAR permitted you to install public keys from the supercomputer on the remote platform, which allowed you to initiate file transfers from the supercomputer without typing your passphrase. This method is still possible for transfers initiated from bluefire. See "How to install ssh keys on remote systems."
Batch job file transfer: This setup allows you to transfer files from batch jobs running on bluefire as well. However, batch file transfer jobs (including to the Mass Storage System) must be submitted to the share queue, because scp/sftp are not available on nodes in the other batch queues.
Once the keys are in place you may initiate a file transfer from a remote machine by executing:
Note:
Bluefire queue names are similar to the previous blueice queue names, but there are now equivalent queues for the regular memory (64 GB) and large memory (128 GB) nodes. The full system has 69 regular memory nodes and 48 large memory nodes. The queue structure is described here:
| Queue Name | Queue Charging Factor | Run Limit |
|---|---|---|
| capability (by special permission only) | 1 | 12 hours |
| debug | 1 | 6 hours |
| dedicated | 1 | 6 hours |
| economy | 0.5 | 6 hours |
| hold | 0.33 | 6 hours |
| lrg_capability | 1 | 12 hours |
| lrg_economy | 0.5 | 6 hours |
| lrg_hold | 0.33 | 6 hours |
| lrg_premium | 1.5 | 6 hours |
| lrg_regular | 1 | 6 hours |
| lrg_standby | 0.1 | 6 hours |
| premium | 1.5 | 6 hours |
| regular | 1 | 6 hours |
| share | 1 | 12 hours |
| special | 1 | 6 hours |
| standby | 0.1 | 6 hours |
Charging for bluefire usage began as soon as the system became available. The following formula specifies how your computing account is charged for running jobs on bluefire:
GAUs charged = wallclock hours used * number of nodes used * number of processors in that node * computer factor * queue charging factor
The "computer factor" is a multiplier that equalizes the way GAUs are consumed on different computing platforms. Faster computers have higher computer factors. The computer charging factor for bluefire is 1.4.
The "queue charging factor" is a multiplier that reflects the priority given to jobs in a queue: higher-priority jobs are charged more.
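As a rough illustration of the charging formula above, the following Korn/Bourne shell sketch multiplies the five terms with awk. The function name gau_charge and its argument order are hypothetical (it is not a CISL-provided tool). The example values assume 32 processors per node and the bluefire computer factor of 1.4, with queue charging factors taken from the queue table.

```shell
# gau_charge: hypothetical helper, not a CISL tool.
# Arguments: wallclock_hours nodes processors_per_node computer_factor queue_factor
gau_charge() {
    awk -v h="$1" -v n="$2" -v p="$3" -v c="$4" -v q="$5" \
        'BEGIN { printf "%.2f\n", h * n * p * c * q }'
}

# 2 wallclock hours on 4 nodes of 32 processors, computer factor 1.4:
gau_charge 2 4 32 1.4 1      # regular queue (factor 1),   prints 358.40
gau_charge 2 4 32 1.4 0.5    # economy queue (factor 0.5), prints 179.20
```

Because the formula multiplies by the number of processors in each node, the same job costs half as many GAUs in the economy queue as in the regular queue, and any reduction in the number of nodes reduces the charge proportionally.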
If you were an active user of blueice, your account has been copied over to bluefire. You will not need to copy over your files from blueice to bluefire.
Continuing users will have the same shell and environment on bluefire as on blueice. If you are a new user, you will be given the Korn shell as the default. If you need to change your shell, you may do so by logging in to bluefire and then rsh'ing to bems. Follow the prompts to change your shell. The change may take up to 60 minutes to propagate.
At present, quotas are the same as they were for blueice. They may be increased in the future.
Bluefire has 32 CPUs (cores) per node. Because the nodes have Simultaneous Multi-Threading (SMT) enabled, each node appears to have 64 virtual CPUs. Your program may benefit from use of SMT. To use SMT, as on bluevista or blueice, you may just need to spawn more processes or threads than the number of physical CPUs available. Typically 64 tasks/threads per node is most efficient; however, we recommend that you compare the performance of your application with 32 and 64 tasks per node.
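When comparing 32 and 64 tasks per node, it can help to work out how many nodes each layout needs. The following shell sketch does the ceiling division; the function name nodes_needed is hypothetical, and the task counts are only example values.

```shell
# nodes_needed: hypothetical helper.
# Arguments: total_tasks tasks_per_node
# Prints the smallest number of whole nodes that can hold total_tasks.
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

nodes_needed 256 32   # one task per physical CPU: prints 8
nodes_needed 256 64   # two tasks per physical CPU (SMT): prints 4
nodes_needed 100 64   # a partial node still counts as a whole node: prints 2
```

A layout that uses fewer nodes is only a net gain if the wallclock time does not grow by a comparable factor, which is why timing both layouts for your own application is worthwhile.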
The default page size is 4 KB. On POWER6 systems, AIX 5L Version 5.3 supports a new 64-KB page size when running the 64-bit kernel. 64-KB pages are intended to be general-purpose. They are easy to use, and it is expected that many applications will see performance benefits when using 64-KB pages rather than 4-KB pages. IBM has reported performance improvements on a variety of workloads ranging from 1% to 13% when compared to the default 4-KB pages.
A user can specify a different page size to use for each of the three regions of a process's address space (data, stack, and text). The ldedit command may be used to set these page size options in an existing executable:
ldedit -btextpsize=64K -bdatapsize=64K -bstackpsize=64K a.out
A user can also set a process's preferred page sizes via the LDR_CNTRL environment variable. The following example will cause a.out to use 64-KB pages for its data, text, and stack:
Korn shell:
export LDR_CNTRL=DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K
This will override any page size settings in an executable's XCOFF header.
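An exported LDR_CNTRL setting is inherited by every program started from that shell. If you would rather limit the setting to a single program, standard shell syntax lets you place the assignment in front of one command. The sketch below assumes nothing beyond that shell behavior; the helper name ldr_cntrl_value is hypothetical, and `sh -c` stands in for your executable so that the effect is visible.

```shell
# ldr_cntrl_value: hypothetical helper that assembles an LDR_CNTRL string.
# Arguments: data_page_size text_page_size stack_page_size
ldr_cntrl_value() {
    printf 'DATAPSIZE=%s@TEXTPSIZE=%s@STACKPSIZE=%s\n' "$1" "$2" "$3"
}

# Apply the setting to one command only. Replace the sh -c stand-in with
# your own program, for example ./a.out.
LDR_CNTRL=$(ldr_cntrl_value 64K 64K 64K) sh -c 'echo "LDR_CNTRL=$LDR_CNTRL"'
# prints LDR_CNTRL=DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K

# The calling shell is unchanged (prints "unset" unless LDR_CNTRL was
# already set earlier in this session).
echo "parent shell: ${LDR_CNTRL:-unset}"
```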
Caveat: Using 64-KB pages rather than 4-KB pages for a multithreaded process's data may reduce the maximum number of threads a process can create, due to alignment requirements for stack guard pages. If you encounter this limit, you may disable stack guard pages by setting the environment variable AIXTHREAD_GUARDPAGES to 0.
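In Korn or Bourne shell syntax, the guard page setting described above could be written as follows. This is a minimal sketch; the echo line is only there to confirm the value.

```shell
# Disable stack guard pages for programs started from this shell.
export AIXTHREAD_GUARDPAGES=0
echo "AIXTHREAD_GUARDPAGES=$AIXTHREAD_GUARDPAGES"
```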
AIX 5.3 also supports large pages (16 MB) and "huge" pages (16 GB). However, these must be configured by the system administrator, and the system must then be rebooted. Users must be specifically authorized to use large pages. To make a special request for use of large pages, contact the CISL Consulting Office by any of the methods listed under CISL Customer Support.
Further details are discussed in the IBM Whitepaper, "Guide to Multiple Page Size Support on AIX 5L Version 5.3".
We highly recommend using processor binding for all parallel jobs. If you have been using the bindproc.x script in /contrib on blueice, this has been replaced with an IBM-provided launch script on bluefire. To use it, set (Korn or Bourne shell syntax):
You may provide a comma (,) separated list of cpu-ids.
For hybrid programs, use:
along with your OMP_NUM_THREADS environment variable setting.
Important: All parallel jobs should begin using one of the launch scripts mentioned above with their mpirun.lsf command.
We are working to provide new examples; see directory /usr/local/examples on the system.
Contact CISL Customer Support, call 303-497-1278, or send email to consult1@ucar.edu.
Appendix A: Comparison of blueice and bluefire processors
| Resource | Bluefire/Power6 | Blueice/Power5+ |
|---|---|---|
| Clock rate | 4.7 GHz | 1.9 GHz |
| Memory/processor | 2-4 GB | 2-4 GB |
| L1 cache | 128 KB (64 KB data + 64 KB instruction) per processor | 96 KB (32 KB data + 64 KB instruction) per processor |
| L2 cache | 4 MB per processor, on-chip | 2 MB per processor, on-chip |
| L3 cache | 32 MB off-chip per two-processor chip, shared by the two processors on the chip; connected to the chip via an 80-GB-per-second bus | 36 MB per processor pair, shared by all processors |
| Switch Latency | Infiniband 1.3 µs (peak) | HPS 5.0 µs (peak) |
| Switch Bandwidth | 20 GBps each direction | 1.7 GBps each direction |
| Multiple Functional Units | 2 floating point units; 3 fixed point units; 2 load/store units (the main difference is the faster clock) | 2 floating point units; 3 fixed point units; 2 load/store units |
| Simultaneous Multi-Threading (SMT) | Yes. SMT appears to the OS as multiple CPUs. Threaded applications may take advantage of SMT. To use SMT on bluefire, run double the number of tasks you used on blueice. | Same |