Bluefire Quick Start Guide

This document is under construction!

Last updated: 8/22/08

Overview
Recommended use
Hardware
Software stack
Job scheduling
How to get an account
Logging in
File transfer
Queues and charging
Shells and environment
Running jobs
Multiple page size support
Processor binding is mandatory
Examples
Getting help
Appendix A. Comparison of bluefire and blueice chips

Overview

Bluefire is an IBM clustered Symmetric MultiProcessing (SMP) system based on the Power6 chip. Its taxonomy is similar to that of blueice, NCAR's former Power5+ system. The software stack is nearly identical on the two systems. CISL completed acceptance testing and replaced blueice with bluefire on July 1, 2008.

Recommended use

Bluefire provides high-performance supercomputing for the numerically intensive models and applications of the NCAR and university user community. It is best suited to codes that run efficiently in parallel.

The programming models are the familiar ones: OpenMP, Message Passing Interface (MPI), and hybrid jobs that use both OpenMP and MPI.

Hardware (full system)

See Appendix A for a comparison of blueice and bluefire processors.

Software stack

Bluefire runs the same software products as blueice, although versions may differ, and some product names may differ because the two systems use different switches.

Job scheduling

Community Computing users and Climate Simulation Lab users share the system. To maximize the system's productivity, the number of nodes available to each group is flexible: if one group is not using all of its available compute nodes, members of the other group may use those nodes. LSF determines these splits based on runtime usage. Platform Computing Inc. provides the LSF batch system; the manuals are available for browsing at the Platform Knowledge Centre.

How to get an account

Users active on blueice in the past 6 months already have a login. Other users with valid projects may request a bluefire login by sending e-mail to consult1@ucar.edu or contacting CISL Customer Support. Please include the following information with your login request:

Logging in

To log in, type

(Mac users may need to use the -Y option with ssh.) You will be prompted for your one-time password; use your CryptoCard to obtain it. Note: The roy gateway computer is not used for accessing bluefire.

Bluefire node names use the prefix "be." However, because your home directory has been transferred over from blueice, your home directory path still shows the "bl" prefix, for example: /blhome/.

File transfer

You may use scp (secure copy) or sftp (secure FTP) to transfer files between bluefire and remote platforms. The transfer may be initiated in either of two ways:

  1. initiated from bluefire, if the target machine accepts incoming ssh sessions from bluefire.

    The former security model used for the supercomputers at NCAR permitted you to install public keys from the supercomputer on the remote platform, which allowed you to initiate file transfers from the supercomputer without typing your passphrase. This method is still possible for transfers initiated from bluefire. See "How to install ssh keys on remote systems."

    Batch job file transfer: This setup allows you to transfer files from batch jobs running on bluefire as well. However, batch file transfer jobs (including to the Mass Storage System) must be submitted to the share queue, because scp/sftp are not available on nodes in the other batch queues.

  2. initiated from a remote machine. This method requires you to install the public key of the remote machine on bluefire. Please follow these steps to install the keys:

    1. Go to the .ssh directory under your home directory on your workstation or the remote machine from which you plan to initiate a file transfer to bluefire. Use "cat" or "more" to display the contents of your id_rsa.pub file, for example:
        % more id_rsa.pub
    2. In another window, log in to bluefire using your CryptoCard.
    3. Execute the utility to store keys:

        % /usr/local/bin/bluefire_scp_setup
    4. When prompted, use your mouse to copy your public keys from your workstation window and paste them into the bluefire window. You may need to press <return>.

    Note: Please allow up to 60 minutes for the keys to be updated before attempting key-based file transfer.

    Once the keys are in place you may initiate a file transfer from a remote machine by executing:

    or

    Note:

    Queues and charging

    Bluefire queue names are similar to the previous blueice queue names, with equivalent queues for the regular memory (64 GB) and large memory (128 GB) nodes; the large memory queues carry the prefix "lrg_". There are 69 regular memory nodes and 48 large memory nodes on the full system. The queue structure is described here:

    Queue Name                                Queue Charging Factor   Run Limit
    capability (by special permission only)   1                       12 hours
    debug                                     1                       6 hours
    dedicated                                 1                       6 hours
    economy                                   0.5                     6 hours
    hold                                      0.33                    6 hours
    lrg_capability                            1                       12 hours
    lrg_economy                               0.5                     6 hours
    lrg_hold                                  0.33                    6 hours
    lrg_premium                               1.5                     6 hours
    lrg_regular                               1                       6 hours
    lrg_standby                               0.1                     6 hours
    premium                                   1.5                     6 hours
    regular                                   1                       6 hours
    share                                     1                       12 hours
    special                                   1                       6 hours
    standby                                   0.1                     6 hours

    Charging for bluefire usage began as soon as the system became available. The following formula specifies how your computing account is charged for running jobs on bluefire:
    GAUs charged = wallclock hours used * number of nodes used * number of processors in that node * computer factor * queue charging factor

    The "computer factor" is a multiplier that equalizes the way GAUs are consumed on different computing platforms. Faster computers have higher computer factors. The computer charging factor for bluefire is 1.4.

    The "queue charging factor" is a multiplier that reflects the priority given to jobs in a queue: higher-priority jobs are charged more.
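As a rough illustration of the charging formula, the following sketch estimates the GAUs for a job. The function names queue_factor and gau_charge are hypothetical helpers, not bluefire utilities; the factors come from the queue table and the computer factor stated in this guide, and the 32 processors per node come from the "Running Jobs" section. The sample job (2 wallclock hours on 4 nodes) is invented for the example.

```shell
#!/bin/sh
# Sketch: estimate GAUs from the charging formula in this guide.
# queue_factor and gau_charge are hypothetical names, not bluefire utilities.

COMPUTER_FACTOR=1.4     # bluefire computer charging factor (from this guide)
PROCS_PER_NODE=32       # physical processors per node (from this guide)

# Print the queue charging factor for a queue name, per the queue table.
queue_factor() {
    case "$1" in
        premium|lrg_premium)  echo 1.5 ;;
        economy|lrg_economy)  echo 0.5 ;;
        hold|lrg_hold)        echo 0.33 ;;
        standby|lrg_standby)  echo 0.1 ;;
        capability|lrg_capability|debug|dedicated|regular|lrg_regular|share|special)
                              echo 1 ;;
        *) echo "unknown queue: $1" >&2; return 1 ;;
    esac
}

# Usage: gau_charge WALLCLOCK_HOURS NODES QUEUE
# GAUs = hours * nodes * processors per node * computer factor * queue factor
gau_charge() {
    factor=$(queue_factor "$3") || return 1
    awk -v h="$1" -v n="$2" -v p="$PROCS_PER_NODE" \
        -v c="$COMPUTER_FACTOR" -v q="$factor" \
        'BEGIN { printf "%.2f\n", h * n * p * c * q }'
}

gau_charge 2 4 regular   # prints 358.40
gau_charge 2 4 economy   # prints 179.20
```

The same job costs half as much in the economy queue as in the regular queue, because only the queue charging factor changes.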

    If you were an active user of blueice, your account has been copied over to bluefire. You will not need to copy over your files from blueice to bluefire.

    Shells and environment

    Continuing users will have the same shell and environment on bluefire as on blueice. If you are a new user, you will be given Korn shell as the default. If you need to change your shell, you may do so by logging in to bluefire and then rsh'ing to bems. Follow the prompts to change your shell. The change may take up to 60 minutes to propagate.

    At present, quotas are the same as they were for blueice. They may be increased in the future.

    Running Jobs

    Bluefire has 32 cpus/cores per node. Since the nodes have Simultaneous Multi-Threading (SMT) enabled, it will appear that there are 64 virtual cpus in each node. Your program may benefit from use of SMT. To use SMT, as in bluevista or blueice, you may just need to spawn more processes or threads than the number of physical cpus available. Typically 64 tasks/threads per node is most efficient; however, we recommend that you compare performance for your application for 32 and 64 tasks per node.
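When comparing 32 and 64 tasks per node, it may help to work out how many nodes each layout needs for the same total task count. The sketch below does only that arithmetic; nodes_needed is a hypothetical helper, and the 256-task job is an invented example.

```shell
#!/bin/sh
# Sketch: node counts for the same job with and without SMT.
# nodes_needed is a hypothetical helper, not a bluefire utility.

# Usage: nodes_needed TOTAL_TASKS TASKS_PER_NODE
# Prints the number of nodes required, rounded up to a whole node.
nodes_needed() {
    tasks=$1
    per_node=$2
    echo $(( (tasks + per_node - 1) / per_node ))
}

nodes_needed 256 32   # one task per physical cpu: prints 8
nodes_needed 256 64   # two tasks per physical cpu (SMT): prints 4
```

Because charges are based on the number of nodes used, halving the node count is only a saving if the 64-task layout does not run much longer, which is why measuring both cases for your application is worthwhile.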

    Multiple page size support

    64-KB pages

    The default page size is 4 KB. On POWER6 systems, AIX 5L Version 5.3 supports a new 64-KB page size when running the 64-bit kernel. 64-KB pages are intended to be general-purpose. They are easy to use, and it is expected that many applications will see performance benefits when using 64-KB pages rather than 4-KB pages. IBM has reported performance improvements on a variety of workloads ranging from 1% to 13% when compared to the default 4-KB pages.

    A user can specify a different page size to use for each of the three regions of a process's address space (data, stack, and text). The ldedit command may be used to set these page size options in an existing executable:

    ldedit -btextpsize=64K -bdatapsize=64K -bstackpsize=64K a.out
    

    A user can also set a process's preferred page sizes via the LDR_CNTRL environment variable. The following example will cause a.out to use 64-KB pages for its data, its text, and its stack:

    Korn shell:

    export LDR_CNTRL=DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K
    

    This will override any page size settings in an executable's XCOFF header.
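Because an exported LDR_CNTRL applies to every program started from that shell, one option is to set it for a single command only. The sketch below uses the standard env utility for this; run_with_64k_pages is a hypothetical helper name, and a.out stands for your own executable. The page size request itself only has an effect on AIX systems that support 64-KB pages.

```shell
#!/bin/sh
# Sketch: request 64-KB pages for one command only.
# run_with_64k_pages is a hypothetical helper name.

PAGE_OPTS='DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K'   # value from this guide

# Usage: run_with_64k_pages COMMAND [ARGS...]
run_with_64k_pages() {
    # env places LDR_CNTRL in the environment of the given command only;
    # the calling shell's environment is left unchanged.
    env LDR_CNTRL="$PAGE_OPTS" "$@"
}

# Example (a.out stands for your own executable):
#   run_with_64k_pages ./a.out
```

Other commands run from the same shell afterwards keep their default page sizes, since LDR_CNTRL is never exported.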

    Caveat: Using 64-KB pages rather than 4-KB pages for a multithreaded process's data may reduce the maximum number of threads the process can create, due to alignment requirements for stack guard pages. If you encounter this limit, you may disable stack guard pages by setting the environment variable AIXTHREAD_GUARDPAGES to 0.

    Page sizes for very high performance environments

    AIX 5.3 also supports large pages (16 MB) and "huge" pages (16 GB). However, these must be configured by the system administrator, and the system must then be rebooted. Users must be specifically authorized to use large pages. To make a special request for use of large pages, contact the CISL Consulting Office by any of the methods listed under CISL Customer Support.

    Further details are discussed in the IBM Whitepaper, "Guide to Multiple Page Size Support on AIX 5L Version 5.3".

    Processor binding is mandatory

    We highly recommend using processor binding for all parallel jobs. If you have been using the bindproc.x script in /contrib on blueice, this has been replaced with an IBM-provided launch script on bluefire. To use it, set (Korn or Bourne shell syntax):

    You may provide a comma (,) separated list of cpu-ids.

    For hybrid programs, use:

    along with your OMP_NUM_THREADS environment variable setting.

    Important: All parallel jobs should begin using one of the launch scripts mentioned above with their mpirun.lsf command.

    Examples

    We are working to provide new examples; see directory /usr/local/examples on the system.

    Getting help

    Contact CISL Customer Support, call 303-497-1278, or send email to consult1@ucar.edu.

    Appendix A. Comparison of bluefire and blueice chips

    Clock cycle
        Bluefire/Power6: 4.7 GHz
        Blueice/Power5+: 1.9 GHz

    Memory/processor
        Bluefire/Power6: 2-4 GB
        Blueice/Power5+: 2-4 GB

    L1 cache
        Bluefire/Power6: L1 cache is 128 KB (64 KB data + 64 KB instruction) per processor.
        Blueice/Power5+: L1 cache is 96 KB (32 KB data + 64 KB instruction) per processor.

    L2 cache
        Bluefire/Power6: L2 cache is 4 MB per processor on-chip.
        Blueice/Power5+: L2 cache is 2 MB per processor on-chip.

    L3 cache
        Bluefire/Power6: The off-chip L3 cache is 32 MB per two-processor chip, and is shared by the two processors on the chip. L3 cache memory is connected to the chip via an 80-GB-per-second bus.
        Blueice/Power5+: 36 MB per processor pair, shared by all processors

    Switch Latency
        Bluefire/Power6: Infiniband 1.3 µs (peak)
        Blueice/Power5+: HPS 5.0 µs (peak)

    Switch Bandwidth
        Bluefire/Power6: 20 GBps each direction
        Blueice/Power5+: 1.7 GBps each direction

    Multiple Functional Units
        Bluefire/Power6: Main thing is faster clock; 2 floating point units; 3 fixed point units; two load/store units
        Blueice/Power5+: 2 floating point units; 3 fixed point units; two load/store units

    Simultaneous Multi-Threading (SMT)
        Bluefire/Power6: Yes - SMT appears to the OS as multiple CPUs. Threaded applications may take advantage of SMT. To use on bluefire, use double the number of tasks you used on blueice
        Blueice/Power5+: Same.