Getting started on bluevista
last update: 01/31/2008

Overview of bluevista

Bluevista was acquired in January 2006. Experiments with larger codes indicate that bluevista runs many codes twice as fast as the older supercomputer bluesky. This is largely due to bluevista's faster clock speed and switch speed and other enhancements of the POWER5 chip over the POWER4 chip in bluesky.

System architecture

Bluevista is an IBM clustered Symmetric MultiProcessing (SMP) system based on the POWER5 processor. All of bluevista's nodes (interactive and batch) are 8-way nodes.

Performance comparisons with other NCAR systems

For a comparison of the processor characteristics in use on various distributed shared memory systems at NCAR, see Comparison of CPUs in DSM systems at NCAR/CISL. Benchmark data on the relative performance of bluevista and blueice are available to authorized users (UCAS password required).

Simultaneous Multi-Threading (SMT)

Simultaneous Multi-Threading is enabled by default under bluevista's operating system, AIX version 5.3. SMT is a lightweight hardware switch mechanism that makes use of idle time on processors to complete jobs more quickly. In an SMT-enabled node, there are two logical processors for each physical processor.

SMT is beneficial where the number of tasks or threads exceeds the number of physical processors available. Further, SMT is most beneficial to codes that are less optimized for memory access or that spend more time waiting for communication or I/O.

See the section Using Simultaneous Multi-Threading (SMT) for information and examples.

Software environment

Bluevista runs the AIX operating system and builds 64-bit objects by default, instead of 32-bit objects as on bluesky. Set the environment variable OBJECT_MODE to 32 if you want to build 32-bit objects. The software stack includes IBM Cluster Systems Management (CSM); the General Parallel File System (GPFS); the Parallel Operating Environment (POE); the mathematical libraries PESSL, ESSL, MASS, and MASSV; the XL compiler suite for C, C++, and Fortran; and Perl. Note that the batch system is the Load Sharing Facility (LSF) rather than the IBM LoadLeveler product.
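
As a minimal sketch of the setting described above, the following Korn shell lines select 32-bit mode for the current session. Only the variable name and the value 32 come from this page; the echo line is simply a check added for illustration.

```shell
# Build 32-bit objects in this session (Korn or Bourne shell syntax).
# C shell users would instead type: setenv OBJECT_MODE 32
OBJECT_MODE=32
export OBJECT_MODE

# Confirm the setting before compiling.
echo "OBJECT_MODE is $OBJECT_MODE"
```

The setting lasts only for the current session unless you place it in a dotfile.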

To see specific version numbers of these products on bluevista, you can run the command /usr/local/bin/pmrinfo.

System security and access method

System security relies on One-Time Passwords (OTP), implemented with CRYPTOCards, together with Secure Shell (SSH). Please see our password guidelines in this context.

File system

The native file system under AIX is the General Parallel File System (GPFS). Your home directory and /ptmp scratch directory (see the "Disk space" section below) reside in GPFS.

Disk space

Home directory: Each user is assigned 7 GB of disk space in their home directory, /homebv/logon_id. The files in this directory are backed up.

/tmp directory: Please do not write files to /tmp. Instead, use the /ptmp directory for scratch purposes, as discussed immediately below. The operating system itself requires /tmp space, which is very limited in size, and filling it causes system problems.

/ptmp scratch directory: Each user is allocated up to 310 GB of scratch space in their /ptmp directory, /ptmp/logon_id. Users are encouraged to use this scratch space, but we emphasize that this filesystem is scrubbed when the overall /ptmp space is 85% used. An automatic scrubber deletes the least recently accessed files until the filesystem is below 85% full. Because of the enormous size of /ptmp, your /ptmp files are not backed up. This means that once they are scrubbed, they are gone forever unless you have copied them to the NCAR Mass Storage System, your home directory, or some other archival storage. We encourage your vigilance in backing up critical files.

Checking home and /ptmp usage: You may check your home and /ptmp quotas and usage by executing the command /usr/local/bin/spquota. You may check the overall usage of /ptmp by executing the command /usr/local/bin/df -h /ptmp. This last command shows the percentage of overall /ptmp usage, which helps you anticipate the scrubbing of /ptmp files. As mentioned above, an 85% full level triggers the scrubber.
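
The comparison against the 85% threshold can be sketched in a few lines of shell. This is an illustration only: it assumes POSIX-format "df -P" output (data on the second line, percentage in the fifth field), and the function names, the device name, and the sample figures are hypothetical.

```shell
# Threshold at which the /ptmp scrubber is triggered (from the text above).
SCRUB_THRESHOLD=85

# Extract the "Capacity" percentage from POSIX-format df output on stdin.
# Assumes the `df -P` layout: line 2 holds the data, field 5 looks like "72%".
df_capacity() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report whether usage has reached the scrub threshold.
# $1 = percentage used, as an integer.
scrub_status() {
    if [ "$1" -ge "$SCRUB_THRESHOLD" ]; then
        echo "ptmp is ${1}% full: scrubber active, copy critical files elsewhere"
    else
        echo "ptmp is ${1}% full: below the ${SCRUB_THRESHOLD}% scrub threshold"
    fi
}

# Demonstration with canned df output (made-up numbers), so it runs anywhere.
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/ptmp 1000000 870000 130000 87% /ptmp'
pct=$(printf '%s\n' "$sample" | df_capacity)
scrub_status "$pct"
```

On bluevista the same functions could be fed live data with something like scrub_status "$(df -P /ptmp | df_capacity)", assuming the installed df accepts the -P option.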

Divisional file systems: CISL provides file space for UCAR users that can be used to supplement user home directories and /ptmp space. Space on these divisional file systems is provided on the basis of requests made to designated divisional representatives, who implement their own policies regarding space and quota allocations. These divisional file systems are neither backed up by CISL nor scrubbed. To acquire space on the file system provided for your division, contact the proper divisional representative (the list of representatives is password-protected).

/fis file system: CISL provides file space for some users and projects under /fis, a non-GPFS filesystem mounted on bluevista. You may see which divisions and projects have space in /fis by executing the Unix "df" command to list disk free space. Please speak to your project manager for clarification on /fis usage relative to your projects. The /fis filesystem is regarded as high-availability, high-reliability disk space, small-sized relative to GPFS, and expensive. These files are backed up.

Obtaining an account and your login

To request a bluevista account and login, please contact us via any of the methods (phone, email, ticket, or in person) described here: CISL Customer Support.

User environment - shells, paths, dotfiles

Your default shell: New accounts are set up with the Korn shell by default. If you have changed your shell since then, or do not remember which one you use, you can determine your default shell by executing the command "grep $LOGIN /etc/passwd" and examining the final field of the output, e.g. :/bin/csh or :/bin/ksh. Whether you choose the Korn shell or another, please follow the advice in the following paragraphs.
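
The "final field" of a passwd entry can be pulled out automatically. The sketch below is illustrative: the function name and the sample entry (user jdoe) are made up.

```shell
# Print the login shell, i.e. the last colon-separated field,
# of the first /etc/passwd entry read from stdin.
login_shell() {
    awk -F: '{ print $NF; exit }'
}

# Demonstration with a hypothetical entry.
entry='jdoe:!:1234:100:J. Doe:/homebv/jdoe:/bin/ksh'
printf '%s\n' "$entry" | login_shell
```

Against the real file this might be used as grep "^$LOGIN:" /etc/passwd | login_shell. Anchoring the pattern with ^ and : is a suggested refinement that avoids matching other accounts whose entries merely contain your logon name.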

Permanently changing your default shell: After you log on to bluevista, you may change your shell by logging on to the bluevista management system. To do so, execute the command "ssh blueview.ucar.edu" and log on to blueview using your bluevista logon name and password. Once logged on, select "s" on the menu and follow the prompts to choose the shell.

After changing your shell: Shell changes take about an hour to be propagated to bluevista. If you have changed your shell, you may need to make alterations to your environment appropriate for your new shell.

Copying dotfiles from other systems: If you choose to copy dotfiles from other systems, please remove script segments specific to that system, or replace them with segments appropriate to bluevista. This is important to do, even if you have copied the dotfiles from bluevista to bluevista.

Troubleshooting

Remove LSF version-specific paths from your dotfiles: If you are using old dotfiles brought over from bluesky, please check the settings of the variables "path", "PATH", and "MANPATH" for elements such as /usr/local/lsf/6.1/aix5-64/bin, or similar elements containing "lsf" and a version number (e.g. 6.1), and remove them. Because the system automatically adds the LSF commands to your path, removing these elements ensures that you get the latest LSF release, now and in the future.
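
One way to locate such elements is to search your dotfiles for "lsf/" followed by a digit. The following is a sketch; the function name is hypothetical and the sample lines are invented to resemble a C shell dotfile.

```shell
# List lines that hard-code a versioned LSF directory such as
# /usr/local/lsf/6.1/aix5-64/bin. Reads the named files, or stdin if none
# are given, and prefixes each match with its line number.
find_lsf_paths() {
    grep -n 'lsf/[0-9]' "$@"
}

# Demonstration on two sample dotfile lines; only the second one matches.
find_lsf_paths <<'EOF'
setenv MANPATH /usr/share/man
set path = ( /usr/local/lsf/6.1/aix5-64/bin /usr/bin $path )
EOF
```

To scan real files, pass them as arguments, for example find_lsf_paths ~/.cshrc ~/.login ~/.profile (adjust the list to the dotfiles you actually have). The function only reports matches; editing the files remains a manual step.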

"Word too long" fatal diagnostic: This problem may show up in either interactive or batch jobs that resubmit themselves. Its cause is often self-reference in a variable definition. For example, in .cshrc "setenv PATH ${PATH}:/usr/local/bin:~/bin" expands the path, which grows with each script resubmission until the maximum length of 1024 bytes is exceeded and the UNIX "Word too long" error occurs. To solve this, C shell users may add the line "source /contrib/shutils/trimpath.csh" to their .cshrc file. Korn shell users who get this diagnostic should remove the .cshrc file from their home directory, as it also may cause this problem. If you take this advice, before removing the .cshrc file, please rewrite any needed segments of it in Korn-shell script, and place them in your .profile.

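For Korn shell users moving path settings into .profile, the self-reference problem described above can be avoided by appending a directory only when it is not already present. This is a sketch of that general technique, not the contents of trimpath.csh; the function name is hypothetical.

```shell
# Append a directory to PATH only if it is not already listed, so PATH
# keeps the same length however many times this file is read.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present: do nothing
        *) PATH="$PATH:$1" ;;
    esac
}

path_append /usr/local/bin
path_append "$HOME/bin"
export PATH
```

Because a repeated call with the same directory leaves PATH unchanged, a job script that resubmits itself no longer lengthens the path on each pass.
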
Job queues and charging

Please see Queues and charging for resource usage on bluevista for details about job queues and charging.

Please do not run production jobs from the command line. For example, do not execute wrf.exe from the interactive command line; instead, use the bsub command to submit a job script to an appropriate batch queue. Running production jobs from the command line slows interactive response time, and fast response time is needed for editing files and doing small tasks appropriate to the command line. Production jobs running from the command line will be killed.
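
As a rough sketch of what submitting through bsub can look like, the lines below write a small job script and show the submit command. The bsub options used (-J, -q, -n, -W, -o, -e) are standard LSF options, but every value is a placeholder: the job name, the queue name "regular", the task count, the time limit, and the file names are assumptions, and the launch line may need the site's parallel launcher.

```shell
# Write a small LSF job script. Lines beginning with "#BSUB" are read by
# bsub as options; all values below are placeholders to adapt.
cat > wrf_job.lsf <<'EOF'
#!/bin/ksh
# Job name (hypothetical).
#BSUB -J wrf_run
# Queue name (assumed; choose one from "Queues and charging").
#BSUB -q regular
# Number of tasks; 8 fills one 8-way node.
#BSUB -n 8
# Wall-clock limit, hours:minutes.
#BSUB -W 1:00
# Output and error files; %J expands to the LSF job ID.
#BSUB -o wrf_run.%J.out
#BSUB -e wrf_run.%J.err

# Launch line (an assumption; parallel codes may need the site's launcher).
./wrf.exe
EOF

# Submit the script to the batch system instead of running wrf.exe directly:
#   bsub < wrf_job.lsf
```

Redirecting the script into bsub, rather than passing it as an argument, is what makes LSF read the embedded #BSUB lines.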

Allocation thresholds for projects influence job scheduling

The job scheduler follows a procedure for handling NCAR divisions and CSL proposal groups that exceed their allocations. The process by which queued jobs are selected for execution affects all users of the IBM SP-cluster systems, both community computing users and CSL users.

For CSL proposal groups and NCAR divisions with a monthly allocation:

Note: You can employ two strategies to minimize charges. 1) After your project has been flagged, you can submit jobs to the standby queue to accrue minimum charges. 2) Before your project exceeds its 30-day or 90-day allocation threshold, select lower-priority work to be run in the standby queue and save GAUs for higher-priority work.

Note: Standby jobs generate MSS charges in the normal fashion.

University projects are given lifetime allocations. Jobs for these projects are not accepted in any queue after the allocation is exceeded. University projects may request additional resources by contacting us at CISL Customer Support.

In addition to division limits, many NCAR divisional projects have additional limits that affect their ability to run. NCAR division administrators can request that a project belonging to the division they represent be given an allocation for a specific time period such as by the month, fiscal year, or lifetime of the project. When the allocation limit is reached, jobs will not be accepted by any charged compute server even in a lower-priority queue.

A regular joint project must receive approval from the NCAR director and the appropriate division director. (Joint projects involve both NCAR and University researchers.) Each NCAR division is assigned a fiscal year allocation for all regular joint projects. When the divisional limit is exceeded, all jobs submitted by joint projects involving this division are placed on hold1 status.

If you have questions about the type of project you have or the applicable limits, please contact us at CISL Customer Support.

Troubleshooting program failures

If you need help troubleshooting program failures on our supercomputers, please contact the CISL Consulting Services Group via any of the methods provided in CISL Customer Support. Be prepared to provide the following information:

The CISL ticket system provided in the CISL Customer Support link above is our preferred method of handling troubleshooting and other problems. We prefer it as a means of communicating with customers, tracking problems, and calling in other CISL support when needed.


If you have questions about this document, please contact us via any of the methods (phone, email, ticket, or in person) described here: CISL Customer Support.

© Copyright 2007. University Corporation for Atmospheric Research (UCAR). All Rights Reserved.

Address of this page: http://www.cisl.ucar.edu/docs/bluevista/index.html