NCAR
Last update: 08/31/2006


Programming requirements for compiling, building, and running jobs

Interactive (command-line) job submittal

Serial jobs and parallel jobs with OpenMP threads

Lightning has two login nodes for interactive use. You may run interactive serial jobs or interactive OpenMP parallel jobs on these nodes just as you would in any other UNIX environment. For example, to execute a serial job, enter:

    ./a.out

and to execute an OpenMP parallel job, enter:

    env OMP_NUM_THREADS=2 ./a.out

Distributed-memory parallel jobs using MPI (requires LSF batch system)

To run interactive distributed-memory parallel jobs, you must use the LSF batch scheduler. Submit these jobs to the debug queue with the bsub command; they can have at most 16 tasks. The "hello, world" example programs written in Fortran 90, C, and C++ demonstrate how to run interactive parallel jobs.

Please note that if your job runs longer than 30 minutes, you should run it in batch mode.
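
As a rough sketch of the interactive submission described above, the shell fragment below assembles a bsub command line for the debug queue and rejects task counts above the 16-task limit. The helper name debug_cmd, the task count, and the executable ./a.out are illustrative assumptions. The options -I (interactive), -q (queue), and -n (number of tasks) are standard bsub options, but lightning may require additional options, so treat the "hello, world" examples as the authoritative form.

```shell
#!/bin/sh
# Sketch only: build an interactive MPI submission for the debug queue.
MAX_DEBUG_TASKS=16   # limit stated in this document

# debug_cmd NTASKS EXECUTABLE  (hypothetical helper, not an LSF command)
# Prints the bsub command line, or fails if NTASKS exceeds the limit.
debug_cmd() {
    if [ "$1" -gt "$MAX_DEBUG_TASKS" ]; then
        echo "debug queue allows at most $MAX_DEBUG_TASKS tasks" >&2
        return 1
    fi
    echo "bsub -I -q debug -n $1 mpirun.lsf $2"
}

cmd=$(debug_cmd 4 ./a.out)
if command -v bsub >/dev/null 2>&1; then
    $cmd                      # on a system with LSF: run the job interactively
else
    echo "would run: $cmd"    # elsewhere: just show the command
fi
```

Keeping the limit check in one place means a request for too many tasks fails before anything is submitted.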

Batch job submittal with LSF

The Load Sharing Facility (LSF) from Platform Computing Inc. provides batch job capabilities on lightning. LSF manuals are available for browsing at the Platform Knowledge Centre.

An LSF batch job is defined by a script file containing LSF directives. Submit the job to the queues with the bsub command, which reads the script from standard input:

    bsub < lsf_job_script_file

You can obtain a list of LSF batch queues using the bqueues command. See the bqueues man page for options.

You can list all of your queued and running jobs using the bjobs command. See the bjobs man page for options.

You can get a quick summary of all jobs running on the system using the lsfq command.

The lninfo command provides a quick summary of queue time limits and hardware information.

An introduction to LSF commands and scripts appears in the LSF for Users seminar materials (PowerPoint slides).

MPI jobs: LSF scripts should invoke mpirun.lsf (not mpirun). This is a wrapper for mpirun that enables the job to run properly under LSF. If you use mpirun directly, your job will not run correctly, and may leave orphan processes on the system.
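
To make the pieces above concrete, here is a minimal sketch of an MPI batch script and its submission. The file name mpi_job.lsf, the job name, the task count, the time limit, and the executable ./a.out are all placeholders chosen for illustration. The #BSUB options shown (-J, -q, -n, -W, -o, -e) are standard bsub options; lightning may require others that are not shown here.

```shell
#!/bin/sh
# Write a hypothetical LSF job script; every value in it is a placeholder.
cat > mpi_job.lsf <<'EOF'
#!/bin/sh
# Job name, queue, and number of MPI tasks.
#BSUB -J mpi_example
#BSUB -q debug
#BSUB -n 4
# Wallclock limit in hours:minutes.
#BSUB -W 0:10
# Output and error files; %J expands to the LSF job ID.
#BSUB -o mpi_example.%J.out
#BSUB -e mpi_example.%J.err

# Launch through the LSF wrapper, not plain mpirun.
mpirun.lsf ./a.out
EOF

# Submit and list the job only where LSF is installed.
if command -v bsub >/dev/null 2>&1; then
    bsub < mpi_job.lsf
    bjobs
else
    echo "bsub not found; job script written to mpi_job.lsf"
fi
```

Because bsub reads the script from standard input, the #BSUB lines act as submission options while the shell itself treats them as comments.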

Parallel programming terminology

IBM Linux cluster systems are a collection of processors organized into nodes. Each node contains two processors. A processor (called a "CPU" in this document) is the logic circuitry that responds to instructions for controlling the computer. A node is a collection of CPUs that share access to memory (memory space); in general, a node is an entity that accesses a network or an addressable point on a network. IBM Linux cluster systems contain an internal network with over 100 addressable nodes. This internal network is also called "the switch."

IBM Linux cluster systems can run user programs as a serial process, as parallel processes, or as both.

A process is an instance of a program running in a computer. The system kernel schedules the execution of all processes. A process may, for example, store information in memory, perform operations on data using a CPU, store data on disk, or communicate with other systems.

A serial process is a program that executes instructions on a single CPU.

A thread is a piece of a process. It runs as a separate entity under the control of that single process, is tracked by that process, and returns its computational result to that process. Threads help your job run faster because several independent pieces of the same process run at once. Threads share the same memory space, so you must make sure that threads in the same process do not interfere with each other.

Threads run on the two CPUs within a node and are created using OpenMP directives or POSIX threads. Most NCAR programmers who use threads use OpenMP directives. Note that "OpenMP" is sometimes called "OMP." More information about the OpenMP standard appears in this excellent tutorial. More information about the POSIX standard appears in this tutorial and at the Programming POSIX Threads website.

Note: Threads are a form of parallelism, and people may use the word "parallel" when referring to processes that use threads. This can cause confusion. CISL documentation always refers to threads as "threads" to avoid confusion with "parallel processes" as defined in the next paragraph.

Parallel processes are multiple coordinated independent programs that execute simultaneously on multiple CPUs to achieve a common goal. Parallel programming takes three forms:

  • Threads: using parallel threads on a node.
  • Message passing: using message passing between processes. Message passing is a form of interprocess communication in which processes send discrete messages to one another to exchange data.
  • Hybrid: using both threads and message passing in the same program.

IBM Linux cluster systems are clusters of symmetric multiprocessor (SMP) systems, a computer architecture that collects multiple CPUs into nodes. Multiple simultaneous processes can run within a node, on multiple nodes, or both. The CPUs on a node share that node's memory and I/O bus (data path). Each node runs its own copy of the Linux operating system. Any idle CPU can be assigned to any task, and a job can use additional CPUs and nodes to improve performance and handle increased loads.

Programming strategies for using lightning

There are four basic programming strategies for computing on lightning. These four strategies allow you to match your program's requirements to the capabilities of lightning's computing architecture. All four strategies have the same goal: to obtain accurate results for computational problems in the minimum amount of wallclock time.

The four programming strategies, in order of increasing demands on the system and the programmer, are:

  1. One serial process
  2. One process that spawns multiple threads
  3. Multiple parallel processes that are single-threaded
        • One code acting on multiple data structures (Single Program Multiple Data -- SPMD)
        • Multiple codes acting on multiple data structures (Multiple Programs Multiple Data -- MPMD)
  4. Hybrid: Multiple parallel processes (SPMD or MPMD) that use multiple threads

Programming one serial process

A serial process runs one sequence of instructions on a single CPU. This strategy works best for programs that have low computational requirements and run to completion quickly. Lightning runs a serial process on one CPU of a two-CPU node.

This serial job example is presented as an LSF script that runs the example in three programming languages: Fortran, C, and C++.

Programming one process that spawns multiple OpenMP threads

A single process that spawns threads runs one sequence of instructions, but some of these instructions can be performed simultaneously on multiple CPUs. The process divides into parts (threads) that can execute on different CPUs on the node, but the process controls all the threads. Since the threads all share the same memory space, they must be implemented to cooperate with each other and avoid memory reference interference.

A single process with threads is used when the computational requirements are modest (the requirements can be met by the CPUs on a single node), and when performance can be improved by running parts of the process on multiple CPUs. This approach is less complicated than programming parallel processes with message passing (see below).
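
A threaded batch job might look roughly like the sketch below, which writes a job script that keeps both job slots on one node and starts two OpenMP threads. The file name omp_job.lsf, the job name, and the executable ./a.out are placeholders, and requesting two slots with -n 2 plus span[hosts=1] is an assumption about how to reserve a whole two-CPU node; the threaded job example for lightning may do this differently.

```shell
#!/bin/sh
# Write a hypothetical LSF script for a single process with OpenMP threads.
cat > omp_job.lsf <<'EOF'
#!/bin/sh
# Hypothetical job name and output file; %J expands to the LSF job ID.
#BSUB -J omp_example
#BSUB -o omp_example.%J.out
# Request two job slots and keep them on one host, so that both
# threads can share that node's memory.
#BSUB -n 2
#BSUB -R "span[hosts=1]"

# One thread per CPU on a two-CPU node.
OMP_NUM_THREADS=2
export OMP_NUM_THREADS
./a.out
EOF

# Submit only where LSF is installed.
if command -v bsub >/dev/null 2>&1; then
    bsub < omp_job.lsf
else
    echo "bsub not found; job script written to omp_job.lsf"
fi
```

The executable is started directly rather than through mpirun.lsf because only one process runs; the threads are created inside it.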

Threaded job example.

Programming multiple parallel processes

Parallel processes are coordinated to work on different parts of a problem and contribute to a common result. Parallel processes can run on a single node, and they can run on multiple nodes. The programmer must divide the problem into discrete processes and ensure that they work together effectively.

To improve the performance of your application, you can program multiple parallel processes. This approach can make better use of the system resources on IBM Linux cluster systems, but it is more complicated than having the compiler divide your executable into threads. If a single node cannot meet the computational requirements of your problem in a reasonable amount of wallclock time, you must program multiple parallel processes.

Normally, multiple parallel processes need to share computational results with each other. They do this by message passing: the processes send discrete messages to one another to exchange data. MPI, the Message Passing Interface, is the message-passing standard, and we recommend that you use MPI on lightning. Here is a good selection of MPI tutorials.

Note: There are many pitfalls not described in these tutorials, but it is important for you to understand the basics first. Don't focus on these pitfalls now; many of them may not even apply to your codes.

The parallel (MPI) examples in the next section include two types of parallel processing jobs: SPMD (Single Program Multiple Data structures) and MPMD (Multiple Programs Multiple Data structures). SPMD refers to a single code that operates on different data structures at the same time (in parallel). MPMD refers to multiple codes that operate simultaneously on different data structures. At runtime, SPMD cases specify one process and how many times to instantiate it. MPMD cases specify a command file list of processes to instantiate. Note how these are implemented in the following examples.

Examples: SPMD parallel job using MPI-standard message passing, and MPMD parallel job using MPI-standard message passing.

Programming multiple parallel processes that use multiple threads

When a program uses both message passing between nodes running parallel processes and threads within a node, it is called a hybrid program. Typically, a process using threads does not share a node with another process. Programming parallel processes with threads is a way to improve performance on your problem because threads can significantly improve the computational speed of some processes.

Programming hybrid parallel processes is an art. It requires some trial and error to find the optimal balance between using threads and parallel processes.
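
One way to organize that trial and error is to generate the job script from two variables, as in the sketch below. It places one MPI task on each node and gives each task two threads, matching the two-CPU nodes described earlier. The variable names, the file name hybrid_job.lsf, the job name, and ./a.out are placeholders, and whether OMP_NUM_THREADS reaches every task launched by mpirun.lsf depends on the MPI installation, so this is an assumption to verify on lightning.

```shell
#!/bin/sh
# Hypothetical tuning knobs for a hybrid (MPI + OpenMP) job.
NODES=4              # number of nodes, one MPI task on each
THREADS_PER_TASK=2   # one thread per CPU on a two-CPU node

# Unquoted here-document: $NODES and $THREADS_PER_TASK are filled in now.
cat > hybrid_job.lsf <<EOF
#!/bin/sh
#BSUB -J hybrid_example
#BSUB -o hybrid_example.%J.out
# One MPI task per node, so each task's threads have the node to themselves.
#BSUB -n $NODES
#BSUB -R "span[ptile=1]"

OMP_NUM_THREADS=$THREADS_PER_TASK
export OMP_NUM_THREADS
mpirun.lsf ./a.out
EOF

echo "CPUs used: $((NODES * THREADS_PER_TASK))"

# Submit only where LSF is installed.
if command -v bsub >/dev/null 2>&1; then
    bsub < hybrid_job.lsf
else
    echo "bsub not found; job script written to hybrid_job.lsf"
fi
```

Changing NODES or THREADS_PER_TASK and rerunning regenerates the script, which makes it easy to compare different balances of processes and threads.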

Hybrid parallel job (using threads and message passing) example.



If you have questions about this document, please contact CISL Customer Support. You can reach us by telephone 24 hours a day, seven days a week at 303-497-1278. You can also send email to consult1@ucar.edu or, during business hours, visit NCAR Mesa Lab Suite 39.

© Copyright 2004-2005. University Corporation for Atmospheric Research (UCAR). All Rights Reserved.

Address of this page: http://www.cisl.ucar.edu/docs/lightning/program.jsp