Programming requirements for compiling, building, and running jobs
Serial jobs and parallel jobs with OpenMP threads
Lightning has two login nodes for interactive use. You may run
interactive serial or interactive OpenMP parallel jobs on these nodes,
following the usual procedure for working in any other UNIX environment.
For example, to execute a serial job, enter:
./a.out
and to execute an OpenMP parallel job, enter:
env OMP_NUM_THREADS=2 ./a.out
Distributed-memory parallel jobs using MPI (requires LSF batch system)
To run interactive distributed-memory parallel jobs, you need to use
the LSF batch scheduler. Submit these jobs to the debug queue with the
bsub command; they can have at most 16 tasks. The "hello, world"
example programs written in f90, c, and c++ demonstrate how to run
interactive parallel jobs.
Please note that if your job runs longer than 30 minutes, you should
run it in batch mode.
Load Sharing Facility (LSF) from Platform Computing Inc. provides
batch job capabilities on lightning. LSF manuals are available for
browsing at the Platform Knowledge Centre.
An LSF job is a script or a file containing LSF directives.
You submit a batch job to the queues with the bsub command:
bsub < lsf_job_script_file
You can obtain a list of LSF batch queues using the bqueues
command. See the bqueues man page for options.
You can list all of your queued and running jobs using the
bjobs command. See the bjobs man page for options.
You can get a quick summary of all jobs running on the system
using the lsfq command.
The lninfo command provides a quick summary of queue time
limits and hardware information.
An introduction to LSF commands and scripts appears in the
LSF for Users seminar
materials (PowerPoint slides).
MPI jobs: LSF scripts should invoke mpirun.lsf (not mpirun).
This is a wrapper for mpirun that enables the job to run properly
under LSF. If you use mpirun directly, your job will not run correctly,
and may leave orphan processes on the system.
IBM Linux cluster systems are a collection of processors organized
into nodes. Each node contains two processors. A processor (called a
"CPU" in this document) is the logic circuitry that responds
to instructions for controlling the computer. A node is a
collection of CPUs that share access to memory (memory space);
in general, a node is an entity that accesses a network or an
addressable point on a network. IBM Linux cluster systems contain an
internal network with over 100 addressable nodes. This internal
network is also called "the switch."
IBM Linux cluster systems can run user programs as a serial
process, as parallel processes, or as a combination of both.
A process is an instance of a program running in a computer.
The system kernel schedules the execution of all processes and their
operations (for example, storing information in memory, performing
operations on data using a CPU, storing data on disk, and
communicating with other systems).
A serial process is a program that executes instructions
on a single CPU.
A thread is a piece of a process. It runs as a separate
entity under the control of that single process, is tracked by that
process, and returns its computational result to that process. Threads
help your job run faster because several independent pieces of the
same process run at once. Threads share the same memory space, so you
must make sure that threads in the same process do not interfere with
each other.
Threads are run on two CPUs within a node using OpenMP
directives or POSIX threads. OpenMP directives are used by most
NCAR programmers who use threads. Note that "OpenMP" is sometimes
called "OMP." More information about the OpenMP standard appears in
this excellent tutorial. More information about the POSIX standard
appears in this tutorial and at the Programming POSIX Threads website.
Note: Threads are a form of parallelism, and people
may use the word "parallel" when referring to processes that use
threads. This can cause confusion. CISL documentation
always refers to threads as "threads" to avoid confusion with
"parallel processes" as defined in the next paragraph.
Parallel processes are multiple coordinated independent
programs that execute simultaneously on multiple CPUs to achieve
a common goal.
Parallel programming has three aspects:
- Using parallel threads on a node.
- Using message passing between processes. Message passing
is a form of interprocess communication in which processes send
discrete messages to one another to exchange data.
- Using both threads and message passing. Parallel programs that
do this are called "hybrid."
IBM Linux cluster systems are clusters of Symmetric Multi Processor
(SMP) systems, a computer architecture that collects multiple CPUs
into nodes. Multiple simultaneous processes can be run within a node,
on multiple nodes, or both. The CPUs on a node share that node's
memory and I/O bus (data path). Each node runs its own copy of the
Linux operating system. Any idle CPU can be assigned to any task,
and additional CPUs and nodes can be utilized by a job to improve
performance and handle increased loads.
There are four basic programming strategies for computing on
lightning. These four strategies allow you to match your program's
requirements to the capabilities of lightning's computing
architecture. All four strategies have the same goal: to obtain
accurate results for computational problems in the minimum amount
of wallclock time.
The four programming strategies, in order of increasing demands on
the system and the programmer, are:
- One serial process
- One process that spawns multiple threads
- Multiple parallel processes that are single-threaded:
    - One code acting on multiple data structures
      (Single Program Multiple Data -- SPMD)
    - Multiple codes acting on multiple data structures
      (Multiple Programs Multiple Data -- MPMD)
- Hybrid: Multiple parallel processes (SPMD or MPMD) that use
multiple threads
A serial process runs one sequence of instructions on a single
CPU. A serial process works best when it has low computational
requirements and runs to completion quickly. Lightning runs a
serial process on one CPU of a two-CPU node.
This serial job example is presented as an LSF script that runs the
example in three programming languages: Fortran, C, and C++.
A single process that spawns threads runs one sequence of
instructions, but some of these instructions can be performed
simultaneously on multiple CPUs. The process divides into parts
(threads) that can execute on different CPUs on the node, but
the process controls all the threads. Since the threads all share
the same memory space, they must be implemented to cooperate with
each other and avoid memory reference interference.
A single process with threads is used when the computational
requirements are modest (the requirements can be met by the CPUs
on a single node), and when performance can be improved by running
parts of the process on multiple CPUs. This approach is less
complicated than programming parallel processes with message
passing (see below).
Threaded job example.
Parallel processes are coordinated to work on different parts of a
problem and contribute to a common result. Parallel processes can run
on a single node, and they can run on multiple nodes. The programmer
must divide the problem into discrete processes and ensure that they
work together effectively.
To improve the performance of your application, you can program
multiple parallel processes. This approach can make better use of the
system resources on IBM Linux cluster systems, but it is more complicated
than having the compiler divide your executable into threads. If the
computational requirements of your problem cannot be met by the
resources on a single node in a reasonable amount of wallclock time,
then programming multiple parallel processes is required.
Normally, multiple parallel processes need to share computational
results with each other. This is done by message passing, a form of
interprocess communication in which processes send discrete messages
to one another to exchange data.
MPI, the Message Passing Interface, is the message passing standard,
and we recommend that you use MPI on lightning. Here is a good
selection of MPI tutorials.
Note: There are many pitfalls not described in these
tutorials, but it is important for you to understand the basics
first. Don't focus on these pitfalls now; many of them may not
even apply to your codes.
The parallel (MPI) examples in the next section include two
types of parallel processing jobs: SPMD (Single Program Multiple
Data structures) and MPMD (Multiple Programs Multiple Data
structures). SPMD refers to a single code that operates on
different data structures at the same time (in parallel). MPMD
refers to multiple codes that operate simultaneously on different
data structures. At runtime, SPMD cases specify one process and
how many times to instantiate it. MPMD cases specify a command
file list of processes to instantiate. Note how these are
implemented in the following examples.
SPMD parallel job using MPI-standard message passing example or
MPMD parallel job using MPI-standard message passing example.
When a program uses both message passing between nodes running
parallel processes and threads within a node, it is called a hybrid
program. Typically, a process using threads does not share a node
with another process. Programming parallel processes with threads
is a way to improve performance on your problem because threads can
significantly improve the computational speed of some processes.
Programming hybrid parallel processes is an art. It requires
some trial and error to find the optimal balance between using
threads and parallel processes.
Hybrid parallel job (using threads and message passing) example.
If you have questions about this document, please contact
CISL Customer Support.
You can also reach us by telephone 24 hours a day, seven days a week at
303-497-1278.
Additional contact methods: consult1@ucar.edu, and during business
hours in NCAR Mesa Lab Suite 39.
© Copyright 2004-2005. University Corporation for Atmospheric
Research (UCAR). All Rights Reserved.
Address of this page:
http://www.cisl.ucar.edu/docs/lightning/program.jsp