sioux
by Tom Parker
SCD has implemented a new procedure on the HP Exemplar SPP2000 (sioux) to enforce the wall-clock time limit for batch jobs. This procedure went into effect on 21 April 1998. It was needed because the NQS batch system currently running on sioux does not enforce time limits itself.
How it works
Here's how the new procedure works:
When your batch job has 10 minutes of its queue's wall-clock time limit remaining, the system will check whether any other jobs are waiting to run in the same queue:
- If other jobs are waiting in the same queue, the system will send your batch job a SIGPWR signal, meaning that your job will be cancelled in 10 minutes.
- If no other jobs are waiting in the same queue, your job will continue to run. However, if a new job later arrives in the same queue, the 10-minute signal described above will be sent at that time.
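To make the rule concrete, here is a small Fortran 77 sketch of the decision as described above. It is only an illustration of the stated policy, not SCD's actual implementation; the function name SENDPW and its arguments MINLFT and NWAIT are hypothetical.

```fortran
      logical function sendpw(minlft, nwait)
C     Illustration of the rule described above; not SCD's code.
C     MINLFT: minutes of wall-clock time the job has left (zero or
C             negative once the job has run past its limit)
C     NWAIT:  number of other jobs waiting in the same queue
C     Result: .TRUE. if the job should be sent SIGPWR now
      integer minlft, nwait
      sendpw = minlft .le. 10 .and. nwait .gt. 0
      return
      end
```

If the system re-applies this test as the queue changes, a job whose 10-minute mark passes while the queue is empty keeps running, and the test first becomes true (starting the 10-minute warning) when another job arrives.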
If you have a long-running job, you may want to modify it to detect the SIGPWR signal and shut down gracefully (e.g., by saving a history file) before the system cancels it. Otherwise, the cancellation could result in lost work.
How to prepare your Fortran jobs
Here's how you can prepare your Fortran jobs to catch the SIGPWR signal and save their work before they time out:
- Set up to catch the signal. Early in your program, CALL SIGNAL to designate the signal number you are expecting (in this case 19, the number of SIGPWR on sioux) and the name of your shutdown routine. Declare the routine EXTERNAL so that its name can be passed as an argument. If your job later receives a SIGPWR signal, your shutdown routine will then be called automatically.
Here is sample code:
      program main
      external shutdown
      call signal(19,shutdown,-1)
C
C     Your main program goes here
C
      end
- Write your "shutdown" routine. Here is a template:
      subroutine shutdown
C
C     Replace the following print statement with your
C     shutdown code (e.g., save a history file).
C
      print *, 'Caught SIGPWR signal - ',
     x         'arrived in shutdown routine.'
C
C     Use STOP (not RETURN) here!
C
      stop
      end
- Compile with the +U77 option. When you compile, you need to specify the +U77 option so that the correct version of the SIGNAL subroutine is used. For example:
f77 +U77 myprog.f
or
f90 +U77 myprog.f
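Because a signal handler such as the shutdown routine is called with no arguments, any data it needs to save must be reachable some other way. One common Fortran 77 approach, sketched below under stated assumptions, is to keep the job's state in a COMMON block shared by the main program and the shutdown routine. The COMMON block /STATE/, the variables NSTEP, T, and U, the helper SAVHIS, and the file name history.dat are all hypothetical placeholders for your own model's data. The main program would declare the same COMMON block, update it as it runs, and still make the CALL SIGNAL(19,SHUTDOWN,-1) call shown in the first step.

```fortran
      subroutine shutdown
C     Called when SIGPWR arrives (installed via CALL SIGNAL).
      call savhis
      print *, 'Caught SIGPWR signal - history file written.'
C     Use STOP (not RETURN) so the job ends before it is cancelled.
      stop
      end

      subroutine savhis
C     Hypothetical model state shared with the main program.
      integer nstep
      double precision t, u(10)
      common /state/ nstep, t, u
      integer i
C     Write the state to a hypothetical history (restart) file.
      open (unit=20, file='history.dat', form='unformatted',
     x      status='unknown')
      write (20) nstep, t, (u(i), i = 1, 10)
      close (20)
      return
      end
```

One way to check the handler before relying on it might be to run the program interactively, send it signal 19 from another window (for example, with kill -19 followed by the process ID), and then confirm that history.dat was written.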
For more information
For more information on sioux, please see "Getting started guide: sioux". You can also e-mail the SCD Consulting Office or call them at (303) 497-1278.