SCDzine
NEWS

Time limits enforced on sioux batch jobs

How to protect your jobs from timeouts . . .


by Tom Parker


SCD has implemented a new procedure on the HP Exemplar SPP2000 (sioux) to enforce the wall-clock time limit for batch jobs. This procedure went into effect on 21 April 1998. It was needed because the current NQS batch system on sioux does not enforce time limits.


How it works

Here's how the new procedure works:

When your batch job has 10 minutes left of the wall-clock time allowed for its batch queue, the system checks whether any other jobs are waiting to run in the same queue:

  • If other jobs are waiting in the same queue, the system sends your batch job a SIGPWR signal, meaning that your job will be cancelled in 10 minutes.

  • If no other jobs are waiting in the same queue, your job continues to run. However, if a new job arrives in the same queue, the system sends the 10-minute SIGPWR signal at that time.
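The two cases above reduce to one rule. The SIGPWR signal arrives at whichever comes later: the point where 10 minutes remain, or the moment another job is waiting. Cancellation follows 10 minutes after that. The Fortran function below is a minimal sketch of that arithmetic, not SCD's implementation. The name KCANCL and its arguments are hypothetical, and times are assumed to be whole minutes counted from the start of your job.

```fortran
      integer function kcancl(limit, iwait)
      ! Hypothetical sketch of the sioux time-limit rule; KCANCL is
      ! not part of any SCD software.
      !   limit  wall-clock limit of the job's queue, in minutes
      !   iwait  minute (counted from job start) at which another
      !          job is first waiting in the same queue
      ! Returns the minute at which the job is cancelled.
      implicit none
      integer limit, iwait, isig
      ! The check happens with 10 minutes left; if no job is waiting
      ! then, SIGPWR is deferred until one arrives.
      isig = max(limit - 10, iwait)
      ! Cancellation follows 10 minutes after the signal.
      kcancl = isig + 10
      end
```

Consider a queue with a 120-minute limit. KCANCL(120, 30) returns 120: another job is already waiting when the check happens at minute 110, so the job is cancelled on schedule. If the queue stays empty until minute 150, KCANCL(120, 150) returns 160. A job with no competition can therefore run past its nominal limit.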

If you have a long-running job, you may want to modify it to catch the SIGPWR signal and shut down gracefully (e.g., by saving a history file) before the system cancels it. Otherwise, the cancellation could result in lost work.


How to prepare your Fortran jobs

Here's how you can prepare your Fortran jobs to catch the SIGPWR signal and thus protect your jobs from timeouts:

  1. Set up to catch the signal. Early in your program, CALL SIGNAL with two arguments: the signal number you are expecting (in this case 19, which is SIGPWR) and the name of your shutdown routine. The shutdown routine must be declared EXTERNAL so that its name can be passed as an argument. If your job later receives a SIGPWR signal, it automatically calls your shutdown routine.

    Here is sample code:

            program main
            external shutdown
            call signal(19,shutdown,-1)
      C
      C Your main program goes here
      C
            end
    

  2. Write your "shutdown" routine. Here is a template:
            subroutine shutdown
      C
      C Replace the following print statement with your
      C shutdown code (e.g., save a history file).
      C
            print *,'Caught SIGPWR signal - arrived in ',
           x        'shutdown routine.'
      C
      C Use STOP (not RETURN) here! RETURN would resume the
      C main program, which the system cancels 10 minutes
      C after the signal.
      C
            stop
            end
    

  3. Compile with the +U77 option. You must specify +U77 when you compile so that the correct version of the SIGNAL subroutine is used. For example:

       f77 +U77 myprog.f
    or
       f90 +U77 myprog.f
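The shutdown routine in the template takes no arguments, so your program's data cannot be passed to it directly. One common approach is to keep the data you need to save in a COMMON block that your main program and a save routine both declare. The sketch below follows that approach under our own assumptions. The COMMON block /STATE/, its variables NSTEP and X, the helper routine SAVEST, and the file name history.dat are all hypothetical; substitute your own variables and file format.

```fortran
      subroutine savest
      ! Hypothetical helper: writes the program state held in the
      ! COMMON block /state/ to the file history.dat.
      implicit none
      integer nstep
      double precision x(3)
      common /state/ nstep, x
      integer i
      open(unit=20, file='history.dat', status='unknown')
      ! Rewind so an older history file is overwritten.
      rewind(20)
      write(20,*) nstep
      write(20,*) (x(i), i = 1, 3)
      close(20)
      end

      subroutine shutdown
      ! Called via CALL SIGNAL when SIGPWR arrives (see step 1).
      implicit none
      call savest
      print *, 'Caught SIGPWR signal - history.dat saved.'
      ! Use STOP (not RETURN) so the job ends before it is cancelled.
      stop
      end
```

Your main program would declare the same COMMON /STATE/ block, keep NSTEP and X up to date as it runs, and register SHUTDOWN with CALL SIGNAL as in step 1. A restarted job could then read history.dat back with matching list-directed READ statements.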


For more information

For more information on sioux, please see "Getting started guide: sioux". You can also e-mail the SCD Consulting Office, or call them at (303) 497-1278.
