Platform Computing Corp.

External Job Submission and Execution Controls

This document describes the external job submission and execution controls esub and eexec. These site-specific, user-written executables are used to validate, modify, and reject job submissions, to pass data to job execution environments, and to modify those environments.

Understanding External Executables

About esub and eexec

LSF provides the ability to validate, modify, or reject job submissions, modify execution environments, and pass data from the submission host directly to the execution host through the use of the esub and eexec executables. Both are site-specific and user written and must be located in LSF_SERVERDIR.

Validate, modify, or reject a job

To validate, modify, or reject a job, you need to write an esub. See Using esub.

Modifying execution environments

To modify the execution environment on the execution host, you need to write an eexec. See Working with eexec.

Passing data

To pass data directly to the execution host, you need to write both an esub and an eexec. See Using esub and eexec to pass data to execution environments.

Interactive remote execution

Interactive remote execution also runs esub and eexec if they are found in LSF_SERVERDIR. For example, lsrun invokes esub, and RES runs eexec before starting the task. esub is invoked at the time of the ls_connect(3) call, and RES invokes eexec each time a remote task is executed. RES runs eexec only at task startup time.

DCE credentials and AFS tokens

esub and eexec are also used for processing DCE credentials and AFS tokens. See the documentation on the Platform Web site for more information.

Using esub

About esub

An esub, short for external submission, is a user-written executable (binary or script) that can be used to validate, modify, or reject jobs. The esub is placed in LSF_SERVERDIR (defined in lsf.conf), where LSF checks for it whenever a job is submitted, restarted, or modified. If LSF finds an esub, LSF runs it. Whether the job is submitted, modified, or rejected depends on the logic built into the esub.

Any messages that need to be provided to the user should be directed to the standard error (stderr) stream and not the standard output (stdout) stream.

Environment variables to bridge esub and LSF

LSF provides the following environment variables in the esub execution environment:

LSB_SUB_PARM_FILE

This variable points to a temporary file containing the job parameters that esub reads when the job is submitted. The submission parameters are a set of name-value pairs on separate lines in the format "option_name=value".

The following option names are supported:

Option
Description
LSB_SUB_ADDITIONAL
String format parameter containing the value of the -a option to bsub
The value of -a is passed to esub, but it does not directly affect the other bsub parameters or behavior. The value of -a must correspond to an actual esub file. For example, to use bsub -a fluent, the file esub.fluent must exist in LSF_SERVERDIR.
LSB_SUB_ADDITIONAL cannot be changed in or added to LSB_SUB_MODIFY_FILE.
LSB_SUB_BEGIN_TIME
Begin time, in seconds since 00:00:00 GMT, Jan. 1, 1970
LSB_SUB_CHKPNT_DIR
Checkpoint directory
LSB_SUB_COMMAND_LINE
bsub job command argument
LSB_SUB_COMMANDNAME must be set in lsf.conf to enable esub to use this variable
LSB_SUB_CHKPNT_PERIOD
Checkpoint period in seconds
LSB_SUB_DEPEND_COND
Dependency condition
LSB_SUB_ERR_FILE
Standard error file name
LSB_SUB_EXCEPTION
Exception condition
LSB_SUB_EXCLUSIVE
"Y" specifies exclusive execution
LSB_SUB_EXTSCHED_PARAM
Validate or modify bsub -extsched option
LSB_SUB_HOLD
Hold job (bsub -H option)
LSB_SUB_HOSTS
List of execution host names
LSB_SUB_HOST_SPEC
Host specifier
LSB_SUB_IN_FILE
Standard input file name
LSB_SUB_INTERACTIVE
"Y" specifies an interactive job
LSB_SUB_LOGIN_SHELL
Login shell
LSB_SUB_JOB_NAME
Job name
LSB_SUB_JOB_WARNING_ACTION
Job warning action specified by bsub -wa
LSB_SUB_JOB_ACTION_WARNING_TIME
Job warning time period specified by bsub -wt
LSB_SUB_MAIL_USER
Email address used by LSF for sending job email
LSB_SUB_MAX_NUM_PROCESSORS
Maximum number of processors requested
LSB_SUB_MODIFY
"Y" specifies a modification request
LSB_SUB_MODIFY_ONCE
"Y" specifies a modification-once request
LSB_SUB_NOTIFY_BEGIN
"Y" specifies email notification when job begins
LSB_SUB_NOTIFY_END
"Y" specifies email notification when job ends
LSB_SUB_NUM_PROCESSORS
Minimum number of processors requested
LSB_SUB_OTHER_FILES
If defined, the value is SUB_RESET, which indicates that a bmod is being performed to reset the number of files to be transferred
LSB_SUB_OTHER_FILES_number
number is an index number indicating the particular file transfer. The value is the specified file transfer expression.
For example, for bsub -f "a > b" -f "c < d", the following would be defined:
LSB_SUB_OTHER_FILES_0="a > b"
LSB_SUB_OTHER_FILES_1="c < d"
LSB_SUB_OUT_FILE
Standard output file name
LSB_SUB_PRE_EXEC
Pre-execution command
LSB_SUB_PROJECT_NAME
Project name
LSB_SUB_PTY
"Y" specifies an interactive job with PTY support
LSB_SUB_PTY_SHELL
"Y" specifies an interactive job with PTY shell support
LSB_SUB_QUEUE
Submission queue name
LSB_SUB_RERUNNABLE
"Y" specifies a rerunnable job
"N" specifies a nonrerunnable job (specified with bsub -rn). The job is not rerunnable even if it was submitted to a rerunnable queue or application profile.
For bmod -rn, the value is SUB_RESET.
LSB_SUB_RES_REQ
Resource requirement string. Multiple resource requirement strings are not supported.
LSB_SUB_RESTART
"Y" specifies a restart job
LSB_SUB_RESTART_FORCE
"Y" specifies forced restart job
LSB_SUB_RLIMIT_CORE
Core file size limit
LSB_SUB_RLIMIT_CPU
CPU limit
LSB_SUB_RLIMIT_DATA
Data size limit
LSB_SUB_RLIMIT_FSIZE
File size limit
LSB_SUB_RLIMIT_PROCESS
Process limit
LSB_SUB_RLIMIT_RSS
Resident size limit
LSB_SUB_RLIMIT_RUN
Wall-clock run limit
LSB_SUB_RLIMIT_STACK
Stack size limit
LSB_SUB_RLIMIT_SWAP
Virtual memory limit (swap space)
LSB_SUB_RLIMIT_THREAD
Thread limit
LSB_SUB_TERM_TIME
Termination time, in seconds since 00:00:00 GMT, Jan. 1, 1970
LSB_SUB_TIME_EVENT
Time event expression
LSB_SUB_USER_GROUP
User group name
LSB_SUB_WINDOW_SIG
Window signal number
LSB_SUB2_JOB_GROUP
Options specified by bsub -g
LSB_SUB2_LICENSE_PROJECT
LSF License Scheduler project name specified by bsub -Lp
LSB_SUB2_IN_FILE_SPOOL
Spooled input file (bsub -is)
LSB_SUB2_JOB_CMD_SPOOL
Spooled job command file (bsub -Zs)
LSB_SUB2_JOB_PRIORITY
Job priority (bsub -sp and bmod -sp)
For bmod -spn, the value is SUB_RESET
LSB_SUB2_SLA
SLA scheduling options
LSB_SUB2_USE_RSV
Advance reservation ID specified by bsub -U
LSB_SUB3_ABSOLUTE_PRIORITY
For bmod -aps, the value is the APS string given with bmod -aps. For bmod -apsn, the value is SUB_RESET.
LSB_SUB3_APP
Options specified by bsub -app and bmod -app. For bmod -appn, the value is SUB_RESET.
LSB_SUB3_JOB_REQUEUE
String format parameter containing the value of the -Q option to bsub. For bmod -Qn, the value is SUB_RESET.
LSB_SUB3_CWD
Current working directory specified on the command line with bsub -cwd
LSB_SUB3_POST_EXEC
Run the specified post-execution command on the execution host after the job finishes. Specified by bsub -Ep.
LSB_SUB3_RUNTIME_ESTIMATION
Runtime estimate specified by bsub -We
LSB_SUB3_USER_SHELL_LIMITS
Pass user shell limits to execution host. Specified by bsub -ul.

Example submission parameter file

If a user submits the following job:

bsub -q normal -x -P my_project -R "r1m rusage[dummy=1]" -n 90 sleep 10

The contents of the LSB_SUB_PARM_FILE will be:

LSB_SUB_QUEUE="normal"
LSB_SUB_EXCLUSIVE=Y
LSB_SUB_RES_REQ="r1m rusage[dummy=1]"
LSB_SUB_PROJECT_NAME="my_project"
LSB_SUB_COMMAND_LINE="sleep 10"
LSB_SUB_NUM_PROCESSORS=90
LSB_SUB_MAX_NUM_PROCESSORS=90 
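
The examples later in this document load the parameter file by sourcing it with the shell's "." command. As an alternative, the following minimal sketch reads a single option from the file without executing the file as shell code. The helper name get_sub_param is hypothetical and is not part of LSF, and the sketch assumes that values are wrapped in at most one pair of double quotes, as in the example file above.

```shell
#!/bin/sh
# get_sub_param is a hypothetical helper, not part of LSF.
# It prints the value of one option from a name=value file and
# strips one pair of surrounding double quotes if present.
get_sub_param() {
    # $1 = option name, $2 = parameter file
    sed -n "s/^$1=//p" "$2" | sed -e 's/^"//' -e 's/"$//' | head -n 1
}

# Only act when LSF has provided a readable parameter file.
if [ -n "${LSB_SUB_PARM_FILE:-}" ] && [ -r "$LSB_SUB_PARM_FILE" ]; then
    queue=`get_sub_param LSB_SUB_QUEUE "$LSB_SUB_PARM_FILE"`
    # Messages for the user go to stderr, not stdout.
    echo "Submission queue: ${queue:-none specified}" 1>&2
fi
```

For an option that is not present in the file, the helper prints nothing, so the caller can test for an empty result.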

LSB_SUB_ABORT_VALUE

This variable indicates the value esub should exit with if LSF is to reject the job submission.

LSB_SUB_MODIFY_ENVFILE

The file in which esub should write any changes to the job environment variables.

esub writes the variables to be modified to this file in the same format used in LSB_SUB_PARM_FILE. The order of the variables does not matter.

After esub runs, LSF checks LSB_SUB_MODIFY_ENVFILE for changes and if found, LSF will apply them to the job environment variables.
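
As an illustration, an esub might record environment changes as in the sketch below. The helper set_job_env and the variable SITE_SCRATCH_DIR are hypothetical names chosen for this example. The sketch appends to the file so that several changes can be recorded in one run, which is an assumption about how a site would want to use it rather than a requirement stated here.

```shell
#!/bin/sh
# set_job_env is a hypothetical helper, not part of LSF.
# It appends NAME="value" to the file named by LSB_SUB_MODIFY_ENVFILE.
set_job_env() {
    # $1 = variable name, $2 = new value
    [ -n "${LSB_SUB_MODIFY_ENVFILE:-}" ] || return 1
    printf '%s="%s"\n' "$1" "$2" >> "$LSB_SUB_MODIFY_ENVFILE"
}

# Example use: SITE_SCRATCH_DIR is a made-up site variable.
if [ -n "${LSB_SUB_MODIFY_ENVFILE:-}" ]; then
    set_job_env SITE_SCRATCH_DIR "/tmp/scratch"
fi
```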

LSB_SUB_MODIFY_FILE

The file in which esub should write any submission parameter changes.

esub writes the job options to be modified to this file in the same format used in LSB_SUB_PARM_FILE. The order of the options does not matter. After esub runs, LSF checks LSB_SUB_MODIFY_FILE for changes and, if any are found, applies them to the job.

tip:  
LSB_SUB_ADDITIONAL cannot be changed in or added to LSB_SUB_MODIFY_FILE.

LSF_INVOKE_CMD

Indicates the name of the last LSF command that invoked an external executable (for example, esub).

External executables get called by several LSF commands (bsub, bmod, lsrun). This variable contains the name of the last LSF command to call the executable.
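
An esub can branch on this variable to apply different checks to different commands. The sketch below assumes that the variable holds the bare command name (bsub, bmod, or lsrun, the commands named above); describe_invoker is a hypothetical helper, not part of LSF.

```shell
#!/bin/sh
# describe_invoker is a hypothetical helper, not part of LSF.
# It maps a command name to a short description of the request.
describe_invoker() {
    case "$1" in
        bsub)  echo "new submission" ;;
        bmod)  echo "modification" ;;
        lsrun) echo "interactive remote execution" ;;
        *)     echo "other" ;;
    esac
}

kind=`describe_invoker "${LSF_INVOKE_CMD:-}"`
# Messages for the user go to stderr, not stdout.
echo "esub invoked for: $kind" 1>&2
```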

General esub logic

After esub runs, LSF checks:

  1. Is the esub exit value LSB_SUB_ABORT_VALUE?
    1. Yes, step 2
    2. No, step 4
  2. Reject the job
  3. Go to step 5
  4. Does LSB_SUB_MODIFY_FILE or LSB_SUB_MODIFY_ENVFILE exist? If so, apply the changes.
  5. Done
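
The checks above can be mimicked outside LSF to try out an esub before installing it. The following is a sketch of such a test harness, not LSF's own implementation. The function run_esub, the temporary file names, and the abort value 97 are all choices made for this example; inside LSF, the esub should always use the value that LSF supplies in LSB_SUB_ABORT_VALUE.

```shell
#!/bin/sh
# run_esub is a hypothetical test harness, not part of LSF.
# Usage: run_esub parameter_file command [args...]
run_esub() {
    LSB_SUB_PARM_FILE=$1
    shift
    LSB_SUB_ABORT_VALUE=97    # arbitrary value chosen for this harness
    LSB_SUB_MODIFY_FILE=${TMPDIR:-/tmp}/esub_modify.$$
    LSB_SUB_MODIFY_ENVFILE=${TMPDIR:-/tmp}/esub_modify_env.$$
    export LSB_SUB_PARM_FILE LSB_SUB_ABORT_VALUE
    export LSB_SUB_MODIFY_FILE LSB_SUB_MODIFY_ENVFILE
    rm -f "$LSB_SUB_MODIFY_FILE" "$LSB_SUB_MODIFY_ENVFILE"

    # Run the esub. Its stdout is discarded so that only the
    # outcome is printed by this function.
    status=0
    "$@" >/dev/null || status=$?

    if [ "$status" -eq "$LSB_SUB_ABORT_VALUE" ]; then
        echo "rejected"                  # steps 1 to 3
    elif [ -f "$LSB_SUB_MODIFY_FILE" ] || [ -f "$LSB_SUB_MODIFY_ENVFILE" ]; then
        echo "accepted with changes"     # step 4
    else
        echo "accepted"                  # step 5
    fi
    rm -f "$LSB_SUB_MODIFY_FILE" "$LSB_SUB_MODIFY_ENVFILE"
}
```

For example, run_esub my_params.txt ./esub prints one of the three outcomes for an esub in the current directory, where my_params.txt is a hand-written file in the LSB_SUB_PARM_FILE format.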

Rejecting jobs

Depending on your policies you may choose to reject a job. To do so, have esub exit with LSB_SUB_ABORT_VALUE.

If esub rejects the job, it should not write to either LSB_SUB_MODIFY_FILE or LSB_SUB_MODIFY_ENVFILE.

Example

The following Bourne shell esub rejects all job submissions by exiting with LSB_SUB_ABORT_VALUE:

#!/bin/sh

# Redirect stdout to stderr so echo can be used for
# error messages
exec 1>&2

# Reject the submission
echo "LSF is rejecting your job submission..."
exit $LSB_SUB_ABORT_VALUE

Validating job submission parameters

One use of validation is to support project-based accounting. The user can request that the resources used by a job be charged to a particular project. Projects are associated with a job at job submission time, so LSF will accept any arbitrary string for a project name. In order to ensure that only valid projects are entered and the user is eligible to charge to that project, an esub can be written.

Example

The following Bourne shell esub validates job submission parameters:

#!/bin/sh

# Load the submission parameters as shell variables
. $LSB_SUB_PARM_FILE

# Redirect stdout to stderr so echo can be used for error messages
exec 1>&2

# Check valid projects. The variables are quoted so that the test
# also works when no project name was specified.
if [ "$LSB_SUB_PROJECT_NAME" != "proj1" -a "$LSB_SUB_PROJECT_NAME" != "proj2" ]; then
   echo "Incorrect project name specified"
   exit $LSB_SUB_ABORT_VALUE
fi

USER=`whoami`
if [ "$LSB_SUB_PROJECT_NAME" = "proj1" ]; then
   # Only user1 and user2 can charge to proj1
   if [ "$USER" != "user1" -a "$USER" != "user2" ]; then
      echo "You are not allowed to charge to this project"
      exit $LSB_SUB_ABORT_VALUE
   fi
fi

Modifying job submission parameters

esub can be used to modify submission parameters and the job environment before the job is actually submitted.

The following example writes modifications to LSB_SUB_MODIFY_FILE and LSB_SUB_MODIFY_ENVFILE.

In the example, user userA can only submit jobs to queue queueA, user userB must use Bourne shell (/bin/sh), and user userC should never be able to submit a job.

#!/bin/sh
# Load the submission parameters as shell variables
. $LSB_SUB_PARM_FILE

# Redirect stdout to stderr so echo can be used for error messages
exec 1>&2

USER=`whoami`
# Ensure userA is using the right queue queueA
if [ "$USER" = "userA" -a "$LSB_SUB_QUEUE" != "queueA" ]; then
   echo "userA has submitted a job to an incorrect queue"
   echo "...submitting to queueA"
   echo 'LSB_SUB_QUEUE="queueA"' > $LSB_SUB_MODIFY_FILE
fi

# Ensure userB is using the right shell (/bin/sh)
if [ "$USER" = "userB" -a "$SHELL" != "/bin/sh" ]; then
   echo "userB has submitted a job using $SHELL"
   echo "...using /bin/sh instead"
   echo 'SHELL="/bin/sh"' > $LSB_SUB_MODIFY_ENVFILE
fi

# Deny userC the ability to submit a job
if [ "$USER" = "userC" ]; then
   echo "You are not permitted to submit a job."
   exit $LSB_SUB_ABORT_VALUE
fi

Using bmod and brestart commands with mesub

You can use the bmod command to modify job submission parameters, and brestart to restart checkpointed jobs. Like bsub, bmod and brestart also call mesub, which in turn invokes any existing esub executables in LSF_SERVERDIR. bmod and brestart cannot make changes to the job environment through mesub and esub. Environment changes only occur when mesub is called by the original job submission with bsub.
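
Because environment changes are ignored for bmod and brestart, an esub may want to skip that work for those requests. The sketch below uses the LSB_SUB_MODIFY and LSB_SUB_RESTART options from the parameter file to tell the cases apart. The helper request_kind and the variable SITE_SUBMITTED_VIA_ESUB are hypothetical names used only for illustration.

```shell
#!/bin/sh
# request_kind is a hypothetical helper, not part of LSF.
# It classifies the request from options set in the parameter file.
request_kind() {
    if [ "${LSB_SUB_RESTART:-}" = "Y" ]; then
        echo restart
    elif [ "${LSB_SUB_MODIFY:-}" = "Y" ]; then
        echo modify
    else
        echo submit
    fi
}

if [ -n "${LSB_SUB_PARM_FILE:-}" ] && [ -r "$LSB_SUB_PARM_FILE" ]; then
    # Load the submission parameters as shell variables.
    . "$LSB_SUB_PARM_FILE"
    if [ "`request_kind`" = "submit" ]; then
        # Only write environment changes for an original submission.
        # SITE_SUBMITTED_VIA_ESUB is a made-up site variable.
        echo 'SITE_SUBMITTED_VIA_ESUB="Y"' >> "$LSB_SUB_MODIFY_ENVFILE"
    fi
fi
```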

Use multiple esub (mesub)

LSF provides a master esub (LSF_SERVERDIR/mesub) to handle the invocation of individual application-specific esub executables and the job submission requirements of your applications.

  1. Use the -a option of bsub to specify the application you are running through LSF.

    For example, to submit a FLUENT job:

    bsub -a fluent bsub_options fluent_command

    The method name fluent uses the esub for FLUENT jobs (LSF_SERVERDIR/esub.fluent), which sets the checkpointing method LSB_ECHKPNT_METHOD="fluent" to use echkpnt.fluent and erestart.fluent.

LSB_ESUB_METHOD (lsf.conf)

To specify a mandatory esub method that applies to all job submissions, you can configure LSB_ESUB_METHOD in lsf.conf.

LSB_ESUB_METHOD specifies the name of the esub method used in addition to any methods specified in the bsub -a option.

For example, LSB_ESUB_METHOD="dce fluent" defines DCE as the mandatory security system, and FLUENT as the mandatory application used on all jobs.

Compatibility note
restriction:  
After LSF version 5.1, the value of -a and LSB_ESUB_METHOD must correspond to an actual esub file in LSF_SERVERDIR. For example, to use bsub -a fluent, the file esub.fluent must exist in LSF_SERVERDIR.

How master esub invokes application-specific esubs

bsub invokes mesub at job submission, which calls esub programs in this order:

  1. Mandatory esub programs defined by LSB_ESUB_METHOD
  2. Any existing executable named LSF_SERVERDIR/esub
  3. Application-specific esub programs in the order specified in the bsub -a option
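
The order above can be illustrated with a small sketch that prints the esub paths in the sequence mesub would call them. This is an illustration only, not LSF code: esub_call_order is a hypothetical function, and it assumes that both LSB_ESUB_METHOD and the bsub -a value are space-separated lists of method names.

```shell
#!/bin/sh
# esub_call_order is a hypothetical illustration, not LSF code.
# It prints esub program paths in the order listed above.
esub_call_order() {
    # $1 = LSF_SERVERDIR, $2 = LSB_ESUB_METHOD value, $3 = bsub -a value
    # 1. Mandatory methods
    for m in $2; do
        echo "$1/esub.$m"
    done
    # 2. A plain esub, if one exists and is executable
    if [ -f "$1/esub" ] && [ -x "$1/esub" ]; then
        echo "$1/esub"
    fi
    # 3. Methods requested with bsub -a, in the order given
    for m in $3; do
        echo "$1/esub.$m"
    done
}
```

For instance, esub_call_order "$LSF_SERVERDIR" "dce" "fluent" lists esub.dce first, then esub (if present), then esub.fluent.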

Configure master esub and your application-specific esub

The master esub is installed as LSF_SERVERDIR/mesub. After installation:

  1. Create your own application-specific esub.
  2. Optional. Configure LSB_ESUB_METHOD in lsf.conf to specify a mandatory esub for all job submissions.
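
Name your esub

  1. Use the following naming convention: LSF_SERVERDIR/esub.application, where application is the name given to bsub -a or LSB_ESUB_METHOD (for example, esub.fluent).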

Existing esub

Your existing esub does not need to follow this convention and does not need to be renamed. However, since mesub invokes any esub that follows this convention, you should move any backup copies of your esubs out of LSF_SERVERDIR or choose a name that does not follow the convention (for example, use esub_bak instead of esub.bak).
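
To spot stray backup copies, an administrator could list the files whose names match the convention. The following sketch prints every regular file named esub or esub.* in a directory; list_esub_candidates is a hypothetical helper, not an LSF command. A file named esub.bak is listed, while esub_bak is not.

```shell
#!/bin/sh
# list_esub_candidates is a hypothetical helper, not part of LSF.
# It prints the names of files in a directory that match esub or esub.*.
list_esub_candidates() {
    # $1 = directory to scan (normally LSF_SERVERDIR)
    for f in "$1"/esub "$1"/esub.*; do
        if [ -f "$f" ]; then
            basename "$f"
        fi
    done
}

if [ -n "${LSF_SERVERDIR:-}" ]; then
    list_esub_candidates "$LSF_SERVERDIR"
fi
```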

Working with eexec

About eexec

The eexec program runs on the execution host at job start-up and completion time and when checkpointing is initiated. It is run as the user after the job environment variables have been set. The environment variable LS_EXEC_T is set to START, END, and CHKPNT, respectively, to indicate when eexec is invoked.

If you need to run eexec as a different user, such as root, you must properly define LSF_EEXEC_USER in the file /etc/lsf.sudoers. See the Platform LSF Configuration Reference for information about the lsf.sudoers file.

eexec is expected to finish running because the parent job process waits for eexec to finish running before proceeding. The environment variable LS_JOBPID stores the process ID of the process that invoked eexec. If eexec is intended to monitor the execution of the job, eexec must fork a child and then have the parent eexec process exit. The eexec child should periodically test that the job process is still alive using the LS_JOBPID variable.
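
A minimal sketch of such a monitoring eexec follows. It assumes that the monitor runs as the same user as the job, so that kill -0 can be used to test whether the process still exists. The helper monitor_job and the 30-second polling interval are choices made for this example.

```shell
#!/bin/sh
# monitor_job is a hypothetical helper, not part of LSF.
# It returns once the given process no longer exists.
monitor_job() {
    # $1 = process ID to watch, $2 = polling interval in seconds
    while kill -0 "$1" 2>/dev/null; do
        sleep "$2"
    done
    # Site-specific cleanup or logging would go here.
}

if [ "${LS_EXEC_T:-}" = "START" ] && [ -n "${LS_JOBPID:-}" ]; then
    # Start the monitor as a background child with its standard
    # streams detached, so the parent eexec can exit immediately
    # and the job can proceed.
    monitor_job "$LS_JOBPID" 30 </dev/null >/dev/null 2>&1 &
fi
```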

Using esub and eexec to pass data to execution environments

If esub needs to pass some data to eexec, it can write the data to its standard output for eexec to read from its standard input (stdin). LSF effectively acts as the pipe between esub and eexec (e.g., esub | eexec).

Standard output (stdout) from any esub is automatically sent to eexec.

Limitation

Since eexec cannot handle more than one standard output stream, only one esub can use standard output to generate data as standard input to eexec.

For example, the esub for AFS (esub.afs) sends its authentication tokens as standard output to eexec. If you use AFS, no other esub can use standard output.
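
The hand-off can be sketched with two shell functions standing in for the two executables. The names esub_send and eexec_receive, and the SITE_TOKEN=value line format, are assumptions made for this example; LSF does not prescribe a format for the data.

```shell
#!/bin/sh
# esub side (hypothetical): data for eexec goes to stdout,
# messages for the user go to stderr.
esub_send() {
    echo "Passing site token to the execution host" 1>&2
    echo "SITE_TOKEN=$1"
}

# eexec side (hypothetical): read the data written by esub from stdin.
eexec_receive() {
    while IFS= read -r line; do
        case "$line" in
            SITE_TOKEN=*) echo "received token: ${line#SITE_TOKEN=}" ;;
        esac
    done
}

# Outside LSF, the hand-off can be simulated with a pipe:
#   esub_send abc123 | eexec_receive
```

Keeping user messages on stderr, as recommended earlier for esub, is what leaves stdout free to carry the data.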


Platform Computing Inc.
www.platform.com