Learn more about Platform products at http://www.platform.com

[ Platform Documentation ] [ Title ] [ Contents ] [ Previous ] [ Next ] [ Index ]


This chapter gives an overview of the LSF system architecture and the load sharing services provided by the Platform LSF Application Programming Interfaces (APIs). It introduces LSF components and describes their interaction. This introduction also demonstrates how to write, compile, and link a simple load sharing application using Platform LSF.


[ Top ]

About this Guide

Last update

May 12 2008

Purpose of this guide

This guide is an introduction to using the Application Programming Interfaces (APIs) provided by Platform LSF® ("LSF"). It covers the following topics:

Who should use this guide

You are an application developer who wants to build workload sharing applications that take advantage of LSF's distributed resource management functionality.

What you should know

You should be experienced with basic C and C++ programming concepts.

You should know basic LSF concepts such as clusters, jobs, resources, servers, and hosts.

See Administering Platform LSF for information about fundamental LSF concepts.

How this guide is organized

This guide takes you through the process of learning to use the LSF API for application development. Each chapter is devoted to an aspect of the LSF API. You should read the chapters in order, since each concept builds on the ones before it.

Chapter 2

Programming with LSLIB provides simple examples that demonstrate how to use LSF Base library (LSLIB) functions in an application. The function prototypes, as well as data structures that are used by the functions, are described. Many of the examples resemble the implementation of existing LSF utilities.

Chapter 3

Programming with LSBLIB shows how to use the LSF Batch library (LSBLIB) to access the services provided by LSF Batch and other LSF products. Since LSF Batch is built on top of LSF Base, LSBLIB relies on services provided by LSLIB. However, you only need to link your program with LSBLIB to use LSBLIB functions because the LSBLIB header file (lsbatch.h) already includes the LSLIB header (lsf.h). All other LSF products (such as Platform Parallel and Platform Make) rely on services provided by LSBLIB.

Chapter 4

Advanced Programming Topics investigates more advanced topics in LSF application programming.

Chapter 5

User-Level Checkpointing describes how to implement user-level checkpointing in your applications.

Chapter 6

Writing an External Scheduler Plugin describes how to use the LSF scheduler plugin API to customize existing scheduling policies or implement new ones that can operate with existing LSF scheduler plugin modules.

Appendix A

Tutorials describes a first example program.

Appendix B

Common LSF Functions contains some examples of the LSF Batch API.

[ Top ]

Platform LSF Architecture

Platform LSF is a layer of software services on top of UNIX and Windows operating systems. Platform LSF creates a single system image on a network of different computer systems so all the computing resources on a network can be managed and used. Throughout the LSF Programmer's Guide, Platform LSF refers to the Platform LSF suite, which contains the following products:

LSF Base

LSF Base provides basic load-sharing services to a network of different computer systems. All LSF products use LSF Base. Some of the services it provides are:

To provide services, LSF Base includes:

LSF Batch

The services provided by LSF Batch are extensions of the LSF Base services. LSF Batch turns a network of computers into a network batch computer. It has all the features of a mainframe batch job processing system while also providing load balancing and policy-driven resource allocation control.

LSF Batch relies on services provided by LSF Base. LSF Batch uses:

LSF Batch includes a master batch daemon (mbatchd) running on the master host and a slave batch daemon (sbatchd) running on each batch server host.

LSF libraries

Platform LSF consists of a number of servers running as root on each participating host in a Platform LSF cluster and a comprehensive set of utilities built on top of the Platform LSF API. The Platform LSF API consists of two libraries:

LSF base system

The diagram below shows the components of the Platform LSF Base and their relationship:

LSF Base consists of the Platform LSF base library (LSLIB) and two servers daemons, the Load Information Manager (LIM) and the Remote Execution Server (RES).


The LSF API LSLIB is the direct user interface to the LSF Base system. Platform LSF APIs provide easy access to the services of Platform LSF servers. A Platform LSF server host runs load-shared jobs. A LIM and a RES run on every Platform LSF server host. They interface with the host's operating system to give users a uniform, host-independent environment.


A cluster is a collection of hosts running LSF. A LIM on one of the hosts in a cluster acts as the master LIM for the cluster. The master LIM is chosen among all the LIMs running in the cluster based on configuration file settings. If the master LIM becomes unavailable, the LIM on the next configured host will automatically become the new master LIM.


The LIM on each host monitors its host's load and reports load information to the master LIM. The master LIM collects information from all hosts and provides that information to the applications.


The RES on each server host accepts remote execution requests and provides fast, transparent, and secure remote execution of tasks.

Application and Platform LSF base interactions

The following diagram shows how an application interacts with Platform LSF Base. All of the transactions take place transparently to the programmer:

LSF Base executes tasks by sending user requests between the submission, master, and execution hosts. The submission host sends a task into the LSF Base system. The master host determines the best execution host to run the task. The execution host runs the task.

  1. lsrun submits a task to LSF for execution.
  2. The submitted task proceeds through the Platform LSF base library (LSLIB).
  3. The LIM communicates the task's information to the cluster's master LIM. Periodically, the LIM on individual machines gathers its 12 built-in load indices and forwards this information to the master LIM.
  4. The master LIM determines the best host to run the task and sends this information back to the submission host's LIM.
  5. Information about the chosen execution host is passed through the LSF base library.
  6. Information about the host to execute the task is passed back to lsrun.

  1. lsrun creates the NIOS (network input/output server), the communication pipe that talks to the RES on the execution host.
  2. Task execution information is passed from the NIOS to the RES on the execution host.
  3. The RES creates a child RES and passes the task execution information to the child RES.
  4. The child RES creates the execution environment and runs the task.
  5. The child RES receives completed task information.
  6. The child RES sends the completed task information to the RES.
  7. The output is sent from the RES to the NIOS. The child RES and the execution environment are destroyed by the RES.
  8. The NIOS sends the output to standard output.

To run a task remotely or to perform a file operation remotely, an application calls the remote execution or remote file operation service functions in LSLIB, which then contact the RES to get the services.

The same NIOS is shared by all remote tasks running on different hosts started by the same instance of LSLIB. The LSLIB contacts multiple Remote Execution Servers (RES) and they all call back to the same NIOS. The sharing of the NIOS is restricted to within the same application.

Remotely executed tasks behave as if they were executing locally. The local execution environment passed to the RES is re-established on the remote host, and the task's status and resource usage are passed back to the client. Terminal I/O is transparent, so even applications such as vi that do complicated terminal manipulation run transparently on remote hosts. UNIX signals are supported across machines, so remote tasks get signals as if they were running locally. Job control also is done transparently. This level of transparency is maintained between heterogeneous hosts.
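The interactions described above are largely hidden behind a few LSLIB calls. The following is a minimal sketch, not the implementation of lsrun itself, that places and remotely runs the hostname command. It assumes a running LSF cluster and the compile/link setup described later in this chapter:

```c
/* Sketch: place and remotely execute "hostname" via LSLIB.
 * Assumes a running LSF cluster. */
#include <stdio.h>
#include <stdlib.h>
#include <lsf/lsf.h>

int main(void)
{
    char **best;
    int num = 1;                          /* ask for one host */
    char *cmd[] = { "hostname", NULL };

    if (ls_initrex(1, 0) < 0) {           /* start the NIOS side */
        ls_perror("ls_initrex");
        exit(-1);
    }
    best = ls_placereq(NULL, &num, 0, NULL);   /* let the LIM pick a host */
    if (best == NULL) {
        ls_perror("ls_placereq");
        exit(-1);
    }
    /* ls_rexecv() replaces this process with the remote task,
     * the way execv() does; it returns only on failure. */
    ls_rexecv(best[0], cmd, 0);
    ls_perror("ls_rexecv");
    return -1;
}
```

Note that ls_initrex() must be called before any remote execution service is used; it sets up the NIOS that relays terminal I/O and signals between the local host and the RES.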

LSF batch system

LSF Batch is a layered distributed load sharing batch system built on top of Platform LSF Base. The services provided by LSF Batch are extensions to the Platform LSF Base services. Application programmers can access batch services through the LSF Batch Library (LSBLIB). The diagram below shows the components of LSF Batch and their relationship:

LSF Batch accepts user jobs and holds them in queues until suitable hosts are available. LSF Batch runs user jobs on LSF Batch execution hosts, those hosts that a site deems suitable for running batch jobs.

LSBLIB is the LSF API that provides the direct user interface to the rest of the LSF Batch system. Platform LSF APIs provide easy access to the services of Platform LSF servers. The API routines hide the interaction details between the application and Platform LSF servers in a way that is platform independent.

LSF Batch services are provided by two daemons, one mbatchd (master batch daemon) running in each Platform LSF cluster, and one sbatchd (slave batch daemon) running on each batch server host.

Application and Platform LSF batch interactions

LSF Batch operation relies on the services provided by Platform LSF Base. LSF Batch contacts the master LIM to get load and resource information about every batch server host. The diagram below shows the typical operation of LSF Batch:

LSF Batch executes jobs by sending user requests from the submission host to the master host. The master host puts the job in a queue and dispatches the job to an execution host. The job is run and the results are emailed to the user.

Unlike LSF Base, the submission host does not directly interact with the execution host.

  1. bsub or lsb_submit() submits a job to LSF for execution.
  2. To access LSF base services, the submitted job proceeds through the Platform LSF Batch library (LSBLIB) that contains LSF Base library information.
  3. The LIM communicates the job's information to the cluster's master LIM. Periodically, the LIM on individual machines gathers its 12 built-in load indices and forwards this information to the master LIM.
  4. The master LIM determines the best host to run the job and sends this information back to the submission host's LIM.
  5. Information about the chosen execution host is passed through the LSF Batch library.
  6. Information about the host to execute the job is passed back to bsub or lsb_submit().
  7. To enter the batch system, bsub or lsb_submit() sends the job to LSBLIB.
  8. Using LSBLIB services, the job is sent to the mbatchd running on the cluster's master host.
  9. The mbatchd puts the job in an appropriate queue and waits for the appropriate time to dispatch the job. User jobs are held in batch queues by mbatchd, which checks the load information on all candidate hosts periodically.

  1. The mbatchd dispatches the job when an execution host with the necessary resources becomes available; the job is received by that host's sbatchd. When more than one host is available, the best host is chosen.
  2. Once a job is sent to an sbatchd, that sbatchd controls the execution of the job and reports the job's status to mbatchd. The sbatchd creates a child sbatchd to handle job execution.
  3. The child sbatchd sends the job to the RES.
  4. The RES creates the execution environment to run the job.
  5. The job is run in the execution environment.
  6. The results of the job are sent to the email system.
  7. The email system sends the job's results to the user.

The mbatchd always runs on the host where the master LIM runs. The sbatchd on the master host automatically starts the mbatchd. If the master LIM moves to a different host, the current mbatchd will automatically resign and a new mbatchd will be automatically started on the new master host.

The log files store important system and job information so that a newly started mbatchd can restore the status of the previous mbatchd. The log files also provide historic information about jobs, queues, hosts, and LSF Batch servers.

[ Top ]

Platform LSF API Services

Platform LSF services are natural extensions of operating system services. Platform LSF services glue heterogeneous operating systems into a single, integrated computing system.

Platform LSF APIs provide easy access to the services of Platform LSF servers.

Platform LSF APIs have been used to build numerous load sharing applications and utilities. Some examples of applications built on top of the Platform LSF APIs are lsmake, lstcsh, lsrun, and the LSF Batch user interface.

Platform LSF base API services

The Platform LSF Base API (LSLIB) allows application programmers to get services provided by LIM and RES. The services include:

Configuration information service

This set of function calls provides information about the Platform LSF cluster configuration, such as the hosts belonging to the cluster, the total amount of installed resources on each host (e.g., number of CPUs, amount of physical memory, and swap space), special resources associated with individual hosts, and the types and models of individual hosts.

Such information is static and is collected by LIMs on individual hosts. By calling these routines, an application gets a global view of the distributed system. This information can be used for various purposes. For example, the Platform LSF command lshosts displays such information on the screen. LSF Batch also uses such information to know how many CPUs are on each host.

Flexible options are available for an application to select the information that is of interest to it.
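As a sketch of this service, the following program retrieves the static configuration of every host in the cluster, much as lshosts does. It assumes a running LSF cluster and the compile/link setup described later in this chapter:

```c
/* Sketch: list every host in the cluster with its type, model,
 * CPU count, and memory, in the spirit of lshosts. */
#include <stdio.h>
#include <stdlib.h>
#include <lsf/lsf.h>

int main(void)
{
    struct hostInfo *hosts;
    int numhosts = 0;    /* 0 on input means "all hosts" */
    int i;

    hosts = ls_gethostinfo(NULL, &numhosts, NULL, 0, 0);
    if (hosts == NULL) {
        ls_perror("ls_gethostinfo");
        exit(-1);
    }
    for (i = 0; i < numhosts; i++)
        printf("%-16s type=%s model=%s ncpus=%d maxmem=%dMB\n",
               hosts[i].hostName, hosts[i].hostType,
               hosts[i].hostModel, hosts[i].maxCpus, hosts[i].maxMem);
    return 0;
}
```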

Dynamic load information service

This set of function calls provides comprehensive dynamic load information collected from individual hosts periodically. The load information is provided in the form of load indices detailing the load on various resources of each host, such as CPU, memory, I/O, disk space, and interactive activities. Since a site-installed External LIM (ELIM) can be optionally plugged into the LIM to collect additional information that is not already collected by the LIM, this set of services can be used to collect virtually any type of dynamic information about individual hosts.

Example applications that use such information include lsload and lsmon. This information is also valuable to an application making intelligent job scheduling decisions. For example, LSF Batch uses such information to decide whether or not a job should be sent to a host for execution.

These service routines provide a powerful mechanism for selecting the information that is of interest to the application.
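A minimal sketch of this service, assuming a running LSF cluster, prints one built-in load index, the 1-minute CPU run-queue length, for every host, roughly what lsload reports:

```c
/* Sketch: print the 1-minute run-queue length of every host.
 * R1M is one of the built-in load index constants in lsf.h. */
#include <stdio.h>
#include <stdlib.h>
#include <lsf/lsf.h>

int main(void)
{
    struct hostLoad *loads;
    int numhosts = 0;
    int i;

    /* NULL resreq: no selection criteria; 0 options: default ordering */
    loads = ls_load(NULL, &numhosts, 0, NULL);
    if (loads == NULL) {
        ls_perror("ls_load");
        exit(-1);
    }
    for (i = 0; i < numhosts; i++)
        printf("%-16s r1m=%4.1f\n", loads[i].hostName, loads[i].li[R1M]);
    return 0;
}
```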

Placement advice service

Platform LSF Base API provides functions to select the best host among all the hosts. The selected host can then be used to run a job or to login. Platform LSF provides flexible syntax for an application to specify the resource requirements or criteria for host selection and sorting.

Many Platform LSF utilities use these functions for placement decisions, such as lsrun, lsmake, and lslogin. It is also possible for an application to get the detailed load information about the candidate hosts together with a preference order of the hosts.

A parallel application can ask for multiple hosts in one LSLIB call for the placement of a multi-component job.

The performance differences between different models of machines as well as the number of CPUs on each host are taken into consideration when placement advice is made, with the goal of selecting qualified hosts that will provide the best performance.
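The following sketch asks the LIM to place a two-component parallel job on the two best hosts. The resource requirement string "order[cpu]" is illustrative; any valid LSF resource requirement expression can be used. A running LSF cluster is assumed:

```c
/* Sketch: ask the LIM for the two best hosts for a two-component job. */
#include <stdio.h>
#include <stdlib.h>
#include <lsf/lsf.h>

int main(void)
{
    char **hosts;
    int num = 2;    /* in: number of hosts wanted; out: number granted */
    int i;

    hosts = ls_placereq("order[cpu]", &num, 0, NULL);
    if (hosts == NULL) {
        ls_perror("ls_placereq");
        exit(-1);
    }
    for (i = 0; i < num; i++)
        printf("Component %d -> host %s\n", i, hosts[i]);
    return 0;
}
```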

Task list manipulation service

Task lists are used to store default resource requirements for users. Platform LSF provides functions to manipulate the task lists and retrieve resource requirements for a task. This is important for applications that need to automatically pick up the resource requirements from the user's task list. The Platform LSF command lsrtasks uses these functions to manipulate the user's task list. Platform LSF utilities such as lstcsh, lsrun, and bsub automatically pick up the resource requirements of the submitted command line by calling these LSLIB functions.

Master selection service

If your application needs some kind of fault tolerance, you can make use of the master selection service provided by the LIM. For example, you can run one copy of your application on every host and only allow the copy on the master host to be the primary copy and others to be backup copies. LSLIB provides a function that tells you the name of the current master host.

LSF Batch uses this service to achieve improved availability. As long as one host in the Platform LSF cluster is up, LSF Batch service will continue.
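A sketch of the master selection pattern described above, assuming a running LSF cluster: each copy of an application compares its own host name with the current master host name to decide whether it is the primary copy.

```c
/* Sketch: decide whether this copy is the primary by checking
 * whether it runs on the current master host. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <lsf/lsf.h>

int main(void)
{
    char local[256];
    char *master;

    master = ls_getmastername();
    if (master == NULL) {
        ls_perror("ls_getmastername");
        exit(-1);
    }
    gethostname(local, sizeof(local));
    if (strcmp(local, master) == 0)
        printf("Running on master host %s: acting as primary.\n", master);
    else
        printf("Master is %s: acting as backup.\n", master);
    return 0;
}
```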

Remote execution service

The remote execution service provides a transparent and efficient mechanism for running sequential as well as parallel jobs on remote hosts. The services are provided by the RES on the remote host in cooperation with the Network I/O Server (NIOS) on the local host. The NIOS is a per-application stub process that handles the details of terminal I/O and signals on the local side. The NIOS is always started automatically by LSLIB as needed.

RES runs as root and runs tasks on behalf of all users in the Platform LSF cluster. Proper authentication is handled by RES before running a user task.

Platform LSF utilities such as lsrun, lsgrun, ch, lsmake, and lstcsh use the remote execution service.

Remote file operation service

The remote file operation service allows load sharing applications to operate on files stored on remote machines. Such services extend the UNIX and Windows file operation services so that files that are not shared among hosts can also be accessed by distributed applications transparently.

LSLIB provides routines that are extensions to the UNIX and Windows file operations such as open(2), close(2), read(2), write(2), fseek(3), stat(2), etc.

The Platform LSF utility lsrcp is implemented with the remote file operation service functions.
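As a sketch of the remote file operation service, the following reads the first bytes of a file on a remote host that does not share a file system with the local host. The host name "hostD" and the file path are hypothetical; a running LSF cluster is assumed:

```c
/* Sketch: open, read, and close a file on a remote host using the
 * LSLIB remote file operation routines. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <lsf/lsf.h>

int main(void)
{
    char buf[256];
    int rfd, n;

    if (ls_initrex(1, 0) < 0) {   /* RES services require initialization */
        ls_perror("ls_initrex");
        exit(-1);
    }
    rfd = ls_ropen("hostD", "/etc/motd", O_RDONLY, 0);
    if (rfd < 0) {
        ls_perror("ls_ropen");
        exit(-1);
    }
    n = ls_rread(rfd, buf, sizeof(buf) - 1);
    if (n >= 0) {
        buf[n] = '\0';
        printf("%s", buf);
    }
    ls_rclose(rfd);
    return 0;
}
```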

Administration service

This set of function calls allows application programmers to write tools for administering the Platform LSF servers. The operations include reconfiguring the Platform LSF clusters, shutting down a particular Platform LSF server on some host, restarting a Platform LSF server on some host, turning logging on or off, locking/unlocking a LIM on a host, etc.

The lsadmin utility uses the administration services.

LSF Batch API services

The LSF Batch API, LSBLIB, gives application programmers access to the job queueing processing services provided by the LSF Batch servers. All LSF Batch user interface utilities are built on top of LSBLIB. The services that are available through LSBLIB include:

LSF Batch system information service

This set of function calls allows applications to get information about LSF Batch system configuration and status. This includes host, queue, and user configurations and status.

The batch configuration information determines the resource sharing policies that dictate the behavior of the LSF Batch scheduling.

The system status information reflects the current status of hosts, queues, and users of the LSF Batch system.

Example utilities that use the LSF Batch configuration information services are bhosts, bqueues, busers, and bparams.
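A sketch of the system information service, assuming a running LSF Batch cluster: the following prints each batch server host with its job slot limit and current job count, roughly what bhosts shows.

```c
/* Sketch: print batch server hosts with job limits and counts. */
#include <stdio.h>
#include <stdlib.h>
#include <lsf/lsbatch.h>

int main(void)
{
    struct hostInfoEnt *hosts;
    int numhosts = 0;
    int i;

    if (lsb_init(NULL) < 0) {        /* required before any LSBLIB call */
        lsb_perror("lsb_init");
        exit(-1);
    }
    hosts = lsb_hostinfo(NULL, &numhosts);   /* NULL: all batch hosts */
    if (hosts == NULL) {
        lsb_perror("lsb_hostinfo");
        exit(-1);
    }
    for (i = 0; i < numhosts; i++)
        printf("%-16s max=%d njobs=%d\n",
               hosts[i].host, hosts[i].maxJobs, hosts[i].numJobs);
    return 0;
}
```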

Job manipulation service

The job manipulation service allows LSF Batch application programmers to write utilities that operate on user jobs. The operations include job submission, signaling, status checking, checkpointing, migration, queue switching, and parameter modification.
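A minimal job submission sketch, the programmatic equivalent of "bsub sleep 60". The field defaults shown (zeroed struct, one processor, DEFAULT_RLIMIT for every resource limit) follow the conventions of lsbatch.h; a running LSF Batch cluster is assumed:

```c
/* Sketch: submit "sleep 60" to the default queue via lsb_submit(). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lsf/lsbatch.h>

int main(void)
{
    struct submit req;
    struct submitReply reply;
    LS_LONG_INT jobId;
    int i;

    if (lsb_init(NULL) < 0) {
        lsb_perror("lsb_init");
        exit(-1);
    }
    memset(&req, 0, sizeof(req));
    req.command = "sleep 60";
    req.options = 0;                    /* no optional fields set */
    req.numProcessors = 1;
    req.maxNumProcessors = 1;
    for (i = 0; i < LSF_RLIM_NLIMITS; i++)
        req.rLimits[i] = DEFAULT_RLIMIT;    /* no resource limits */

    jobId = lsb_submit(&req, &reply);
    if (jobId < 0) {
        lsb_perror("lsb_submit");
        exit(-1);
    }
    printf("Job <%ld> submitted\n", (long)jobId);
    return 0;
}
```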

LSF Batch administration service

This set of function calls is useful for writing LSF Batch administration tools.

The LSF Batch command badmin is implemented with these library calls.

[ Top ]

Getting Started with Platform LSF Programming

Platform LSF programming is like any other system programming. This section assumes that you are familiar with UNIX or Windows system programming and with C.

lsf.conf File

This guide frequently refers to the file lsf.conf for the definition of some parameters. lsf.conf is a generic reference file containing definitions of directories and parameters. By default, it is installed in /etc. If it is not installed in /etc, all users of Platform LSF must set the environment variable LSF_ENVDIR to point to the directory in which lsf.conf is installed. See the Platform LSF Reference for more details about the lsf.conf file.

Platform LSF header files

All Platform LSF header files are installed in the directory LSF_INCLUDEDIR/lsf, where LSF_INCLUDEDIR is defined in the file lsf.conf. You should include LSF_INCLUDEDIR in the include file search path, such as that specified by the `-Idir' option of some compilers or pre-processors.

There is one header file for LSLIB, the Platform LSF Base API, and one header file for LSBLIB, the LSF Batch API.


A Platform LSF application must include <lsf/lsf.h> before any of the Platform LSF Base API services are called. lsf.h contains definitions of constants, data structures, error codes, LSLIB function prototypes, macros, etc., that are used by all Platform LSF applications.


An LSF Batch application must include <lsf/lsbatch.h> before any of the LSF Batch API services are called. lsbatch.h contains definitions of constants, data structures, error codes, LSBLIB function prototypes, macros, etc., that are used by all LSF Batch applications.


There is no need to explicitly include <lsf/lsf.h> in an LSF Batch application because lsbatch.h includes <lsf/lsf.h>.

Linking applications with Platform LSF APIs

For all UNIX platforms, Platform LSF API functions are contained in two libraries, liblsf.a (LSLIB) and libbat.a (LSBLIB). For Windows, the file names of these libraries are: liblsf.lib (LSLIB) and libbat.lib (LSBLIB). These files are installed in LSF_LIBDIR, where LSF_LIBDIR is defined in the file lsf.conf.


LSBLIB is not independent. It must always be linked together with LSLIB because LSBLIB services are built on top of LSLIB services.

Platform LSF uses BSD sockets for communication across a network. On systems that have both System V and BSD programming interfaces, LSLIB and LSBLIB typically use the BSD programming interface. On System V-based versions of UNIX such as Solaris, it is necessary to link applications using LSLIB or LSBLIB with the BSD compatibility library. On Windows, a number of libraries need to be linked together with LSF API. Details of these additional linkage specifications (libraries and link flags) are shown in the table below.

Additional Linkage Specifications by Platform

Platform            Additional Linkage Specifications
Digital UNIX        -lmach -lmld
SunOS 4             -lsun -lc_s
Solaris 2           -lnsl -lelf -lsocket -lrpcsvc -lgen -ldl
Solaris 7 32-bit    -lnsl -lelf -lsocket -lrpcsvc -lgen -ldl -DSVR4 -lresolv -lm
Solaris 7 64-bit    -lnsl -lelf -lsocket -lrpcsvc -lgen -ldl -Xarch=v9 -lresolv -lm
Sony NEWS           -lc -lnsl -lelf -lsocket -lrpcsvc -lgen -lucb
Cray Unicos         (none listed)
Windows 2000        -MT -DWIN32 libcmt.lib oldnames.lib kernel32.lib advapi32.lib user32.lib wsock32.lib mpr.lib netapi32.lib userenv.lib oleaut32.lib uuid.lib activeds.lib adsiid.lib ole32.lib liblsf.lib libbat.lib
Windows XP          (same as Windows 2000)


On Windows, you need to add paths specified by LSF_LIBDIR and LSF_INCLUDEDIR in lsf.conf to the environment variables LIB and INCLUDE.

Note that the GNU C compiler on Solaris supports only 32-bit application development (not 64-bit). Link your 32-bit applications on Solaris against the 32-bit LSF sparc-sol7-32 distribution.

The $LSF_MISC/examples directory contains a makefile for making all the example programs in that directory. You can modify this file and the example programs for your own use.

All LSLIB function call names start with ls_.

All LSBLIB function call names start with lsb_.

Example LSF API program compilation

Without a makefile

To compile an LSF API program without using the makefile, include the LSF API libraries and the link flags for the appropriate architecture on the command line. For example, to compile an LSF API program on a Solaris 2.x machine, you would use a compilation command similar to the following:

% cc -o simbhosts simbhosts.c -I/usr/local/mnt/clusterA/include 
/usr/local/clusterA/lib/libbat.a /usr/local/clusterA/lib/liblsf.a -lnsl -lelf 
-lsocket -lrpcsvc -lgen -ldl -lresolv -lm

On Linux

To compile an LSF API program on Linux, include the libnsl.a library. This library is located in /usr/lib/. For example, when compiling a program on redhat6.2-intel, use the following:

gcc program.c -I/usr/share/lsf/mnt/include /usr/share/lsf/lib/libbat.a 
/usr/share/lsf/lib/liblsf.a /usr/lib/libnsl.a

where program.c is the name of the program you want to compile.

On Solaris x86-64-sol10

To compile an LSF API program on Solaris x86-64-sol10, use the following:

/opt/SUNWspro/bin/cc obj.c -R/usr/dt/lib:/usr/openwin/lib -DSVR4 -DSOLARIS 
-DSOLARIS64 -xs -xarch=amd64 -D_TS_ERRNO -Dx86_64 -DSOLARIS2_5 -DSOLARIS2_7 
-lelf -lsocket -lrpcsvc -lgen -ldl -lresolv -o obj_name

where obj.c is the name of the program you want to compile and obj_name is the name of the binary you can run after compiling the program.

Error handling

The Platform LSF API uses error numbers to indicate errors. There are two global variables that are accessible from the application. These variables are used in exactly the same way as the UNIX system call error number variable errno: the error number should be tested only when an LSLIB or LSBLIB call fails.


lserrno

A Platform LSF program should test whether an LSLIB call has succeeded by checking the call's return value, not lserrno.

When any LSLIB function call fails, it sets the global variable lserrno to indicate the cause of the error. The programmer can either call ls_perror() to print the error message explicitly to the stderr, or call ls_sysmsg() to get the error message string corresponding to the current value of lserrno.

Possible values of lserrno are defined in lsf.h.


lsberrno

This variable is very similar to lserrno except that it is set by LSBLIB whenever an LSBLIB call fails. Programmers can either call lsb_perror() to find out why an LSBLIB call failed or use lsb_sysmsg() to get the error message corresponding to the current value of lsberrno.

Possible values of lsberrno are defined in lsbatch.h.


lserrno should be checked only if an LSLIB call fails. If an LSBLIB call fails, then lsberrno should be checked.

[ Top ]

Example Applications

Example application using LSLIB

#include <stdio.h>
#include <stdlib.h>
#include <lsf/lsf.h>

int main(void)
{
    char *clustername;

    clustername = ls_getclustername();
    if (clustername == NULL) {
        ls_perror("ls_getclustername");
        exit(-1);
    }

    printf("My cluster name is: <%s>\n", clustername);
    return 0;
}

This simple example gets the name of the Platform LSF cluster and prints it on the screen. The LSLIB function call ls_getclustername() returns the name of the local cluster. If this call fails, it returns a NULL pointer. ls_perror() prints the error message corresponding to the most recently failed LSLIB function call.

Example output

The above program would produce output similar to the following:

% a.out
My cluster name is: <test_cluster>

Example application using LSBLIB

#include <stdio.h>
#include <stdlib.h>
#include <lsf/lsbatch.h>

int main(void)
{
    struct parameterInfo *parameters;

    if (lsb_init(NULL) < 0) {
        lsb_perror("lsb_init");
        exit(-1);
    }

    parameters = lsb_parameterinfo(NULL, NULL, NULL);
    if (parameters == NULL) {
        lsb_perror("lsb_parameterinfo");
        exit(-1);
    }

    /* Got parameters from mbatchd successfully. Now print out
       the fields */
    printf("Job acceptance interval: every %d dispatch turns\n",
           parameters->jobAcceptInterval);
    /* Code that prints other parameters goes here */
    /* ... */
    return 0;
}

This example gets the LSF Batch parameters and prints them on the screen. The function lsb_init() must be called before any other LSBLIB function is called.

The data structure parameterInfo is defined in lsbatch.h.

[ Top ]


Authentication

Platform LSF programming is distributed programming. Since Platform LSF services are provided network-wide, it is important for Platform LSF to deliver the service without compromising system security.

Platform LSF supports several user authentication protocols. Support for these protocols is described in Administering Platform LSF. Your Platform LSF administrator can configure the Platform LSF cluster to use any of the supported protocols.


Only those Platform LSF API function calls that operate on user jobs, user data, or Platform LSF servers require authentication. Function calls that return information about the system do not need to be authenticated.

The most commonly used authentication protocol, the privileged port protocol, requires that load sharing applications be installed as setuid programs. This means that your application must be owned by root with the setuid bit set.

If you need to frequently change and re-link your applications with the Platform LSF API, consider using the ident protocol, which does not require applications to be setuid programs.

[ Top ]


      Date Modified: May 12, 2008
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

Copyright © 1994-2008 Platform Computing Corporation. All rights reserved.