Release Notes for Platform™ LSF™ Version 7

Release date: December 2006

Last modified: January 31, 2008

Comments to: doc@platform.com

Support: support@platform.com



What's New in Platform LSF Version 7

For more information

For more details about what's new in Platform LSF Version 7, visit the Platform Computing Web site to see Features, Benefits & What's New.

Performance, scalability, reliability, usability enhancements

Support for high job submission rates: LSF now supports higher job submission rates for clusters that include 5000 dedicated hosts (10K dedicated processors/slots, 20K cores for dual-core processors):

Faster time for reconfiguration and failover: You can now configure LSF to obtain host status more quickly, which allows LSF to reschedule jobs within a shorter time.

Optional EGO management of LSF daemons, including parallel/asynchronous startup/shutdown.

Support for IPv6 address formats: IP addresses can have either a dotted quad notation (IPv4) or IP Next Generation (IPv6) format. LSF supports both formats in mixed IPv4/IPv6 clusters.
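
As an illustration only (this is not LSF code), the two address notations can be told apart with Python's standard ipaddress module. The sample addresses are made up for the example.

```python
import ipaddress


def address_family(host_address: str) -> str:
    """Return 'IPv4' or 'IPv6' for a literal IP address string."""
    parsed = ipaddress.ip_address(host_address)  # raises ValueError if malformed
    return "IPv6" if parsed.version == 6 else "IPv4"


# Hypothetical host addresses from a mixed IPv4/IPv6 cluster.
for addr in ("192.168.1.10", "fe80::1", "2001:db8::5"):
    print(addr, "->", address_family(addr))
```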

LSF on Platform EGO

LSF on Platform EGO allows EGO to serve as the central resource broker, enabling enterprise applications to benefit from sharing of resources across the enterprise grid.

Scheduling enhancements

Windows enhancements

LSF reports built on EGO

Miscellaneous features


Upgrade and Compatibility Notes

Server host compatibility

important:  
To use new features introduced in Platform LSF Version 7, you must upgrade all hosts in your cluster to LSF 7.

LSF 6.x and 5.x servers are compatible with LSF Version 7 master hosts. All LSF 6.x and 5.x features are supported by Version 7 master hosts.

LSF system support for IPv6

Platform LSF Version 7 is now built with IPv6 support. The following operating systems support IPv4 only:

Upgrading HP-UX 11.11 hosts

Platform LSF Version 7 is now built with IPv6 support and requires the following patch files with IPv6 support for HP-UX 11i v1.0 (11.11 - via TOUR) before upgrading to LSF 7:

Upgrade LSF on UNIX and Linux

Run lsfinstall to upgrade to LSF Version 7 from an earlier version of LSF on UNIX and Linux. Follow the steps in Upgrading Platform LSF on UNIX and Linux.

Migrate LSF on Windows

To migrate LSF on Windows to LSF Version 7 from an earlier version, follow the steps in "Migrate Your Windows Cluster to Platform LSF Version 7" (lsf_migrate_windows.pdf).

Maintenance pack and enhancement update availability

At release, Platform LSF Version 7 includes all bug fixes and solutions issued before February 5, 2007. Fixes after February 2007 will be included in the next LSF enhancement update.

Fixes in the November 2006 Maintenance Pack are included in the March 2007 enhancement update.

As of February 2007, monthly maintenance packs are no longer distributed for LSF Version 7.

System requirements

See the Platform Computing Web site for information about supported operating systems and system requirements for the Platform LSF family of products:

API compatibility

Full backward compatibility: your applications will run under LSF Version 7 without changing any code.

The Platform LSF Version 7 API is fully compatible with the LSF Version 6.x and 5.x APIs. An application linked with the LSF Version 6.x or 5.x libraries will run under LSF Version 7 without relinking.

To take full advantage of new Platform LSF Version 7 features, including job submission using JSDL and IPv6 address formats, you should recompile your existing LSF applications with LSF Version 7.

New and changed LSF APIs

See the LSF API Reference for more information.

The following new APIs have been added for LSF Version 7:

Automatic parameter migration during upgrade

Because LIM now belongs to EGO, some existing LSF parameters have corresponding EGO parameter names in ego.conf (LSF_CONFDIR/lsf.conf is a separate file from EGO_CONFDIR/ego.conf).

The following table summarizes the LSF parameters that have corresponding EGO parameter names. You must continue to set other LSF parameters in lsf.conf.

If any of the following LSF parameters are already defined in lsf.conf, they are automatically copied during upgrade to the corresponding EGO parameters in ego.conf. The original LSF settings are maintained for backward compatibility:

lsf.conf parameter | ego.conf parameter
LSF_API_CONNTIMEOUT | EGO_LIM_CONNTIMEOUT
LSF_API_RECVTIMEOUT | EGO_LIM_RECVTIMEOUT
LSF_CLUSTER_ID (Windows) | EGO_CLUSTER_ID (Windows)
LSF_CONF_RETRY_INT | EGO_CONF_RETRY_INT
LSF_CONF_RETRY_MAX | EGO_CONF_RETRY_MAX
LSF_DEBUG_LIM | EGO_DEBUG_LIM
LSF_DHPC_ENV | EGO_DHPC_ENV
LSF_DYNAMIC_HOST_TIMEOUT | EGO_DYNAMIC_HOST_TIMEOUT
LSF_DYNAMIC_HOST_WAIT_TIME | EGO_DYNAMIC_HOST_WAIT_TIME
LSF_ENABLE_DUALCORE | EGO_ENABLE_DUALCORE
LSF_GET_CONF | EGO_GET_CONF
LSF_GETCONF_MAX | EGO_GETCONF_MAX
LSF_LIM_DEBUG | EGO_LIM_DEBUG
LSF_LIM_PORT | EGO_LIM_PORT
LSF_LOCAL_RESOURCES | EGO_LOCAL_RESOURCES
LSF_LOG_MASK | EGO_LOG_MASK
LSF_MASTER_LIST | EGO_MASTER_LIST
LSF_PIM_INFODIR | EGO_PIM_INFODIR
LSF_PIM_SLEEPTIME | EGO_PIM_SLEEPTIME
LSF_PIM_SLEEPTIME_UPDATE | EGO_PIM_SLEEPTIME_UPDATE
LSF_RSH | EGO_RSH
LSF_STRIP_DOMAIN | EGO_STRIP_DOMAIN
LSF_TIME_LIM | EGO_TIME_LIM
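
The copy step described above can be pictured with a short sketch. It is not the upgrade tool itself: the function name, the sample values, and the choice to list only a few rows of the mapping are assumptions made for illustration. Note that the names do not always differ by prefix alone (LSF_API_CONNTIMEOUT maps to EGO_LIM_CONNTIMEOUT).

```python
# A few rows of the lsf.conf -> ego.conf name mapping listed in the table.
LSF_TO_EGO = {
    "LSF_API_CONNTIMEOUT": "EGO_LIM_CONNTIMEOUT",
    "LSF_API_RECVTIMEOUT": "EGO_LIM_RECVTIMEOUT",
    "LSF_LIM_PORT": "EGO_LIM_PORT",
    "LSF_LOG_MASK": "EGO_LOG_MASK",
    "LSF_MASTER_LIST": "EGO_MASTER_LIST",
}


def migrate_parameters(lsf_conf: dict) -> dict:
    """Return the ego.conf entries implied by settings already in lsf.conf.

    The input dict is left untouched, mirroring the statement that the
    original LSF settings are maintained for backward compatibility.
    """
    return {
        LSF_TO_EGO[name]: value
        for name, value in lsf_conf.items()
        if name in LSF_TO_EGO
    }


# Hypothetical lsf.conf contents; the values are made up.
lsf_conf = {
    "LSF_API_CONNTIMEOUT": "10",
    "LSF_LIM_PORT": "7869",
    "LSB_SHAREDIR": "/share/work",  # no EGO counterpart, so it stays in lsf.conf only
}
print(migrate_parameters(lsf_conf))
```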

How to handle parameters in lsf.conf with corresponding parameters in ego.conf

Existing LSF parameters (parameter names beginning with LSB_ or LSF_) that are set only in lsf.conf operate as usual because LSF daemons and commands read both lsf.conf and ego.conf. You can keep your existing LSF parameters in lsf.conf.

You cannot set LSF parameters (parameter names beginning with LSF_ or LSB_) in ego.conf, and you cannot set EGO parameters (parameter names beginning with EGO_) in lsf.conf.

note:  
A parameter in lsf.conf does not necessarily have exactly the same behavior, valid values, syntax, or default value as the corresponding parameter in ego.conf, so in general, you should not set them in both files. If you need LSF parameters for backwards compatibility, you should set them only in lsf.conf.
If you specify a parameter in lsf.conf, and you also specify the corresponding parameter in ego.conf, the parameter value in ego.conf takes precedence over the conflicting parameter in lsf.conf. If the parameter is not set in either lsf.conf or ego.conf, the ego.conf default takes effect.

If a parameter is not yet set in lsf.conf and there is a corresponding parameter in ego.conf, you should set the corresponding EGO parameter in ego.conf instead of setting the LSF parameter in lsf.conf.

LSF 6.2 hosts in your cluster can only read lsf.conf, so you must set LSF parameters only in lsf.conf, or make sure that the values are the same in both lsf.conf and ego.conf.
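
The precedence rules in this section can be summarized in a few lines. This is a sketch under stated assumptions, not LSF's actual lookup code: the function name is hypothetical, the two files are represented as already-parsed dictionaries, and the default value shown is made up.

```python
def effective_value(ego_name, lsf_name, ego_conf, lsf_conf, ego_default=None):
    """Resolve a parameter that exists under both an EGO and an LSF name.

    Order, as described in the note above:
    1. a value set in ego.conf wins over a conflicting lsf.conf value;
    2. otherwise a value set only in lsf.conf is used;
    3. otherwise the ego.conf default applies.
    """
    if ego_name in ego_conf:
        return ego_conf[ego_name]
    if lsf_name in lsf_conf:
        return lsf_conf[lsf_name]
    return ego_default


# Hypothetical settings: the same port is defined in both files.
ego_conf = {"EGO_LIM_PORT": "7870"}
lsf_conf = {"LSF_LIM_PORT": "7869"}
print(effective_value("EGO_LIM_PORT", "LSF_LIM_PORT", ego_conf, lsf_conf))
```

Because LSF 6.2 hosts read only lsf.conf, a mixed cluster would see the lsf.conf value on those hosts regardless of this order, which is why the text recommends keeping the two values identical.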

Multiple cluster configuration

In Platform LSF Version 7, multiple independent clusters can no longer share the same configuration directory. You must install each LSF cluster in a unique location.


What's Changed in Platform LSF Version 7

Changed behavior

Batch command messages

LSF displays new error messages when a batch command cannot communicate with mbatchd. You can customize three of these messages in order to provide LSF users with more detailed information and instructions. bhosts and bjobs might display different messages when mbatchd is down and the LSB_QUERY_PORT is busy.

The following table lists the parameters in lsf.conf that you can use to customize messages when a batch command does not receive a response from mbatchd. For backward compatibility, you can use these parameters to restore the "batch daemon not responding...still trying" message text used in previous versions of LSF.

Reason for no response from mbatchd | Default message | Parameter used to customize the message
mbatchd is too busy to accept new connections or respond to client requests | LSF is processing your request. Please wait... | LSB_MBD_BUSY_MSG
internal system connections to mbatchd fail | Cannot connect to LSF. Please wait... | LSB_MBD_CONNECT_FAIL_MSG
mbatchd is down or there is no process listening at either the LSB_MBD_PORT or the LSB_QUERY_PORT | LSF is down. Please wait... | LSB_MBD_DOWN_MSG
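
A minimal sketch of how a default message and an lsf.conf override relate. The lookup function is hypothetical and lsf.conf is modeled as a parsed dictionary; only the parameter names and message texts come from the table.

```python
# Default texts from the table, keyed by the parameter that overrides them.
DEFAULT_MESSAGES = {
    "LSB_MBD_BUSY_MSG": "LSF is processing your request. Please wait...",
    "LSB_MBD_CONNECT_FAIL_MSG": "Cannot connect to LSF. Please wait...",
    "LSB_MBD_DOWN_MSG": "LSF is down. Please wait...",
}

# Message text used in previous versions of LSF.
LEGACY_MESSAGE = "batch daemon not responding...still trying"


def message_for(parameter: str, lsf_conf: dict) -> str:
    """Return the customized message if one is set, else the default."""
    return lsf_conf.get(parameter, DEFAULT_MESSAGES[parameter])


# Hypothetical configuration that restores the old text for one case only.
lsf_conf = {"LSB_MBD_DOWN_MSG": LEGACY_MESSAGE}
print(message_for("LSB_MBD_DOWN_MSG", lsf_conf))
print(message_for("LSB_MBD_BUSY_MSG", lsf_conf))
```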

Dynamic host management

Dynamic hosts remain in the cluster unless you manually remove them from $EGO_TOP/kernel/work/lim/hostcache.

Only the cluster administrator can modify the hostcache file.

LSF License Scheduler

Directory format for Windows directories

Windows does not support a mapped drive as input during the installation. When you must specify a directory, use a UNC path.

Post-execution on SGI cpusets

Post-execution processing on SGI cpusets behaves differently from previous releases. If JOB_INCLUDE_POSTPROC=Y is specified in lsb.applications, post-execution processing is not attached to the job cpuset, and Platform LSF does not release the cpuset until post-execution processing has finished.

Banded licensing

The memory limit for S-Class licenses on X86/AMD64/EM64T processors has increased from 8 GB to 16 GB. The other classes of licenses have not changed.

You can use permanent licenses with restrictions in operating system and hardware configurations. These banded licenses have three classes, with the E-class licenses having no restrictions.

Banded licenses now support the following operating systems and hardware configurations:

License type | Supported operating systems | Processor | Physical memory | Physical processors/sockets
B-Class | Linux, Windows, MacOS | Intel X86/AMD64/EM64T | Up to and including 4 GB physical memory on a node | Up to and including 2 processors
S-Class | Linux, Windows, MacOS | Intel X86/AMD64/EM64T | Up to and including 16 GB physical memory on a node | Up to and including 4 processors
E-Class | Linux, Windows, MacOS | Intel X86/AMD64/EM64T | More than 16 GB physical memory on a node | More than 4 processors
E-Class | All other LSF-supported operating systems | Intel X86/AMD64/EM64T | N/A | N/A
E-Class | N/A | All other supported processors | N/A | N/A
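
One possible reading of the table, written as a lookup. Several points are assumptions rather than statements from the release notes: that a host must satisfy both the memory and the processor limit of a class, that hosts outside the listed operating systems or processor family fall under E-Class, and the function and argument names themselves.

```python
# Operating systems to which the B-Class and S-Class bands apply.
BANDED_OPERATING_SYSTEMS = {"linux", "windows", "macos"}


def license_class(os_name: str, x86_family: bool, memory_gb: float, sockets: int) -> str:
    """Pick a banded license class for one host (illustrative reading only).

    x86_family stands for the Intel X86/AMD64/EM64T processor group.
    """
    if os_name.lower() not in BANDED_OPERATING_SYSTEMS or not x86_family:
        return "E-Class"  # assumed: no memory or socket banding applies
    if memory_gb <= 4 and sockets <= 2:
        return "B-Class"
    if memory_gb <= 16 and sockets <= 4:
        return "S-Class"
    return "E-Class"


# Hypothetical hosts.
print(license_class("Linux", True, 4, 2))
print(license_class("Windows", True, 8, 4))
print(license_class("Linux", True, 32, 2))
```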

LSF daemon management

You can manage LSF daemons in two ways:

important:  
LSF res and sbatchd do not restart automatically if you run lsadmin resshutdown and badmin hshutdown to manually shut them down. You must run lsadmin resstartup and badmin hstartup to restart the daemons after host shutdown.

All LSF commands and tools, including lsadmin and badmin, are available under both management models.

Directory structure changes

The installation directory structure has changed. See Installing Platform LSF on UNIX and Linux for the details of the new structure. Depending on which products you have installed and platforms you have selected, your directory structure may vary.

New and changed configuration parameters and environment variables

The following configuration parameters and environment variables are new or changed for LSF Version 7:

ego.cluster

ego.conf

hosts file

install.config

slave.config

lsb.hosts

lsb.modules

lsb.params

lsb.queues

lsb.serviceclasses

lsf.cluster

lsf.conf

lsf.licensescheduler

lsf.shared

lsf.sudoers

When LSF daemon control through EGO Service Controller is configured, users must have EGO credentials for EGO to start res and sbatchd services. By default, lsadmin and badmin invoke the egosh user logon command to prompt for the user name and password of the EGO administrator to get EGO credentials.

Use the following parameters to bypass EGO logon to start res and sbatchd automatically:

To configure LSF daemon control through EGO at installation, set EGO_DAEMON_CONTROL="Y" in install.config.

Environment variables

Environment variables related to file names and job spooling directories support paths that contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows.

Environment variables related to command names and job names can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows.
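
The length limits above can be checked before a job is submitted. The helper below is a hypothetical pre-check, not part of LSF, and it assumes the limit counts characters as Python's len() does.

```python
# Maximum lengths stated above, in characters.
MAX_LENGTH = {"unix": 4094, "windows": 255}


def fits_limit(value: str, platform: str) -> bool:
    """Return True if a path, command name, or job name is within the limit.

    platform is "unix" (UNIX and Linux) or "windows".
    """
    return len(value) <= MAX_LENGTH[platform]


# A made-up 300-character job name: accepted on UNIX, too long for Windows.
job_name = "x" * 300
print(fits_limit(job_name, "unix"), fits_limit(job_name, "windows"))
```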

The following environment variables are new in LSF Version 7:

The following environment variables have changed in LSF Version 7:

New and changed commands, options, and output

The following command options and output are new or changed for LSF Version 7:

bapp (new)

Displays information about application profiles configured in lsb.applications.

By default, returns the following information about all application profiles: application name, job slot statistics, and job state statistics.

In MultiCluster, returns the information about all application profiles in the local cluster.

Application profile names and attributes are set up by the LSF administrator.

By default, the limit for CORELIMIT, MEMLIMIT, STACKLIMIT, and SWAPLIMIT display is shown in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for display (MB, GB, TB, PB, or EB).

bacct

bclusters

-app displays available application profiles in remote clusters. Application profile configuration information is displayed under the heading Remote Cluster Application Information. Application profile information is only displayed for the job forwarding model. bclusters does not show local cluster application profile information.

bhist

bhosts

bjgroup

bjgroup displays the name of the service class that the job group is attached to with bgadd -sla service_class_name.

bjobs

bkill

bladmin

ckconfig [-v]: Checks LSF License Scheduler configuration in lsf.licensescheduler and lsf.conf. By default, bladmin ckconfig displays only the result of the configuration file check. If warning errors are found, bladmin prompts you to display detailed messages. The -v (verbose mode) option displays detailed messages about configuration file checking to stderr.

blparams (new)

Displays information about configurable LSF License Scheduler parameters defined in lsf.conf and lsf.licensescheduler.

blplugins (new)

Displays plugin activity and the check-in, check-out, and deny counters as seen by the License Scheduler for each feature and service domain.

bmgroup

When hosts are allocated to an EGO-enabled SLA, they are dynamically added to a host group created by the SLA. When the host is released to EGO, the entry is removed from the host group. bmgroup displays the hosts allocated by EGO to the host group created by the SLA.

bmod

bparams

bparams -l displays the values of the following new parameters, if they are defined in lsb.params.

bpeek

bqueues

brequeue

When JOB_INCLUDE_POSTPROC=Y is set in an application profile in lsb.applications, job requeue will happen only after post-execution processing, not when the job finishes.

bsla

The bsla command displays the new keywords

If the SLA is under reclaim, additional keywords are displayed:

bsub

lshosts

By default, the amount of maxmem and maxswp is displayed in KB. The amount may appear in MB depending on the actual system memory or swap space. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for the limit (GB, TB, PB, or EB).

lsload

By default, the amount of mem and swp is displayed in KB. The amount may appear in MB depending on the actual system memory or swap space. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for the limit (GB, TB, PB, or EB).

lspasswd

You can now run lspasswd on Windows in a non-shared file system environment. You must define the parameter LSF_MASTER_LIST in lsf.conf so that jobs will run with the correct permissions. If this parameter is not defined, LSF assumes that the cluster uses a shared file system environment. lspasswd also allows revalidation of credentials.

xlsadmin (obsolete)

xlsadmin is no longer supported.

New and changed files

The following files have been added or changed in Platform LSF Version 7:

lsb.applications

The lsb.applications file defines application profiles, and contains many of the same parameters as lsb.queues. Use application profiles to define common parameters for the same type of jobs, including the execution requirements of the applications, the resources they require, and how they should be run and managed.

This file is optional. Use the DEFAULT_APPLICATION parameter in lsb.params to specify a default application profile for all jobs. LSF does not automatically assign a default application profile.

This file is installed by default in LSB_CONFDIR/cluster_name/configdir.

By default, the limit for CORELIMIT, MEMLIMIT, STACKLIMIT and SWAPLIMIT is specified in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for the limit (MB, GB, TB, PB, or EB).

By default, memory (mem) and swap (swp) limits in select[] and rusage[] sections of RES_REQ are specified in MB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for these limits (GB, TB, PB, or EB).
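
To show what a larger unit means for limits such as MEMLIMIT, the sketch below converts a value expressed in a given unit back to the default KB. It assumes each unit is 1024 times the previous one, which the release notes do not state, and the function name is hypothetical.

```python
# Units accepted for LSF_UNIT_FOR_LIMITS, smallest first.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB"]


def limit_in_kb(value: int, unit: str = "KB") -> int:
    """Convert a limit expressed in `unit` to KB (assumes binary multiples)."""
    return value * 1024 ** UNITS.index(unit)


# A hypothetical MEMLIMIT of 2 with the unit set to GB, and a plain KB value.
print(limit_in_kb(2, "GB"))  # 2097152
print(limit_in_kb(512))      # 512
```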

EGO configuration files for LSF daemon management (res.xml and sbatchd.xml)

The following files are located in EGO_ESRVDIR/esc/conf/services/:

When LSF daemon control through EGO Service Controller is configured, lsadmin uses the reserved EGO service name res to control the LSF res daemon, and badmin uses the reserved EGO service name sbatchd to control the LSF sbatchd daemon.

win_install.config (obsolete)

The win_install.config file is no longer used by the Platform LSF for Windows installation.

Symbolic links to LSF files

tip:  
If your installation uses symbolic links to other files in the directories containing these new files, you must manually create links to these new files.

New and changed accounting and job event fields

lsb.acct

The following fields are new or changed in the JOB_FINISH record:

options3 (%d)

Bit flags for job processing

app (%s)

Application profile name

cwd (%s)

Current working directory (up to 4094 characters for UNIX or 255 characters for Windows)

inFile (%s)

Input file name (up to 4094 characters for UNIX or 255 characters for Windows)

outFile (%s)

Output file name (up to 4094 characters for UNIX or 255 characters for Windows)

errFile (%s)

Error output file name (up to 4094 characters for UNIX or 255 characters for Windows)

jobName (%s)

Job name (up to 4094 characters for UNIX or 255 characters for Windows)

command (%s)

Complete batch job command specified by the user (up to 4094 characters for UNIX or 255 characters for Windows)

commandSpool (%s)

Spool command file (up to 4094 characters for UNIX or 255 characters for Windows)

inFileSpool (%s)

Spool input file (up to 4094 characters for UNIX or 255 characters for Windows)

lsb.events

The following fields are new or changed in the JOB_NEW record:

options3 (%d)

Bit flags for job processing

app (%s)

Application profile name

cwd (%s)

Current working directory (up to 4094 characters for UNIX or 255 characters for Windows)

inFile (%s)

Input file name (up to 4094 characters for UNIX or 255 characters for Windows)

outFile (%s)

Output file name (up to 4094 characters for UNIX or 255 characters for Windows)

errFile (%s)

Error output file name (up to 4094 characters for UNIX or 255 characters for Windows)

jobName (%s)

Job name (up to 4094 characters for UNIX or 255 characters for Windows)

command (%s)

Job command (up to 4094 characters for UNIX or 255 characters for Windows)

The following fields are new or changed in the JOB_MODIFY2 record:

jobName (%s)

Job name (up to 4094 characters for UNIX or 255 characters for Windows)

options3 (%d)

Bit flags for job processing

app (%s)

Application profile name

delOption3 (%d)

Delete options for the options3 field

inFile (%s)

Input file name (up to 4094 characters for UNIX or 255 characters for Windows)

outFile (%s)

Output file name (up to 4094 characters for UNIX or 255 characters for Windows)

errFile (%s)

Error output file name (up to 4094 characters for UNIX or 255 characters for Windows)

command (%s)

Job command (up to 4094 characters for UNIX or 255 characters for Windows)

inFileSpool (%s)

Spool input file (up to 4094 characters for UNIX or 255 characters for Windows)

commandSpool (%s)

Spool command file (up to 4094 characters for UNIX or 255 characters for Windows)

cwd (%s)

Current working directory (up to 4094 characters for UNIX or 255 characters for Windows)

The following field is new in the JOB_EXECUTE record:

execCwd (%s)

Current working directory job used on execution host (up to 4094 characters for UNIX or 255 characters for Windows)

Bugs fixed since December 2006

The following bugs have been fixed in the March 2007 enhancement update since the November 2006 Maintenance Pack:

79220
Date
2007-02-04
 
Description
Number of started jobs of a share account shown by bqueues -l does not match the actual number after running badmin reconfig
 
Component
mbatchd
 
Platform
All
 
Impact
Some shell scripts may break

82400
Date
2007-02-02
 
Description
When logging on remotely, and the remote host does not allow logon locally, the authentication will fail for ProcessManager users
 
Component
eauth.userpass.exe eauth_userpass.exe
 
Platform
Windows
 
Impact
Clients cannot log in to jfd and submit jobs

81923
Date
2007-02-01
 
Description
hostsetup exits before completion
 
Component
hostsetup
 
Platform
All
 
Impact
hostsetup does not run correctly

77119
Date
2007-01-30
 
Description
mbdrestart changes RUN_TIME in host partition fairshare
 
Component
mbatchd
 
Platform
All
 
Impact
User account information and/or user share priority will be wrong

81303
Date
2007-01-24
 
Description
Submitted job does not run because the password has become invalid in the LSF database
 
Component
lsfint.lib package
 
Platform
Windows
 
Impact
Job does not run

80616
Date
2007-01-21
 
Description
melim does not exit when lim is killed
 
Component
melim
 
Platform
Linux and UNIX
 
Impact
melim stays alive after lim is gone. Multiple melim processes run on the host after a new lim starts

81434
Date
2007-01-17
 
Description
When working with a huge events file, bhist is killed because it requests too much memory
 
Component
bhist
 
Platform
AIX
 
Impact
bhist cannot work with a huge events file

79644
Date
2007-01-16
 
Description
bhist on the remote clusters displays a wrong RUNLIMIT in MultiCluster
 
Component
bhist
 
Platform
All
 
Impact
Incorrect bhist output

73615
Date
2007-01-16
 
Description
LSF does not shut down properly in between run levels
 
Component
LSF script startup mbatchd
 
Platform
UNIX
 
Impact
LSF will not shut down and restart properly in between run level changes

80169
Date
2007-01-14
 
Description
The startTime recorded in the lsb.acct file for each job in a chunked set is not the start time for the job, but rather the time when the chunk of jobs that the job was part of is sent to the execution host
 
Component
mbatchd
 
Platform
All
 
Impact
lsb.acct file cannot be used to determine the actual wall clock run time of a job.

80734
Date
2007-01-11
 
Description
The external scheduler option string submitted with bsub -ext is lost after the job is checkpointed and restarted
 
Component
sbatchd bchkpnt brestart
 
Platform
Linux
 
Impact
Without the external scheduler option, LSF cannot work with external scheduling

61991
Date
2007-01-08
 
Description
Reserved resources are not released when RUN_WINDOWS is closed
 
Component
sbatchd mbatchd
 
Platform
All
 
Impact
Resource usage is affected

80232
Date
2006-12-27
 
Description
When a customer runs lsf_daemons start on a non-LSF host, lim dies but res and sbatchd keep running
 
Component
lsf_daemons
 
Platform
UNIX and Linux
 
Impact
LSF daemons run on non-LSF hosts

79346
Date
2006-12-24
 
Description
!U is not replaced in the eadmin.hmail and user cannot receive the notification email
 
Component
eadmin.cmd mbatchd.exe
 
Platform
Windows
 
Impact
Mail cannot be received when customers set LSB_MAILTO=!U

80363
Date
2006-12-22
 
Description
If a parallel task is manually killed, the pam job output display of task exit status might be wrong
 
Component
pam
 
Platform
All
 
Impact
Incorrect information about why and how a parallel job task is terminated

79945
Date
2006-12-20
 
Description
Some environment variables (CWD, CLEARCASE_ROOT) are not set correctly when submitting Clearcase jobs from Windows to UNIX
 
Component
sbatchd
 
Platform
All
 
Impact
Low

80114
Date
2006-12-20
 
Description
mbatchd takes a long time to write lsb.acct file for each job
 
Component
mbatchd
 
Platform
All
 
Impact
Slow client response

75584
Date
2006-12-20
 
Description
pam receives SIGSEGV after pthread_create() fails
 
Component
pam
 
Platform
Linux
 
Impact
Job will fail and pam will core dump if a large stack limit is set

79427
Date
2006-12-18
 
Description
Threshold is ignored if resource value is exactly the same as threshold
 
Component
mbatchd
 
Platform
All
 
Impact
Job can be dispatched even if the host is closed_Busy because loadSched is equal to the threshold

80166
Date
2006-12-13
 
Description
Without ego_base license, lim cannot start
 
Component
lim
 
Platform
All
 
Impact
Serious

79374
Date
2006-12-13
 
Description
New openmpi mpirun options are not supported - job will fail
 
Component
openmpi_wrapper
 
Platform
Linux
 
Impact
Job cannot run due to wrong parsing of mpirun options

79805
Date
2006-12-11
 
Description
lim wants to check out dual-core license even though the CPU is not dual-core
 
Component
lim
 
Platform
UNIX and Linux
 
Impact
lim wants to check out dual-core license even though the CPU is not dual-core

78757
Date
2006-12-11
 
Description
bjobs must consistently exit with -1 when jobs not found
 
Component
bjobs
 
Platform
All
 
Impact
Scripts calling bjobs may not work

78745
Date
2006-12-08
 
Description
Job name with wildcard does not match all matching jobs if JOB_DEP_LAST_SUB is set
 
Component
bparams mbatchd
 
Platform
All
 
Impact
Some scripts depend on the correct behavior

79842
Date
2006-12-08
 
Description
If LSF_HPC_EXTENSIONS is defined in lsf.conf, rusage report of a parallel job on first execution node is not accurate - one process is double-counted, some do not show up in bjobs
 
Component
pam
 
Platform
All
 
Impact
rusage report is not correct

79777
Date
2006-12-08
 
Description
If LSF_OEM_LICENSE_PATH is not valid, lim cannot get license from license file set in LSF_LICENSE_FILE
 
Component
lim
 
Platform
All
 
Impact
lim cannot work if LSF_OEM_LICENSE_PATH is not valid

78373
Date
2006-12-07
 
Description
When many users are configured in a fairshare tree, mbatchd takes a long time to replay 200K jobs
 
Component
mbatchd
 
Platform
All
 
Impact
mbatchd does not respond for a long period of time

79539
Date
2006-12-07
 
Description
Incorrect LIM warning messages for dual core license even though LIM can recognize the dual core license
 
Component
lim
 
Platform
All
 
Impact
Confusing warning messages

79633
Date
2006-12-06
 
Description
sbatchd logs misleading message about unlock hosts
 
Component
sbatchd
 
Platform
All
 
Impact
User cannot see the real reason of the failure

78527
Date
2006-12-06
 
Description
Cluster is unlicensed due to lack of license
 
Component
lim.exe
 
Platform
All Windows platforms (Intel x86/x64/IA64/Single-Core/Dual-Core, AMD x86/X64/Single-Core/Dual-Core)
 
Impact
Cluster unlicensed due to lack of license

78956
Date
2006-12-05
 
Description
When LSF dispatches a large number of jobs in one scheduling cycle, master lim receives lots of host information queries from sbatchd and slows down master host performance
 
Component
sbatchd
 
Platform
All
 
Impact
Master host slows down

76846
Date
2006-12-05
 
Description
After bkill -r is used to kill running array jobs, the jobs keep running
 
Component
mbatchd
 
Platform
UNIX
 
Impact
Jobs cannot be killed completely

79431
Date
2006-12-04
 
Description
lsadmin will core dump when LSF_RSH is defined in lsf.conf
 
Component
badmin lsadmin
 
Platform
All
 
Impact
Medium

79213
Date
2006-12-01
 
Description
lsfinstall does not work if lsf.shared starts with Begin Cluster when LSF is reinstalled
 
Component
lsfinstall
 
Platform
UNIX and Linux
 
Impact
Cannot reinstall successfully

79069
Date
2006-12-01
 
Description
API cpuset_create() fails with "Invalid arguments", job goes back to pending state
 
Component
rla schmod_cpuset.so
 
Platform
SGI cpuset integration
 
Impact
In the best case, the job gets rescheduled to a different node and it runs. In the worst case, it gets scheduled to the same node and the allocation fails again.


Known Issues

Platform LSF Version 7

SGI cpusets and JOB_INCLUDE_POSTPROC

If you specify JOB_INCLUDE_POSTPROC=Y in an application profile in lsb.applications to enable job post-execution to be included in job finish status reporting, SGI cpusets behave differently from previous releases.

The post-execution processing is not attached to the job cpuset, but Platform LSF does not release the cpuset until post-execution processing has finished.

lsfstartup on Mac OS X

When LSF_EGO_DAEMON_CONTROL="Y" is specified in lsf.conf, running lsfstartup displays incorrect error messages, but the cluster can be started correctly.

When you see the following message

Error(s) found in previous operation, continue? [y/n]y 

choose yes (y) to continue startup.

Platform LSF on Windows

cmd.exe permissions

For jobs that run on a Windows Server 2003, x64 Edition platform, users must have "Read" and "Execute" privileges for cmd.exe.

Post-execution process tracking

JOB_POSTPROC_TIMEOUT configured in an application profile in lsb.applications has no effect on Windows execution hosts because post-execution processing on Windows tracks only the direct parent command. Child processes of the post-execution command remain running.

Platform LSF License Scheduler

Symptoms

With the flexible grid integration plugin enabled, bladmin reconfig has the following problems when reconfiguring License Scheduler after configuration change in lsf.licensescheduler:

Workaround

After changing any configuration in lsf.licensescheduler, run bladmin shutdown to shut down bld. Wait at least one minute, then run blstartup to restart bld. Do not run bladmin reconfig.
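
The workaround could be scripted along the following lines. This is a sketch only: the wrapper function is hypothetical, it assumes bladmin and blstartup are on the PATH and run without interactive prompts, and the command runner and sleep function are injectable so the sequence can be exercised without a cluster.

```python
import subprocess
import time


def restart_bld(run=subprocess.run, sleep=time.sleep, wait_seconds=60):
    """Shut down bld, wait, then start it again (instead of bladmin reconfig)."""
    run(["bladmin", "shutdown"], check=True)  # stop bld
    sleep(wait_seconds)                       # wait at least one minute
    run(["blstartup"], check=True)            # start bld again
```

Calling restart_bld() with no arguments would run the real commands; passing stand-in functions for run and sleep allows a dry run.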

Platform LSF reporting

The default out-of-box configuration for Platform LSF reporting with Oracle database can only support up to 1 million jobs per day. If your data volume is greater than this, contact Platform Support (support@platform.com) for a recommended configuration.

In the Service Level Agreement (SLA) report for throughput, the starting point of the Optimal line is inconsistent with the starting point of the time window. In the Service Level Agreement (SLA) report for velocity, the velocity goal line does not cover the last bar in the chart.

Platform LSF License Scheduler reporting

Platform LSF License Scheduler is not supported on Linux IA64 hosts. By default, the reporting data loader for the Platform LSF License Scheduler daemon bld is disabled on Linux IA64 hosts.

EGO-enabled SLA scheduling limitations

Parallel jobs

Resource allocation is based on the number of jobs, not the slots required by the job. EGO-enabled SLA requests resources based on velocity and the number of pending jobs. If a parallel job requires multiple processors, the SLA may request fewer processors than the job requires, which causes the job to remain pending. To avoid this, you can configure a larger velocity in the SLA.

MultiCluster

Resource export under the lease model is not guaranteed. With EGO-enabled SLA scheduling, all resources are dynamic, so the exported hosts may not be allocated to LSF.

Advance reservations

EGO-enabled SLA does not support advance reservations. Advance reservations need to reserve resources for a specified time window, which is not currently supported in EGO.

Job-level resource requirements (bsub -R)

LSF takes the resource requirement into consideration for scheduling, but if the resource request does not match the resource requirement specified in the service class, the host allocated by EGO cannot match the specified resource requirement, and the job remains pending. LSF treats the allocated host as idle and returns it to EGO. The pending job causes another request to be sent to EGO, which allocates another host, which may or may not satisfy the resource requirement.

Use EGO_RES_REQ=res_req in the service class configuration to specify all job resource requirements.

Job-level host preference (bsub -m)

Specific job-level host requests are similar to bsub -R (essentially the same as bsub -R "select host_name"). The specified host is not guaranteed to be allocated by EGO. The job remains pending until the specified host is actually allocated.

Use EGO_RES_REQ=res_req in the service class configuration to specify all job resource requirements.
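
The limitations above apply whenever a job carries its own -R or -m option. As a rough illustration only, the following sketch checks a bsub argument list for those options before submission. The function name has_job_level_request is hypothetical and is not part of LSF; the check is a plain string match over every argument, so it can also match an option that belongs to the job command itself.

```shell
#!/bin/sh
# Hypothetical helper, not part of LSF: report whether an argument list
# intended for bsub contains a job-level -R or -m option.
# Limitation: every argument is scanned, including arguments that belong
# to the job command, so false positives are possible.

has_job_level_request() {
    for arg in "$@"; do
        case $arg in
            -R|-m) return 0 ;;   # job-level requirement or host request found
        esac
    done
    return 1                     # neither option present
}

# Example use before submitting to an EGO-enabled service class:
#   if has_job_level_request "$@"; then
#       echo "warning: -R/-m may leave the job pending; consider EGO_RES_REQ" >&2
#   fi
```

A check like this only flags the submission; the resource requirement itself still belongs in the service class configuration.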


Download the Platform LSF Version 7 Distribution Packages

Download the LSF distribution packages through FTP at ftp.platform.com.

important:  
The latest Platform LSF Version 7 release is Update 2. Distribution packages are available only for Platform LSF Version 7 Update 2 and Platform LSF Version 7 Update 1.

Download steps

Prerequisites: Access to the Platform FTP site is controlled by login name and password. If you cannot access the distribution files for download, send email to support@platform.com.

  1. Log on to the LSF file server.
  2. Change to the directory where you want to download the LSF distribution files. Make sure that you have write access to the directory. For example:

     # cd /usr/share/lsf/tarfiles

  3. FTP to the Platform FTP site:

     # ftp ftp.platform.com

  4. Provide the login user ID and password provided by Platform.
  5. Change to the directory for the LSF Version 7 release:

     ftp> cd /distrib/7.0

  6. Set file transfer mode to binary:

     ftp> binary

  7. For LSF on UNIX and Linux, get the installation distribution file.

     tip:
     Before installing LSF on your UNIX and Linux hosts, you must uncompress and extract lsf7.0_lsfinstall.tar.Z to the same directory where you download the LSF product distribution tar files.

  8. Get the distribution packages for the products you want to install on the supported platforms you need.
  9. Download the latest Platform LSF Version 7 documentation from /distrib/7.0/docs/.
  10. Download the latest Platform EGO Version 1.2 documentation from /distrib/7.0/docs/.
  11. Optional. Download the Platform Management Console (PMC) distribution package.

     note:
     To take advantage of the Platform LSF reporting feature, you must download and install the Platform Management Console. The reporting feature is only supported on the same platforms as the Platform Management Console: 32-bit and 64-bit x86 Windows and Linux operating systems.

  12. Exit FTP.

     ftp> quit
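
The interactive session above can also be scripted. The following is a minimal sketch, assuming a command-line ftp client that accepts -n (suppress auto-login) and the user command. The function name lsf_ftp_commands and the login values are placeholders, and the exact location of lsf7.0_lsfinstall.tar.Z under /distrib/7.0 is an assumption; adjust the get line to match what the directory listing shows.

```shell
#!/bin/sh
# Sketch: print the FTP commands from the download steps so they can be
# piped into a non-interactive ftp session.
# Assumptions: login and password are placeholders supplied by the caller,
# and the installer archive sits directly under /distrib/7.0.

lsf_ftp_commands() {
    login=$1
    password=$2
    cat <<EOF
user $login $password
cd /distrib/7.0
binary
get lsf7.0_lsfinstall.tar.Z
quit
EOF
}

# Usage (not run here):
#   cd /usr/share/lsf/tarfiles
#   lsf_ftp_commands mylogin mypassword | ftp -n ftp.platform.com
```

Generating the command list separately makes it easy to review what will be sent before opening the connection.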
    

Archive location of previous update releases

Directories containing release notes and distribution files for previous LSF Version 7 update releases are located on the Platform FTP site under /distrib/7.0/archive. Archive directories are named relative to the current update release:


Install Platform LSF Version 7

Installing Platform LSF involves the following steps:

  1. Get a demo license (license.dat file).
  2. Run the installation programs.

Get a Platform LSF demo license

Before installing Platform LSF Version 7, you must get a demo license key.

Contact license@platform.com to get a demo license.

Put the demo license file license.dat in the same directory where you downloaded the Platform LSF product distribution tar files.
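
A quick pre-installation check can confirm that the license file is where it needs to be. This is a sketch only: the function name check_license_location is hypothetical, and it assumes the distribution files use the .tar.Z suffix, as lsf7.0_lsfinstall.tar.Z does.

```shell
#!/bin/sh
# Sketch: confirm that license.dat sits in the same directory as the
# downloaded distribution tar files.
# Assumption: distribution archives are named *.tar.Z.

check_license_location() {
    dir=$1
    if [ ! -f "$dir/license.dat" ]; then
        echo "license.dat not found in $dir" >&2
        return 1
    fi
    # Look for at least one distribution archive in the same directory.
    for f in "$dir"/*.tar.Z; do
        if [ -f "$f" ]; then
            echo "license.dat and distribution files found in $dir"
            return 0
        fi
    done
    echo "no .tar.Z distribution files found in $dir" >&2
    return 1
}

# Usage (not run here):
#   check_license_location /usr/share/lsf/tarfiles
```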

Run the UNIX and Linux installation

Use the lsfinstall installation program to install a new LSF Version 7 cluster, or to upgrade from an earlier LSF version.

See Installing Platform LSF on UNIX and Linux for new cluster installation steps.

See the Platform LSF Reference for detailed information about lsfinstall and its options.
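
For orientation, the usual command sequence looks roughly like the following. This sketch only prints the commands; it does not run them. It assumes that the archive unpacks into a directory named lsf7.0_lsfinstall and that install.config has already been edited for your site. The function name lsfinstall_plan is hypothetical. Check the installation guide for the options that apply to your cluster.

```shell
#!/bin/sh
# Sketch: print (not execute) a typical UNIX/Linux installation sequence.
# Assumptions: the archive extracts to ./lsf7.0_lsfinstall and
# install.config in that directory has been edited beforehand.

lsfinstall_plan() {
    tardir=$1
    echo "cd $tardir"
    echo "zcat lsf7.0_lsfinstall.tar.Z | tar xvf -"
    echo "cd lsf7.0_lsfinstall"
    echo "./lsfinstall -f install.config"
}

lsfinstall_plan /usr/share/lsf/tarfiles
```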

Run the Windows installation

Platform LSF on Windows 2000, Windows 2003, and Windows XP is distributed in the following packages:

See Installing Platform LSF on Windows for installation steps.

Install Platform LSF License Scheduler

See Using Platform LSF License Scheduler for installation and configuration steps.

Install Platform LSF HPC

Use lsfinstall to install a new Platform LSF HPC cluster or to upgrade LSF HPC from a previous release.

important:  
Make sure ENABLE_HPC_INST=Y is specified in install.config to enable Platform LSF HPC installation.

See Using Platform LSF HPC for installation and configuration steps.
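
Before running lsfinstall, you may want to confirm that the setting is actually present and not commented out. The following is a small sketch; the function name hpc_install_enabled is hypothetical, and the pattern assumes a simple NAME=value line with optional double quotes around the value.

```shell
#!/bin/sh
# Sketch: succeed only if install.config contains an uncommented
# ENABLE_HPC_INST=Y line (quotes around Y are tolerated).

hpc_install_enabled() {
    config=$1
    grep -Eq '^[[:space:]]*ENABLE_HPC_INST[[:space:]]*=[[:space:]]*"?[Yy]"?[[:space:]]*$' "$config"
}

# Usage (not run here):
#   hpc_install_enabled install.config || echo "ENABLE_HPC_INST=Y is not set" >&2
```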

Special installation steps for the Platform Management Console on Linux IA64

To install the Platform Management Console on Linux IA64 hosts, you must download and install the Linux IA64 version of the BEA JRockit 5.0 JRE.

  1. Download the Linux IA64 version of BEA JRockit 5.0 JRE.
    1. Open the BEA download page:

       http://commerce.bea.com/products/weblogicjrockit/5.0/jr_50.jsp

    2. Save the download file to your local disk.

       For JRockit 5.0 R27.1 JRE Linux (Intel Itanium - 64-bit), save the file named jrockit-R27.1.0-jre1.5.0_08-linux-ipf.bin.

    3. Make sure that the .bin file is executable:

       chmod +x jrockit-R27.1.0-jre1.5.0_08-linux-ipf.bin

  2. Install the JRE on the Linux IA64 host.
    1. Change to a shared directory where you want to install BEA JRockit.
    2. Run the installer in console mode:

       jrockit-R27.1.0-jre1.5.0_08-linux-ipf.bin -mode=console

       The installation creates a new directory:

       jrockit-R27.1.0-jre1.5.0_08

  3. Follow the steps in Installing Platform LSF on UNIX and Linux to run lsfinstall to install Platform LSF and the Platform Management Console.
  4. Make a symbolic link to the JRE.

     For example, if you installed the JRE under /opt/jre:

     cd $EGO_TOP/jre
     ln -s /opt/jre/jrockit-R27.1.0-jre1.5.0_08-linux-ipf linux-ia64

  5. Check the symbolic link to the JRE.

     If the symbolic link is correct, you should see the contents of the linux-ia64 directory:

     cd $EGO_TOP/jre/linux-ia64
     ls
     bin/ lib/ LICENSE license.bea README.TXT
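
The manual check in the last step can also be expressed as a small script. This is a sketch under the assumption that a usable JRE directory contains at least bin/ and lib/, as in the listing above; the function name check_jre_link is hypothetical.

```shell
#!/bin/sh
# Sketch: verify that a path is a symbolic link that resolves to a
# directory containing bin/ and lib/.

check_jre_link() {
    link=$1
    if [ ! -L "$link" ]; then
        echo "$link is not a symbolic link" >&2
        return 1
    fi
    # -d follows the link, so a dangling link fails here.
    if [ ! -d "$link" ]; then
        echo "$link does not resolve to a directory" >&2
        return 1
    fi
    if [ ! -d "$link/bin" ] || [ ! -d "$link/lib" ]; then
        echo "bin/ or lib/ is missing under $link" >&2
        return 1
    fi
    echo "$link looks like a JRE directory"
}

# Usage (not run here):
#   check_jre_link "$EGO_TOP/jre/linux-ia64"
```

A dangling link is the most likely failure, for example when the directory name given to ln -s does not match the directory that the installer created.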
    

Learn About Platform LSF Version 7

Information about Platform LSF is available from the following sources:

World Wide Web and FTP

Information about Platform LSF Version 7 is available in the LSF Version 7 area of the Platform FTP site (ftp.platform.com/).

The latest information about all supported releases of Platform LSF is available on the Platform Web site at www.platform.com.

If you have problems accessing the Platform Web site or the Platform FTP site, send email to support@platform.com.

my.platform.com

my.platform.com is your one-stop shop for information, forums, e-support, documentation, and release information. It provides a single source of information and access to new products and releases from Platform Computing.

On the Platform LSF Family product page of my.platform.com, you can download software, patches, updates and documentation. See what's new in Platform LSF Version 7, check the system requirements for Platform LSF, and browse the latest documentation updates through the Platform LSF Knowledge Center.

Platform LSF documentation

The Platform LSF Knowledge Center is your entry point for LSF documentation. After downloading and extracting the LSF documentation distribution file, browse the file docs/lsf/7.0/index.html to access the documentation.

If you have installed the Platform Management Console, access the Platform LSF documentation through the link to the Platform Knowledge Center.

Platform EGO documentation

The Platform EGO Knowledge Center is your entry point for Platform EGO documentation. It is installed when you install LSF. To access the EGO documentation, browse the file EGO_TOP/docs/ego/1.2/index.html.

If you have installed the Platform Management Console, access the Platform EGO documentation through the link to the Platform Knowledge Center.

Platform training

Platform's Professional Services training courses can help you gain the skills necessary to effectively install, configure and manage your Platform products. Courses are available for both new and experienced users and administrators at our corporate headquarters and Platform locations worldwide.

Customized on-site course delivery is also available.

Find out more about Platform Training at www.platform.com/Services/Training/, or contact Training@platform.com for details.


Get Technical Support

Contact Platform

Contact Platform Computing or your LSF vendor for technical support. Use one of the following to contact Platform technical support:

Email

support@platform.com

World Wide Web

www.platform.com

Mail

Platform Support
Platform Computing Inc.
3760 14th Avenue
Markham, Ontario
Canada L3R 3T7

When contacting Platform, please include the full name of your company.

See the Platform Web site at www.platform.com/Company/Contact.Us.htm for other contact information.

Get patch updates and other notifications

To get periodic patch update information, critical bug notification, and general support notification from Platform Support, contact supportnotice-request@platform.com with the subject line containing the word "subscribe".

To get security related issue notification from Platform Support, contact securenotice-request@platform.com with the subject line containing the word "subscribe".

We'd like to hear from you

If you find an error in any Platform documentation, or you have a suggestion for improving it, please let us know:

Email

doc@platform.com

Mail

Information Development
Platform Computing Inc.
3760 14th Avenue
Markham, Ontario
Canada L3R 3T7

Be sure to tell us:


Copyright

© 1994-2008, Platform Computing Inc.

Although the information in this document has been carefully reviewed, Platform Computing Inc. ("Platform") does not warrant it to be free of errors or omissions. Platform reserves the right to make corrections, updates, revisions or changes to the information in this document.

UNLESS OTHERWISE EXPRESSLY STATED BY PLATFORM, THE PROGRAM DESCRIBED IN THIS DOCUMENT IS PROVIDED "AS IS" AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT WILL PLATFORM COMPUTING BE LIABLE TO ANYONE FOR SPECIAL, COLLATERAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING WITHOUT LIMITATION ANY LOST PROFITS, DATA, OR SAVINGS, ARISING OUT OF THE USE OF OR INABILITY TO USE THIS PROGRAM.

Document redistribution policy

This document is protected by copyright and you may not redistribute or translate it into another language, in part or in whole.

Internal redistribution

You may only redistribute this document internally within your organization (for example, on an intranet) provided that you continue to check the Platform Web site for updates and update your version of the documentation. You may not make it available to your organization over the Internet.

Trademarks

LSF is a registered trademark of Platform Computing Corporation in the United States and in other jurisdictions.

POWERING HIGH PERFORMANCE, PLATFORM COMPUTING, PLATFORM SYMPHONY, PLATFORM JOBSCHEDULER, and the PLATFORM and PLATFORM LSF logos are trademarks of Platform Computing Corporation in the United States and in other jurisdictions.

UNIX is a registered trademark of The Open Group in the United States and in other jurisdictions.

Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.

Microsoft is either a registered trademark or a trademark of Microsoft Corporation in the United States and/or other countries.

Windows is a registered trademark of Microsoft Corporation in the United States and other countries.

Macrovision, Globetrotter, and FLEXlm are registered trademarks or trademarks of Macrovision Corporation in the United States of America and/or other countries.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates.

Intel, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Other products or services mentioned in this document are identified by the trademarks or service marks of their respective owners.

Third Party License Agreements

www.platform.com/legal-notices/third-party-license-agreements

