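File name:
lsf-admin(openlava)
Development tool:
File size: 9 MB
Downloads: 0
Upload date: 2019-03-04
Description: LSF operations manual, which can also be used with openlava. A detailed operations manual.
LSF is a cluster scheduling product from IBM. openlava is a cluster scheduler that is compatible with LSF operations.
IBM Spectrum LSF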
Version 10 release 1
Administering IBM Spectrum LSF
Note
Before using this information and the product it supports, read the information in "Notices" on page 841.
This edition applies to version 10, release 1 of IBM Spectrum LSF (product numbers 5725G82 and 5725 25) and to
all subsequent releases and modifications until otherwise indicated in new editions.
Significant changes or additions to the text and illustrations are indicated by a vertical line (|) to the left of the
change.
If you find an error in any IBM Spectrum Computing documentation, or you have a suggestion for improving it, let
us know.
Log in to IBM Knowledge Center with your IBMid, and add your comments and feedback to any topic.
Copyright IBM Corporation 1992, 2016.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
Contents
Chapter 1. Managing Your Cluster
  Working with Your Cluster
  LSF Daemon Startup Control
  Working with Hosts
  Managing job execution
  Working with Queues
  IBM Spectrum LSF Host-based resources
  External Load Indices
  Managing LSF user groups
  External Host and User Groups
  Between-Host User Account Mapping
  Cross-Cluster User Account Mapping
  UNIX/Windows User Account Mapping
Chapter 2. Monitoring Your Cluster
  Achieve performance and scalability
  Event generation
  Tuning the Cluster
  Authentication and authorization
  External authentication
  Job File Spooling
  Non-Shared File systems
  Error and event logging
  Troubleshooting LSF Problems
Chapter 3. Time-Based Configuration
  Time Configuration
  Advance reservation
Chapter 4. Job scheduling policies
  Preemptive scheduling
  Specifying Resource Requirements
  Fairshare Scheduling
  Global Fairshare Scheduling
  Guaranteed resource pools
  Goal-Oriented SLA-Driven scheduling
  Exclusive scheduling
Chapter 5. Job Scheduling and Dispatch
  Working with Application Profiles
  Job directories and Data
  Resource allocation limits
  Reserving resources
  Job Dependency and Job Priority
  Job Requeue and Job rerun
  Job Migration
  Job Checkpoint and Restart
  Resizable jobs
  Chunk jobs and job arrays
  Job Packs
Chapter 6. Energy Aware Scheduling
  About Energy Aware Scheduling (EAS)
  Managing host power states
  CPU frequency management
  Automatic CPU frequency selection
Chapter 7. Control Job Execution
  Runtime resource usage limits
  Load thresholds
  Pre-Execution and Post-Execution Processing
  Job Starters
  Job Controls
  External Job Submission and Execution Controls
  Interactive Jobs with bsub
  Interactive and Remote Tasks
  Running Parallel Jobs
Chapter 8. Appendices
  Submitting Jobs Using JSDL
  Using lstcsh
  Using Session Scheduler
  Using lsmake
  Manage LSF on EGO
  LSF Integrations
  Launching ANSYS Jobs
  PVM jobs
Notices
  Trademarks
  Terms and conditions for product documentation
Index
Chapter 1. Managing your cluster
Working with Your Cluster
Learn about IBM Spectrum LSF directories and files, commands to see cluster
information and control workload daemons, and how to configure your cluster.
LSF Terms and Concepts
Before you use LSF for the first time, you should read the LSF Foundations Guide
for a basic understanding of workload management and job submission, and the
Administrator Foundations Guide for an overview of cluster management and
operations.
Job states
IBM Spectrum LSF jobs have several states:
PEND
Waiting in a queue for scheduling and dispatch.
RUN
Dispatched to a host and running.
DONE
Finished normally with zero exit value.
EXIT
Finished with nonzero exit value.
PSUSP
Suspended while the job is pending.
USUSP
Suspended by user.
SSUSP
Suspended by the LSF system.
POST_DONE
Post-processing completed without errors.
POST_ERR
Post-processing completed with errors.
UNKWN
The mbatchd daemon lost contact with the sbatchd daemon on the host
where the job runs.
WAIT
For jobs submitted to a chunk job queue, members of a chunk job that are
waiting to run.
ZOMBI
A job becomes ZOMBI if the execution host is unreachable when a
non-rerunnable job is killed or a rerunnable job is requeued.
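The state list above can be kept as a small lookup table when writing scripts around job status. The following is a minimal sketch only: the grouping into "suspended" and "finished" sets and the helper names describe, is_suspended, and is_finished are assumptions made for illustration and are not part of LSF.

```python
# Job states as listed in this section, with short descriptions.
JOB_STATES = {
    "PEND": "Waiting in a queue for scheduling and dispatch",
    "RUN": "Dispatched to a host and running",
    "DONE": "Finished normally with zero exit value",
    "EXIT": "Finished with nonzero exit value",
    "PSUSP": "Suspended while the job is pending",
    "USUSP": "Suspended by user",
    "SSUSP": "Suspended by the LSF system",
    "POST_DONE": "Post-processing completed without errors",
    "POST_ERR": "Post-processing completed with errors",
    "UNKWN": "mbatchd lost contact with sbatchd on the execution host",
    "WAIT": "Member of a chunk job that is waiting",
    "ZOMBI": "Execution host unreachable when the job was killed or requeued",
}

# Hypothetical groupings chosen for this sketch.
SUSPENDED = {"PSUSP", "USUSP", "SSUSP"}
FINISHED = {"DONE", "EXIT"}


def describe(state):
    """Return the description for a state name, ignoring case and spaces."""
    key = state.strip().upper()
    if key not in JOB_STATES:
        raise ValueError("unknown job state: " + state)
    return JOB_STATES[key]


def is_suspended(state):
    """True for any of the three suspended states."""
    return state.strip().upper() in SUSPENDED


def is_finished(state):
    """True when the job ended, with either a zero or nonzero exit value."""
    return state.strip().upper() in FINISHED
```

A monitoring script might call is_suspended on each reported state to decide which jobs need attention.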
Host
An LSF host is an individual computer in the cluster.
Each host might have more than one processor. Multiprocessor hosts are used to
run parallel jobs. A multiprocessor host with a single process queue is considered a
single machine. A box full of processors that each have their own process queue is
treated as a group of separate machines.
Tip:
The names of your hosts should be unique. They cannot be the same as the cluster
name or any queue that is defined for the cluster.
Job
An LSF job is a unit of work that runs in the LSF system.
A job is a command that is submitted to LSF for execution by using the bsub
command. LSF schedules, controls, and tracks the job according to configured
policies.
Jobs can be complex problems, simulation scenarios, extensive calculations, or
anything that needs compute power.
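As an illustration of submitting a command with bsub, the sketch below only assembles the argument list and does not run anything. The function name build_bsub_command is hypothetical, as are the queue name and job name in the usage note. The options used (-q for the queue, -J for the job name, -o for the output file) are standard bsub options, but check them against your installation before relying on this.

```python
import shlex


def build_bsub_command(command, queue=None, job_name=None, output_file=None):
    """Build an argv list for bsub without executing it."""
    argv = ["bsub"]
    if queue:
        argv += ["-q", queue]          # target queue
    if job_name:
        argv += ["-J", job_name]       # job name
    if output_file:
        argv += ["-o", output_file]    # file that receives job output
    # The job itself is the command that LSF will run.
    argv += shlex.split(command)
    return argv
```

For example, build_bsub_command("sleep 60", queue="normal", job_name="demo") yields a list that could be passed to subprocess.run on a host where LSF is installed.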
Job files
When a job is submitted to a queue, LSF holds it in a job file until conditions are
right for it to run. Then, the job file is used to run the job.
On UNIX, the job file is a Bourne shell script that is run at execution time.
On Windows, the job file is a batch file that is processed at execution time.
Interactive batch job
An interactive batch job is a batch job that allows you to interact with the
application and still take advantage of LSF scheduling policies and fault tolerance.
All input and output are through the terminal that you used to type the job
submission command.
When you submit an interactive job, a message is displayed while the job is
awaiting scheduling. A new job cannot be submitted until the interactive job is
completed or terminated.
Interactive task
An interactive task is a command that is not submitted to a batch queue and
scheduled by LSF, but is dispatched immediately.
LSF locates the resources that are needed by the task and chooses the best host
among the candidate hosts that has the required resources and is lightly loaded.
Each command can be a single process, or it can be a group of cooperating
processes.
Tasks are run without using the batch processing features of LSF but still with the
advantage of resource requirements and selection of the best host to run the task
based on load.
Local task
A local task is an application or command that does not make sense to run
remotely.
For example, the ls command on UNIX.
Remote task
A remote task is an application or command that can be run on another
machine in the cluster.
Host types and host models
Hosts in LSF are characterized by host type and host model.
The following example is a host with type X86_64, with host models Opteron240,
Opteron840, Intel_EM64T, and so on.
[Figure: host type X86_64 with host models Opteron240, Opteron840, and Intel_EM64T]
Host type:
An LSF host type is the combination of operating system and host CPU
architecture.
All computers that run the same operating system on the same computer
architecture are of the same type. These hosts are binary-compatible with each
other. Each host type usually requires a different set of LSF binary files.
Host model:
An LSF host model is the host type of the computer, which determines the CPU
speed scaling factor that is applied in load and placement calculations.
The CPU factor is considered when jobs are being dispatched.
Resources
LSF resources are objects in the LSF system that LSF uses to track job
requirements and schedule jobs according to their availability on individual hosts.
Resource usage
The LSF system uses built-in and configured resources to track resource availability
and usage. Jobs are scheduled according to the resources available on individual
hosts.
Jobs that are submitted through the LSF system will have the resources that they
use monitored while they are running. This information is used to enforce resource
limits and load thresholds as well as fairshare scheduling.
LSF collects the following kinds of information:
Total CPU time that is consumed by all processes in the job
Total resident memory usage in KB of all currently running processes in a job
Total virtual memory usage in KB of all currently running processes in a job
Currently active process group ID in a job
Currently active processes in a job
On UNIX and Linux, job-level resource usage is collected through PIM.
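The kinds of information listed above can be pictured as a per-job summary built from per-process samples. This is a minimal sketch under stated assumptions: the ProcessUsage record and summarize_job function are hypothetical names, the sample covers only currently running processes, and nothing here reflects how PIM actually gathers or reports its data.

```python
from dataclasses import dataclass


@dataclass
class ProcessUsage:
    """One sampled process that belongs to a job (hypothetical record)."""
    pid: int
    pgid: int
    cpu_seconds: float
    resident_kb: int
    virtual_kb: int


def summarize_job(processes):
    """Aggregate per-process samples into job-level totals."""
    return {
        "cpu_seconds": sum(p.cpu_seconds for p in processes),
        "resident_kb": sum(p.resident_kb for p in processes),
        "virtual_kb": sum(p.virtual_kb for p in processes),
        # Distinct process group IDs that are currently active in the job.
        "process_group_ids": sorted({p.pgid for p in processes}),
        # IDs of the currently active processes in the job.
        "process_ids": sorted(p.pid for p in processes),
    }
```

Totals of this kind are what limit enforcement and fairshare calculations would compare against configured values.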
Load indices
Load indices measure the availability of dynamic, non-shared resources on hosts in
the cluster. Load indices that are built into the LIM are updated at fixed time
intervals.
External load indices:
Defined and configured by the LSF administrator and collected by an External
Load Information Manager (ELIM) program. The ELIM also updates LIM when
new values are received.
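An ELIM is an ordinary program that reports index values, so the reporting step can be sketched in a few lines. The format used here, a count of indices followed by name and value pairs on one line, is an assumption that is not stated in this section and should be checked against the ELIM documentation. The function name format_elim_report and the index names in the usage note are hypothetical.

```python
def format_elim_report(indices):
    """Return one report line: the number of indices, then name/value pairs.

    Assumed layout for this sketch: "<count> <name1> <value1> <name2> <value2> ...".
    """
    parts = [str(len(indices))]
    for name, value in indices.items():
        parts.append(str(name))
        parts.append(str(value))
    return " ".join(parts)
```

A reporting loop would print such a line at a regular interval, for example print(format_elim_report({"scratch": 500, "licenses": 3}), flush=True).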
Static resources
Built-in resources that represent host information that does not change over time,
such as the maximum RAM available to user processes or the number of
processors in a machine. Most static resources are determined by the LIM at
start-up time.
Static resources can be used to select appropriate hosts for particular jobs that are
based on binary architecture, relative CPU speed, and system configuration.
Load thresholds
Two types of load thresholds can be configured by your LSF administrator to
schedule jobs in queues. Each load threshold specifies a load index value.
The loadSched load threshold determines the load condition for dispatching
pending jobs. If a host's load is beyond any defined loadSched, a job cannot be
started on the host. This threshold is also used as the condition for resuming
suspended jobs.
The loadStop load threshold determines when running jobs can be suspended.
To schedule a job on a host, the load levels on that host must satisfy both the
thresholds that are configured for that host and the thresholds for the queue from
which the job is being dispatched.
The value of a load index might either increase or decrease with load, depending
on the meaning of the specific load index. Therefore, when you compare the host
load conditions with the threshold values, you need to use either greater than (>)
or less than (<), depending on the load index.
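The dispatch rule described above, where both the host thresholds and the queue thresholds must be satisfied and the comparison direction depends on the index, can be sketched as follows. This is an illustration only and not the scheduler's code. The function names are hypothetical, and the set of indices treated as decreasing with load (it, tmp, swp, mem) is an assumption drawn from the usual built-in index names, which this section does not list.

```python
# Assumption for this sketch: these indices fall as load rises, so a
# threshold on them is a minimum. All other indices are treated as
# rising with load, so a threshold on them is a maximum.
DECREASING_WITH_LOAD = {"it", "tmp", "swp", "mem"}


def is_beyond(index, value, threshold):
    """True when the measured value is beyond the threshold for this index."""
    if index in DECREASING_WITH_LOAD:
        return value < threshold
    return value > threshold


def can_dispatch(host_load, host_load_sched, queue_load_sched):
    """Check host load against both host-level and queue-level loadSched."""
    for thresholds in (host_load_sched, queue_load_sched):
        for index, limit in thresholds.items():
            # Indices that were not measured are skipped in this sketch.
            if index in host_load and is_beyond(index, host_load[index], limit):
                return False
    return True
```

With a measured load of {"r1m": 0.8, "mem": 2048}, a host threshold of {"r1m": 1.0}, and a queue threshold of {"mem": 1024}, can_dispatch returns True. Dropping mem to 512 makes it return False.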
Runtime resource usage limits
Limit the use of resources while a job is running. Jobs that consume more than the
specified amount of a resource are signaled.
Hard and soft limits:
Resource limits that are specified at the queue level are hard limits, while limits
that are specified with job submission are soft limits. See the setrlimit man page
for information about hard and soft limits.
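The hard and soft limit semantics that the setrlimit man page describes can be explored from Python through the standard resource module, which is available on UNIX-like systems. This sketch shows only the operating system behavior, where a process may lower its soft limit freely but cannot raise it past the hard limit. It does not show how LSF applies queue or job limits, and the helper name lower_soft_limit is hypothetical.

```python
import resource


def lower_soft_limit(which, new_soft):
    """Set the soft limit for one resource, clamped to the current hard limit.

    Returns the (soft, hard) pair that is in effect afterwards.
    """
    soft, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        # A soft limit can never exceed a finite hard limit.
        if new_soft == resource.RLIM_INFINITY or new_soft > hard:
            new_soft = hard
    # The hard limit is passed back unchanged.
    resource.setrlimit(which, (new_soft, hard))
    return resource.getrlimit(which)
```

For example, lower_soft_limit(resource.RLIMIT_CORE, 0) disables core dumps for the current process while leaving the hard limit as it was.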
Resource allocation limits:
Restrict the amount of a resource that must be available during job scheduling for
different classes of jobs to start, and which resource consumers the limits apply to.
If all of the resource is consumed, no more jobs can be started until some of the
resource is released.
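The behavior described above, where jobs cannot start once the resource is fully consumed and can start again after some is released, amounts to a bounded counter. The class below is a minimal sketch with hypothetical names. It ignores which consumers a limit applies to and is not how LSF implements allocation limits.

```python
class AllocationLimit:
    """Tracks how much of one limited resource is currently allocated."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = 0

    def try_start(self, amount):
        """Allocate the amount if it fits under the limit, else refuse."""
        if self.in_use + amount > self.capacity:
            return False
        self.in_use += amount
        return True

    def release(self, amount):
        """Return a finished job's allocation to the pool."""
        self.in_use = max(0, self.in_use - amount)
```

With a capacity of 4, a job that needs 3 units starts, a second job that needs 2 units is refused, and the second job can start once the first releases its 3 units.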