This RFE's Headline was changed after submission to reflect the headline of an internal request we were already considering, but will now track here.
After discussions with Larry, we better understand the requirement. We will schedule a patch for this.
Hi Bill,
I've tested your suggestion and it doesn't work. Have you tested it?
Sorry it took some time on my side: we have a custom esub, and the -W option is required by it. I've tested on my test LSF cluster.
1. I've created regular user reservation:
[lsfadmin@mgmt2 etc]$ brsvadd -o -n 64 -m node3-20 -u lsfadmin -b 2015:04:23:11:00 -e 2015:04:23:13:00 -N t1
Reservation t1 is created
[lsfadmin@mgmt2 etc]$ brsvs
RSVID TYPE USER NCPUS RSV_HOSTS TIME_WINDOW
t1 user lsfadmin 0/64 node3-20:0/64 4/23/11/0-4/23/13/0
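For reference, a breakdown of the flags used above (standard brsvadd semantics; note in particular that -o makes this an open reservation, so jobs attached to it keep running after the window ends):

brsvadd -o \                  # open reservation
        -n 64 \               # reserve 64 slots
        -m node3-20 \         # on host node3-20
        -u lsfadmin \         # usable only by jobs of user lsfadmin
        -b 2015:04:23:11:00 \ # window begins 2015-04-23 11:00
        -e 2015:04:23:13:00 \ # window ends 2015-04-23 13:00
        -N t1                 # reservation name t1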
2. I submitted a job as user lsfadmin (the same user as in the reservation), with a -We estimate that overlaps the reservation's start time.
[lsfadmin@mgmt2 configdir]$ bsub -q lsftest -m node3-20 -n 1 -We 10 sleep 60
Initializing program...
Thu Apr 23 10:58:16 2015
ProgType =
Read job specification...
Thu Apr 23 10:58:16 2015
***Reading Job Command File***
***Parsing Job Command File***
LSB_SUB_QUEUE = "lsftest"
LSB_SUB3_RUNTIME_ESTIMATION = 600
LSB_SUB_COMMANDNAME = "sleep"
LSB_SUB_COMMAND_LINE = "sleep 60"
LSB_SUB_HOSTS = "node3-20"
LSB_SUB_NUM_PROCESSORS = 1
LSB_SUB_MAX_NUM_PROCESSORS = 1
***Environment***
Applying Mt. Sinai Options...
Thu Apr 23 10:58:16 2015
User lsfadmin specifies queue "lsftest"
Job <837> is submitted to queue <lsftest>.
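For context, here is a minimal sketch of how a custom esub of this kind can set a default run estimate (assuming the standard esub contract, where submission options are read from $LSB_SUB_PARM_FILE and overrides are appended to $LSB_SUB_MODIFY_FILE; our real esub does more, this only mirrors the LSB_SUB3_RUNTIME_ESTIMATION = 600 visible in the trace above):

#!/bin/sh
# Minimal esub sketch: apply a default run estimate when none was given.
. "$LSB_SUB_PARM_FILE"    # source the job's submission parameters (LSB_SUB_* variables)
if [ -z "$LSB_SUB3_RUNTIME_ESTIMATION" ]; then
    # Assumed site default of 600 seconds, matching the trace above.
    echo 'LSB_SUB3_RUNTIME_ESTIMATION=600' >> "$LSB_SUB_MODIFY_FILE"
fi
exit 0                    # a zero exit accepts the (possibly modified) submission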
3. The job won't run because of the reservation, even though node3-20 is empty.
[lsfadmin@mgmt2 configdir]$ bjobs -lp 837
Job <837>, User <lsfadmin>, Project <...>, Application <...>, Status <PEND>, Queue <lsftest>, Job Priority <50>, Command <sleep 60>
Thu Apr 23 10:58:16: Submitted from host <mgmt2>, CWD <...h/minerva_test/configdir>, Specified Hosts <node3-20>;
RUNTIME
10.0 min of mgmt2
PENDING REASONS:
Unable to reach slave batch server: node24-48;
Not enough slots or resources for whole duration of the job: node3-20;
Not specified in job submission: node28-1, mgmt2, mgmt3;
SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
cpu_mhz healthy gbytesin gmbytesin gbytesout gmbytesout gopens
loadSched - - - - - - -
loadStop - - - - - - -
gcloses greads gwrites grdir giupdate gbytesin_orga gmbytesin_orga
loadSched - - - - - - -
loadStop - - - - - - -
gbytesout_orga gmbytesout_orga ngpus ngpus_shared ngpus_excl_t
loadSched - - - - -
loadStop - - - - -
ngpus_excl_p
loadSched -
loadStop -
RESOURCE REQUIREMENT DETAILS:
Combined: select[type == any] order[!-slots:-maxslots] rusage[mem=2000.00] same[model] affinity[core(1)*1]
[lsfadmin@mgmt2 configdir]$ bhosts -l node3-20
HOST node3-20
STATUS CPUF JL/U MAX NJOBS RUN SSUSP USUSP RSV DISPATCH_WINDOW
ok 60.00 - 64 0 0 0 0 0 -
CURRENT LOAD USED FOR SCHEDULING:
r15s r1m r15m ut pg io ls it tmp swp mem slots cpu_mhz
Total 0.0 0.0 0.0 29% 0.0 132 0 1198 870G 0M 240.1G 64 1400.0
Reserved 0.0 0.0 0.0 0% 0.0 0 0 0 0M 0M 0M - 0.0
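(For context on the timing: the job was submitted at 10:58:16 with -We 10, so its estimated end is roughly 11:08, which overlaps the reservation window starting at 11:00. That is consistent with the "whole duration of the job" pending reason on node3-20, even though bhosts shows the host completely idle.)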
Please let me know if I am missing something. I need a working solution for this problem.
The other question: do I need to provide -U reservation_id for the job to run during the reservation? Is it possible to omit it, so users don't need to keep track of reservations, and a job will simply run inside a reservation whenever one is valid? We are using IBM Platform LSF Standard 9.1.2.0.
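For reference, the explicit form we would like to avoid looks like this (using the t1 reservation created in step 1):

[lsfadmin@mgmt2 configdir]$ bsub -U t1 -q lsftest -m node3-20 -n 1 -We 10 sleep 60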
Thanks,
Sveta Mazurkova.
Hi Bill,
Sorry for the very late reply. Can you please reopen this PMR? Unfortunately I haven't received the e-mail with your questions, and I wonder why.
Please, check historical PMR on this issue. [PMR 77813,7TD,000]
Summarizing your suggestion: a user can submit a job with -We, and the job may start before a normal user reservation begins even if the -We time overlaps the reservation; it will then continue to run regardless of the reservation's start time. Is that correct? I am going to test it now.
Thanks,
Sveta.
Sveta, I'm not sure I understand your request.
There are two run-limit parameters: -We (run estimate) and -W (hard run limit). The run estimate lets users specify their best guess, and they won't be penalised if they are wrong.
When trying to fit a job in before a system reservation, the scheduler will use -We if specified. So if someone wants to take the risk of getting their job in before the reservation starts, they could run bsub -We 1 a.out; if the job finishes in time, great; if not, it will get killed.
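A side-by-side sketch of the two options as described above (a.out stands for any job):

bsub -We 1 a.out   # run estimate: scheduling hint only; the job is not killed at 1 minute,
                   # but if it is still running when the system reservation becomes active, it is killed
bsub -W 1 a.out    # hard run limit: LSF kills the job once it exceeds 1 minute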
By definition, a system reservation is exclusive, so jobs that haven't completed by the time it becomes active will be killed.
If you don't want them killed, the simplest solution is to create a normal user reservation rather than a system reservation: jobs will continue past the start of the reservation. Then, when you really want to do maintenance, you can either kill them explicitly, or submit a job that fills the reservation, which will force those jobs to be killed or requeued.
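A sketch of the two reservation types discussed (assuming the standard brsvadd forms; -s as the system-reservation flag is an assumption here, and exact syntax may vary by LSF version):

# System reservation: exclusive; jobs still running when it becomes active are killed.
brsvadd -s -m node3-20 -b 2015:04:23:11:00 -e 2015:04:23:13:00
# User reservation: running jobs on the hosts continue past its start time.
brsvadd -u lsfadmin -n 64 -m node3-20 -b 2015:04:23:11:00 -e 2015:04:23:13:00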
Regards,
Bill McMillan, Global Product Portfolio Manager for the IBM Platform LSF Family
Creating a new RFE based on Community RFE #65036 in product Platform LSF.