This RFE's Headline was changed after submission to reflect the headline of an internal request we were already considering, but will now track here.
After discussions with Larry, we better understand the requirement. We will schedule a patch for this.
Hi Bill,
I've tested your suggestion and it doesn't work. Have you tested it?
Sorry it took some time on my side: we have a custom esub, and the -W option is required by it. I've tested on my test LSF cluster.
1. I've created regular user reservation:
[lsfadmin@mgmt2 etc]$ brsvadd -o -n 64 -m node3-20 -u lsfadmin -b 2015:04:23:11:00 -e 2015:04:23:13:00 -N t1
Reservation t1 is created
[lsfadmin@mgmt2 etc]$ brsvs
RSVID TYPE USER NCPUS RSV_HOSTS TIME_WINDOW
t1 user lsfadmin 0/64 node3-20:0/64 4/23/11/0-4/23/13/0
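For reference, a breakdown of the flags used above (standard brsvadd semantics; note in particular that -o makes this an open reservation, so jobs attached to it keep running after the window ends):

brsvadd -o \                  # open reservation
        -n 64 \               # reserve 64 slots
        -m node3-20 \         # on host node3-20
        -u lsfadmin \         # usable only by jobs of user lsfadmin
        -b 2015:04:23:11:00 \ # window begins 2015-04-23 11:00
        -e 2015:04:23:13:00 \ # window ends 2015-04-23 13:00
        -N t1                 # reservation name t1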
2. I submitted a job as user lsfadmin (the same user as in the reservation), with a -We estimate that overlaps the reservation's start time.
[lsfadmin@mgmt2 configdir]$ bsub -q lsftest -m node3-20 -n 1 -We 10 sleep 60
Initializing program...
Thu Apr 23 10:58:16 2015
ProgType =
Read job specification...
Thu Apr 23 10:58:16 2015
***Reading Job Command File***
***Parsing Job Command File***
LSB_SUB_QUEUE = "lsftest"
LSB_SUB3_RUNTIME_ESTIMATION = 600
LSB_SUB_COMMANDNAME = "sleep"
LSB_SUB_COMMAND_LINE = "sleep 60"
LSB_SUB_HOSTS = "node3-20"
LSB_SUB_NUM_PROCESSORS = 1
LSB_SUB_MAX_NUM_PROCESSORS = 1
***Environment***
Applying Mt. Sinai Options...
Thu Apr 23 10:58:16 2015
User lsfadmin specifies queue "lsftest"
Job <837> is submitted to queue <lsftest>.
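For context, here is a minimal sketch of how a custom esub of this kind can set a default run estimate (assuming the standard esub contract, where submission options are read from $LSB_SUB_PARM_FILE and overrides are appended to $LSB_SUB_MODIFY_FILE; our real esub does more, this only mirrors the LSB_SUB3_RUNTIME_ESTIMATION = 600 visible in the trace above):

#!/bin/sh
# Minimal esub sketch: apply a default run estimate when none was given.
. "$LSB_SUB_PARM_FILE"    # source the job's submission parameters (LSB_SUB_* variables)
if [ -z "$LSB_SUB3_RUNTIME_ESTIMATION" ]; then
    # Assumed site default of 600 seconds, matching the trace above.
    echo 'LSB_SUB3_RUNTIME_ESTIMATION=600' >> "$LSB_SUB_MODIFY_FILE"
fi
exit 0                    # a zero exit accepts the (possibly modified) submission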
3. The job won't run because of the reservation, even though node3-20 is empty.
[lsfadmin@mgmt2 configdir]$ bjobs -lp 837
Job <837>, User <lsfadmin>, Project <...>, Application <...>, Status <PEND>, Queue <lsftest>, Job Priority <50>, Command <sleep 60>
Thu Apr 23 10:58:16: Submitted from host <mgmt2>, CWD <...h/minerva_test/configdir>, Specified Hosts <node3-20>;
RUNTIME
10.0 min of mgmt2
PENDING REASONS:
Unable to reach slave batch server: node24-48;
Not enough slots or resources for whole duration of the job: node3-20;
Not specified in job submission: node28-1, mgmt2, mgmt3;
SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
cpu_mhz healthy gbytesin gmbytesin gbytesout gmbytesout gopens
loadSched - - - - - - -
loadStop - - - - - - -
gcloses greads gwrites grdir giupdate gbytesin_orga gmbytesin_orga
loadSched - - - - - - -
loadStop - - - - - - -
gbytesout_orga gmbytesout_orga ngpus ngpus_shared ngpus_excl_t
loadSched - - - - -
loadStop - - - - -
ngpus_excl_p
loadSched -
loadStop -
RESOURCE REQUIREMENT DETAILS:
Combined: select[type == any] order[!-slots:-maxslots] rusage[mem=2000.00] same[model] affinity[core(1)*1]
[lsfadmin@mgmt2 configdir]$ bhosts -l node3-20
HOST node3-20
STATUS CPUF JL/U MAX NJOBS RUN SSUSP USUSP RSV DISPATCH_WINDOW
ok 60.00 - 64 0 0 0 0 0 -
CURRENT LOAD USED FOR SCHEDULING:
r15s r1m r15m ut pg io ls it tmp swp mem slots cpu_mhz
Total 0.0 0.0 0.0 29% 0.0 132 0 1198 870G 0M 240.1G 64 1400.0
Reserved 0.0 0.0 0.0 0% 0.0 0 0 0 0M 0M 0M - 0.0
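(For context on the timing: the job was submitted at 10:58:16 with -We 10, so its estimated end is roughly 11:08, which overlaps the reservation window starting at 11:00. That is consistent with the "whole duration of the job" pending reason on node3-20, even though bhosts shows the host completely idle.)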
Please let me know if I am missing something. I need a working solution for this problem.
The other question: do I need to provide -U reservation_id for the job to run during the reservation? Is it possible to omit it, so users don't need to keep track of reservations, and a job will simply run inside a reservation whenever one is valid? We are using IBM Platform LSF Standard 9.1.2.0.
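For reference, the explicit form we would like to avoid looks like this (using the t1 reservation created in step 1):

[lsfadmin@mgmt2 configdir]$ bsub -U t1 -q lsftest -m node3-20 -n 1 -We 10 sleep 60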
Thanks,
Sveta Mazurkova.
Hi Bill,
Sorry for the very late reply. Can you please reopen this PMR? Unfortunately I haven't received the e-mail with your questions, and I wonder why.
Please, check historical PMR on this issue. [PMR 77813,7TD,000]
Summarizing your suggestion: a user can submit a job with -We, and the job may start before a normal user reservation begins even if the -We time overlaps the reservation; it will then continue to run regardless of the reservation's start time. Is that correct? I am going to test it now.
Thanks,
Sveta.
Sveta, I'm not sure I understand your request.
There are two run-limit parameters: -We (run estimate) and -W (hard run limit). The run estimate lets users specify their best guess, and they won't be penalised if they are wrong.
When trying to fit a job in before a system reservation, the scheduler will use -We if specified. So if someone wants to take the risk of getting their job in before the reservation starts, they could run bsub -We 1 a.out; if the job finishes in time, great; if not, it will get killed.
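A side-by-side sketch of the two options as described above (a.out stands for any job):

bsub -We 1 a.out   # run estimate: scheduling hint only; the job is not killed at 1 minute,
                   # but if it is still running when the system reservation becomes active, it is killed
bsub -W 1 a.out    # hard run limit: LSF kills the job once it exceeds 1 minute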
By definition, a system reservation is exclusive, so jobs that haven't completed by the time it becomes active will be killed.
If you don't want them killed, the simplest solution is to create a normal user reservation rather than a system reservation: jobs will continue past the start of the reservation. Then, when you really want to do maintenance, you can either kill them explicitly, or submit a job that fills the reservation, which will force those jobs to be killed or requeued.
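A sketch of the two reservation types discussed (assuming the standard brsvadd forms; -s as the system-reservation flag is an assumption here, and exact syntax may vary by LSF version):

# System reservation: exclusive; jobs still running when it becomes active are killed.
brsvadd -s -m node3-20 -b 2015:04:23:11:00 -e 2015:04:23:13:00
# User reservation: running jobs on the hosts continue past its start time.
brsvadd -u lsfadmin -n 64 -m node3-20 -b 2015:04:23:11:00 -e 2015:04:23:13:00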
Regards,
Bill McMillan, Global Product Portfolio Manager for the IBM Platform LSF Family
Creating a new RFE based on Community RFE #65036 in product Platform LSF.