IBM Data and AI Ideas Portal for Customers



Status Delivered
Workspace Spectrum LSF
Created by Guest
Created on May 2, 2017

Show number of requested slots for pending jobs in bjobs output

The main unit for jobs in our setup is cores, i.e. one slot corresponds to one core. For jobs that use affinity, e.g. hybrid MPI-OpenMP jobs, the number of slots for pending jobs only shows the slots given with the '-n' option of bsub, i.e. the number of MPI processes. The total number of slots (cores) needed by the job, i.e. the number of threads within each process times the number of MPI processes, is not visible in the bjobs output for pending jobs. This makes it hard for users and administrators to check why a job is pending.

We suggest a new field called "nreq_slots" that shows the total number of slots requested by the job. LSF already has alloc_slot (see the bjobs man page), but this applies to running jobs only.

  • Guest
    Feb 26, 2019

    It is addressed in LSF 10.1.0.7. You can download LSF 10.1.0.7 from Fix Central.

    Fix Central URL(s):
    IBM Spectrum LSF:
    http://www.ibm.com/support/fixcentral/swg/selectFixes?product=ibm/Other+software/IBM+Spectrum+LSF&release=All&platform=All&function=fixId&fixids=lsf-10.1.0.7-spk-2018-Dec-build509238&includeSupersedes=0

    Fix ID: lsf-10.1.0.7-spk-2018-Dec-build509238

    The following files are uploaded:
    Readme_build509238.htm
    checksum.md5
    lsf10.1.0.7_fixed_bugs.pdf
    lsf10.1_linux2.6-glibc2.3-x86_64-509238.tar.Z
    lsf10.1_lnx310-lib217-ppc64le-509238.tar.Z
    lsf10.1_lnx310-lib217-x86_64-509238.tar.Z
    lsf10.1_lnx312-lib217-armv8-509238.tar.Z

  • Guest
    Oct 2, 2017

    In your example, the pending reason is accurate - the job is not running because of the specific limit being reached.

    However, I do agree that the output of bjobs -o for the pending job may be confusing to a user who did not understand what they were asking for.

    We'll consider implementing an enhancement in a future LSF 10 service pack to update the requested slot count when it can be calculated at submission.

  • Guest
    Sep 20, 2017

    Here is another, more explicit example. Let's say we have a slot limit of 16 slots/user for a given queue, and a user submits this job:

    bsub -n 4 -R "2*{affinity[core(4)]} + 2*{affinity[core(7)]}" myapp

    This job will never be dispatched, since the total number of slots needed is 22 (2*4 + 2*7), which exceeds the 16 slots/user limit for the queue.

    bjobs shows:

    $ bjobs -o "jobid: min_req_proc:4 max_req_proc:4 stat: name:"
    JOBID MIN_ MAX_ STAT JOB_NAME
    14333 4    4    PEND myapp

    and 'bjobs -p' says:

    Resource limit defined on queue has been reached (Resource: slots, Queue: hpc, Limit Name: hpc_default_slot_limit, Limit Value: 16);

    To see the real reason why the job is pending, I have to analyze the RES_REQ expression "manually", although LSF must have this information available; otherwise the job would not be pending.

    This is confusing!
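
The slot arithmetic in the example above can be reproduced with a short script. This is only a minimal sketch: the regular expression and the function name `total_requested_cores` are illustrative assumptions, it handles only simple terms of the form `N*{affinity[core(M)]}`, and it is not how LSF itself parses resource requirement strings.

```python
import re

# Matches one simple compound term such as "2*{affinity[core(4)]}".
# Illustrative assumption: this is NOT LSF's resource requirement parser.
_TERM = re.compile(r"(\d+)\s*\*\s*\{\s*affinity\[core\((\d+)\)\]\s*\}")


def total_requested_cores(res_req: str) -> int:
    """Sum (tasks * cores per task) over all simple compound terms."""
    terms = _TERM.findall(res_req)
    if not terms:
        raise ValueError("no 'N*{affinity[core(M)]}' terms found")
    return sum(int(tasks) * int(cores) for tasks, cores in terms)


if __name__ == "__main__":
    res_req = "2*{affinity[core(4)]} + 2*{affinity[core(7)]}"
    slot_limit = 16  # per-user slot limit from the example above
    total = total_requested_cores(res_req)
    print(f"total cores requested: {total}")
    print(f"exceeds limit of {slot_limit}: {total > slot_limit}")
```

For the request shown above this gives 22 cores, which is more than the 16-slot limit, while bjobs only reports the 4 tasks given with '-n'.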

  • Guest
    Aug 30, 2017

    Thanks for the feedback. I agree that for simple usage, the behaviour could be viewed as confusing; but for complex usage, showing the wrong value would be equally confusing.
    We'll review this again and see if we can come up with a solution that addresses the opposing viewpoints.

  • Guest
    Jul 31, 2017

    Thanks for your feedback, and your explanations of the current behavior. However, while your arguments make good sense for somebody with at least some knowledge in job scheduling, and especially how LSF works internally, the normal user (and especially new users) might easily get confused.

    In our environment, we have hundreds of users, the majority of whom are not very experienced. Most users will ask for X cores for a job, assuming that their jobs will run on X cores - no more, no less. Furthermore, we apply certain limits, e.g. on the number of cores (slots) that a user may use simultaneously.

    For a simple request, i.e. '-n X', the number of requested slots/cores is easy to see. For more complicated ones, like the one mentioned in the RFE, or even more complicated compound requests, the user will have a hard time understanding why a job is pending if the 'total requested' number of slots is not shown. It would also make life easier for admins and help desk support if this number were easily accessible.

    I know there will be situations where there can be an interval, i.e. a min. and a max. number. I'd suggest that the minimum number of requested slots be shown by default in that case, since this is the minimum requirement that, for example, violates the core limit per user.

    I hope the above makes sense. Feel free to get back to me, and we can discuss this further, if needed.

  • Guest
    Jul 13, 2017

    Apologies for the delay in responding; this RFE has resulted in a lot of internal discussion.

    In your example: bsub -n 4 -R "span[ptile=1] affinity[core(4)]" : the number of slots to allocate is fixed and can be calculated while the job is pending.

    However, in many other cases, the number of slots to be allocated is not known until the job is actually scheduled. For example:

    * bsub -x
    * if affinity includes exclusive=alljobs or affinity[numa(1)]
    * if there are alternative resource requirements
    * if we have -n min,max, or using JOB_SIZE_LIST,
    * bsub -n 2 -R "affinity[numa(1)]"

    the actual allocation will depend on the hosts allocated, so the "correct" value cannot be calculated until the job is actually scheduled & dispatched - which is the current behaviour.
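
The distinction drawn above, between requests whose slot count is fixed at submission and those that depend on the hosts eventually allocated, could be sketched roughly as follows. The `JobRequest` class, its fields, and `slots_known_at_submission` are hypothetical names introduced only for illustration; they are a simplified model of the cases listed above and do not reflect LSF's internal data structures or scheduling logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRequest:
    # Hypothetical, simplified model of a submission (not an LSF structure).
    min_tasks: int                     # from "-n min[,max]"
    max_tasks: int
    cores_per_task: int = 1            # e.g. affinity[core(4)] -> 4
    exclusive: bool = False            # bsub -x or affinity exclusive=...
    numa_affinity: bool = False        # e.g. affinity[numa(1)]
    alternative_res_req: bool = False  # alternative resource requirements
    job_size_list: bool = False        # JOB_SIZE_LIST in use


def slots_known_at_submission(job: JobRequest) -> Optional[int]:
    """Return the slot count if it is fixed while pending, else None."""
    host_dependent = (
        job.exclusive
        or job.numa_affinity
        or job.alternative_res_req
        or job.job_size_list
        or job.min_tasks != job.max_tasks
    )
    if host_dependent:
        # The allocation depends on the hosts chosen at dispatch time.
        return None
    return job.min_tasks * job.cores_per_task


if __name__ == "__main__":
    # bsub -n 4 -R "span[ptile=1] affinity[core(4)]"
    print(slots_known_at_submission(JobRequest(4, 4, cores_per_task=4)))
    # bsub -n 2 -R "affinity[numa(1)]"
    print(slots_known_at_submission(JobRequest(2, 2, numa_affinity=True)))
```

Under these assumptions the first request yields 16 slots, and the second yields no value because the size of a NUMA node is only known once a host has been selected.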

    Our rationale behind the current behaviour is consistency: it is easier to explain to a user that the value is only shown once the job is dispatched than to explain why the value would be shown for some jobs and not for others.

    If you think the latter is easier to explain to users, then we can consider making this configurable in a future release.