IBM Data and AI Ideas Portal for Customers


This portal is for opening public enhancement requests against products and services offered by the IBM Data & AI organization. To view all of your ideas submitted to IBM, create and manage groups of ideas, or create an idea explicitly set to be visible either to all (public) or only to you and IBM (private), use the IBM Unified Ideas Portal (https://ideas.ibm.com).


Shape the future of IBM!

We invite you to shape the future of IBM, including product roadmaps, by submitting ideas that matter to you the most. Here's how it works:


Search existing ideas

Start by searching and reviewing ideas and requests to enhance a product or service. Take a look at ideas others have posted, and add a comment, vote, or subscribe to updates on them if they matter to you. If you can't find what you are looking for, post your own idea.


Post your ideas

Post ideas and requests to enhance a product or service. Take a look at ideas others have posted and upvote them if they matter to you.

  1. Post an idea

  2. Upvote ideas that matter most to you

  3. Get feedback from the IBM team to refine your idea


Specific links you will want to bookmark for future use

Welcome to the IBM Ideas Portal (https://www.ibm.com/ideas) - Use this site to find out additional information and details about the IBM Ideas process and statuses.

IBM Unified Ideas Portal (https://ideas.ibm.com) - Use this site to view all of your ideas, create new ideas for any IBM product, or search for ideas across all of IBM.

ideasibm@us.ibm.com - Use this email to suggest enhancements to the Ideas process or request help from IBM for submitting your Ideas.

IBM Employees should enter Ideas at https://ideas.ibm.com


Status Delivered
Workspace Spectrum LSF
Created by Guest
Created on Nov 14, 2018

Relax GPU affinity while maintaining Core affinity

Prior to the introduction of LSB_GPU_NEW_SYNTAX=extend, it was possible for jobs using core affinity to be assigned cores on a CPU that is not directly connected to the GPU assigned to the job. Since its introduction, a job is allowed only if at least one core per GPU is allocated from the CPU local to the assigned GPU(s). When you have a mix of systems, where some have GPUs spread equally between the CPUs and others have all of their GPUs interconnected with an SXM2 fabric and attached to only one of the CPUs, this can result in stranded resources on a system that runs non-exclusive jobs. In many cases, having access to a more distant GPU is better than not scheduling the job at all. Removing core affinity avoids the problem of jobs not being scheduled, but it lets jobs use whatever cores they want, which breaks the use of core affinity to limit a job to only as many cores as it requested in slots.
It would be possible to have users submit some jobs that request close GPUs and others that request far GPUs, but that would lead to similar stranded capacity if users are not aware of what other jobs are requesting.
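To make the stranding concrete, here is a minimal Python sketch of the strict rule as described above, under stated assumptions. It is not how LSF implements the check: the `Node` model, `strict_fit`, and `relaxed_fit` are hypothetical names for illustration only, and "at least one core per GPU from the local CPU" is interpreted as "the sockets attached to the chosen GPUs must have at least as many free cores as GPUs requested."

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """Hypothetical host model for illustration; not an LSF data structure."""
    free_cores: dict[int, int]   # socket id -> number of free cores
    gpu_socket: dict[int, int]   # GPU id -> socket the GPU is attached to
    busy_gpus: set[int] = field(default_factory=set)


def strict_fit(node: Node, n_gpus: int, n_cores: int) -> bool:
    """Assumed strict rule: the job fits only if, for some set of free GPUs,
    at least one core per GPU can come from the socket(s) local to them."""
    if sum(node.free_cores.values()) < n_cores or n_cores < n_gpus:
        return False
    free_gpus = [g for g in node.gpu_socket if g not in node.busy_gpus]
    for gpus in combinations(free_gpus, n_gpus):
        local = {node.gpu_socket[g] for g in gpus}
        local_free = sum(node.free_cores.get(s, 0) for s in local)
        if local_free >= n_gpus:
            return True
    return False


def relaxed_fit(node: Node, n_gpus: int, n_cores: int) -> bool:
    """Locality ignored: enough free GPUs and enough free cores anywhere."""
    free_gpus = [g for g in node.gpu_socket if g not in node.busy_gpus]
    return len(free_gpus) >= n_gpus and sum(node.free_cores.values()) >= n_cores


# Single-rooted SXM2 host: 4 GPUs behind socket 0, whose cores are all in use.
sxm2 = Node(free_cores={0: 0, 1: 20}, gpu_socket={0: 0, 1: 0, 2: 0, 3: 0})
# Balanced PCIe host: GPUs 0-3 on socket 0, GPUs 4-7 on socket 1.
pcie = Node(free_cores={0: 0, 1: 20}, gpu_socket={g: g // 4 for g in range(8)})

print(strict_fit(sxm2, n_gpus=1, n_cores=4))   # False: the 4 GPUs are stranded
print(relaxed_fit(sxm2, n_gpus=1, n_cores=4))  # True
print(strict_fit(pcie, n_gpus=1, n_cores=4))   # True: a socket-1 GPU is usable
```

With the same free cores, the balanced PCIe host can still place the job while the single-rooted SXM2 host cannot, which is the mixed-fleet effect described above.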

Preferring CPU cores near the assigned GPU would be good, but the goal here is an option to relax that requirement into a preference.
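One way to express "preference, not requirement" is sketched below in Python. This is a hypothetical illustration, not an LSF feature or API: `pick_cores` and its `min_local` parameter are invented names. Cores are taken from GPU-local sockets first and then from other sockets, so the job still receives exactly the number of cores it requested and core affinity can still fence it. Setting `min_local` to the number of GPUs mimics the strict behavior, while `min_local=0` turns locality into a preference only.

```python
from typing import Optional


def pick_cores(free_cores: dict[int, int], local_sockets: set[int],
               n_cores: int, min_local: int = 0) -> Optional[dict[int, int]]:
    """Allocate n_cores, preferring sockets local to the assigned GPUs.

    Returns a mapping socket -> cores taken, or None if the request
    (including the min_local locality floor) cannot be met."""
    plan: dict[int, int] = {}
    remaining = n_cores
    # Visit GPU-local sockets first, then fall back to the remaining sockets.
    order = sorted(free_cores, key=lambda s: (s not in local_sockets, s))
    for s in order:
        take = min(free_cores[s], remaining)
        if take:
            plan[s] = take
            remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        return None  # not enough free cores on the host at all
    # Local-first filling maximizes local cores, so this check is exact.
    local_taken = sum(c for s, c in plan.items() if s in local_sockets)
    if local_taken < min_local:
        return None
    return plan


free = {0: 0, 1: 20}  # socket 0 (the GPU-local one) is fully busy
print(pick_cores(free, {0}, 4, min_local=1))  # None: strict requirement fails
print(pick_cores(free, {0}, 4))               # {1: 4}: falls back to socket 1
```

When some local cores are free, the same call takes them first (for example, `{0: 2, 1: 20}` yields `{0: 2, 1: 2}`), so close cores are still preferred whenever they exist.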

We have several different GPU layouts:

  • 4 GPUs connected via SXM2 to cpu0, with none connected to cpu1
  • 8 GPUs on PCIe, 4 connected to cpu0 and 4 to cpu1
  • 10 GPUs on PCIe connected to cpu0, with none connected to cpu1

We also allow jobs from multiple users on all nodes. Jobs can request any number of GPUs and can select them by amount of memory or by GPU type, since we have many different GPU types available. Some jobs need a particular GPU, while others just need a GPU of some type, and locality is not critical for them. Two common kinds of job fit this description. The first submits work to the GPU(s), and the CPU portion then does little other than wait for the results. These jobs tend to use our nodes with the SXM2 interconnect and spend most of their time working between the GPUs rather than interacting with the CPU. The second kind is applications that offload part of their computation to the GPU but are not strictly GPU applications. These can be very bursty, but they are not written in a way that requires a GPU on the same NUMA node to perform well; this is especially true of codes originally written to run on a workstation. These applications tend to need only a single GPU but may request all of cpu0's resources or more, and on a single-rooted GPU system that leaves cpu1 without usable GPUs if core affinity is needed to fence the job.

  • Guest
    Mar 25, 2019

    Fix Central URL(s):
    http://www.ibm.com/support/fixcentral/swg/selectFixes?product=ibm/Other+software/IBM+Spectrum+LSF&release=All&platform=All&function=fixId&fixids=lsf-10.1-build514642&includeSupersedes=0
    Fix ID: lsf-10.1-build514642

  • Guest
    Jan 4, 2019

    This is planned for inclusion in LSF 10.1.0.8. We will provide a patch to 10.1.0.7 when available.

  • Guest
    Dec 2, 2018

    Using gtile='!' may provide some relief from this; we'll look into it further.