IBM Data and AI Ideas Portal for Customers


Use this portal to open public enhancement requests against products and services offered by the IBM Data & AI organization. To view all of your ideas submitted to IBM, create and manage groups of ideas, or create an idea explicitly set to be either visible to all (public) or visible only to you and IBM (private), use the IBM Unified Ideas Portal (https://ideas.ibm.com).


Shape the future of IBM!

We invite you to shape the future of IBM, including product roadmaps, by submitting ideas that matter to you the most. Here's how it works:


Search existing ideas

Start by searching and reviewing existing ideas and requests to enhance a product or service. Take a look at ideas others have posted, and add a comment, vote, or subscribe to updates on them if they matter to you. If you can't find what you are looking for, post a new idea as described below.


Post your ideas

Post ideas and requests to enhance a product or service:

  1. Post an idea

  2. Upvote ideas that matter most to you

  3. Get feedback from the IBM team to refine your idea


Specific links you will want to bookmark for future use

Welcome to the IBM Ideas Portal (https://www.ibm.com/ideas) - Use this site to find out additional information and details about the IBM Ideas process and statuses.

IBM Unified Ideas Portal (https://ideas.ibm.com) - Use this site to view all of your ideas, create new ideas for any IBM product, or search for ideas across all of IBM.

ideasibm@us.ibm.com - Use this email to suggest enhancements to the Ideas process or request help from IBM for submitting your Ideas.

IBM Employees should enter Ideas at https://ideas.ibm.com


Status: Not under consideration
Workspace: Spectrum LSF
Created by: Guest
Created on: May 12, 2017

Collect I/O per LSF job as a resource to be reported to RTM

As the number of CPUs per machine rises, the I/O to disk rises proportionally, but the bandwidth to the disk generally does not scale. We have found that on high-CPU machines the system can grind to a near halt because the disk I/O is overloaded.

We can see the culprits by running iotop on an affected host, or by looking at sar, but that only tells us the PID. We can use lsload -l to see a summary of I/O per host, but again this does not differentiate the culprit jobs. RTM can plot the overall I/O per host, but not per job. We have resorted to running cron jobs on the hosts, which grab the PIDs of the running LSF jobs, run iotop against those PIDs, and save the results to a data file. From this we can at least plot the I/O per job, but it would be much better to have this available directly in RTM and to have LSF collect it natively.

Of course, the lim on each host would need to collect this information and pass it to the master. There is already a field for this in lsb.acct, ru_ioch, but it is only populated on HP-UX; it could be used for Linux as well.
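For illustration only, here is a minimal sketch of the kind of cron-driven workaround described above, assuming a Linux host and sampling /proc/<pid>/io rather than parsing iotop output. How job IDs are mapped to PIDs is site-specific (for example, parsed from bjobs -l), so the mapping is passed in as a plain dictionary; the output path and field names are illustrative assumptions, not part of the original request.

```python
#!/usr/bin/env python3
"""Sample per-job I/O counters for LSF jobs running on this host.

A minimal sketch of the cron-job workaround described in the idea:
for each PID belonging to a job it reads /proc/<pid>/io (Linux only)
and appends one record per job to a CSV file, which can later be
plotted per job. How job IDs are mapped to PIDs is site-specific
(e.g. parsed from `bjobs -l`); here the mapping is simply passed in.
"""

import csv
import time
from pathlib import Path


def read_proc_io(pid: int) -> dict:
    """Return the I/O counters from /proc/<pid>/io as a dict of ints."""
    counters = {}
    for line in Path(f"/proc/{pid}/io").read_text().splitlines():
        key, _, value = line.partition(":")
        counters[key.strip()] = int(value.strip())
    return counters


def sample_jobs(job_pids: dict, out_file: str = "/var/tmp/lsf_job_io.csv") -> None:
    """Append one row per job with summed read/write byte counters."""
    now = int(time.time())
    with open(out_file, "a", newline="") as fh:
        writer = csv.writer(fh)
        for job_id, pids in job_pids.items():
            read_bytes = write_bytes = 0
            for pid in pids:
                try:
                    io = read_proc_io(pid)
                except (FileNotFoundError, PermissionError):
                    continue  # process exited or is not readable
                read_bytes += io.get("read_bytes", 0)
                write_bytes += io.get("write_bytes", 0)
            writer.writerow([now, job_id, read_bytes, write_bytes])


if __name__ == "__main__":
    # Hypothetical example: job 1234 has two processes on this host.
    sample_jobs({"1234": [4321, 4322]})
```

Because the /proc counters are cumulative, turning these samples into per-interval rates (the way RTM plots host-level I/O) requires differencing consecutive rows per job.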

  • Guest
    Aug 21, 2020

We have considered this request, and it is not something we are able to deliver in the foreseeable future. If there is broad interest in this, it can be resubmitted in 18 months.

  • Guest
    Aug 30, 2017

    Accurately reflecting I/O per job is not a trivial task and has been investigated a number of times in the past. Different tools give different results depending on the type of I/O: file access to local disk, NFS disk, RDMA disk, etc. all give different results, and the effects of caches, TCP offload engines, network compression, etc. all distort what is measured as true I/O (the sketch after this comment illustrates the cache effect). In most cases, the "true I/O" is what enters and leaves the filer, and while there are tools that give you the filer view, they do not give you the job that is actually causing the issue.

    Tools such as Ellexus Mistral can throttle a job from the client side when its I/O is high (and Mistral is integrated with LSF RTM for reporting and alarms).

    We currently have a prototype that can report per-job I/O from the filer perspective with Spectrum Scale; we are still investigating whether that approach can be extended to generic NFS or NetApp.
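As an editor's illustration of the caching point above (not part of the original thread), the following sketch reads a file twice and compares two counters from /proc/self/io: rchar, the bytes requested through read() calls, and read_bytes, the bytes actually fetched from the block layer on the process's behalf. If the file is not already cached, the first pass moves both counters, while the second pass is typically served from the page cache and moves only rchar; this is one reason a per-process view and a filer-side view of "true I/O" can disagree.

```python
#!/usr/bin/env python3
"""Show why per-process I/O counters depend on caching (Linux only).

Reads a file twice and prints the deltas of two /proc/self/io counters:
  rchar      - bytes requested via read() and similar syscalls
  read_bytes - bytes the process actually caused to be fetched from storage
On a file that is not yet in the page cache, the first pass moves both
counters; the second pass is usually served from cache, so rchar grows
again while read_bytes stays roughly flat.
"""

import sys


def io_counters() -> dict:
    counters = {}
    with open("/proc/self/io") as fh:
        for line in fh:
            key, _, value = line.partition(":")
            counters[key.strip()] = int(value.strip())
    return counters


def read_whole_file(path: str) -> int:
    total = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(1 << 20):  # read in 1 MiB chunks
            total += len(chunk)
    return total


if __name__ == "__main__":
    path = sys.argv[1]
    for attempt in (1, 2):
        before = io_counters()
        size = read_whole_file(path)
        after = io_counters()
        print(f"pass {attempt}: read {size} bytes, "
              f"rchar +{after['rchar'] - before['rchar']}, "
              f"read_bytes +{after['read_bytes'] - before['read_bytes']}")
```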