IBM Data and AI Ideas Portal for Customers



Status Delivered
Workspace Spectrum LSF
Created by Guest
Created on Jun 8, 2017

Fix systemd unit file created by hostsetup to allow MPI jobs to use RDMA

In response to RFE #89685, which requested the creation and use of a systemd unit file, fix 432732 was provided.

The fix modified hostsetup to create the unit file below and to register the lsfd service.

[Unit]
Description=IBM Spectrum LSF
After=network.target nfs.service autofs.service gpfs.service

[Service]
Type=forking
ExecStart=${LSF_SERVERDIR}/lsf_daemons start
ExecStop=${LSF_SERVERDIR}/lsf_daemons stop
KillMode=none

[Install]
WantedBy=multi-user.target

Unfortunately, systemd starts services without reference to the usual system settings, specifically /etc/security/limits.conf. When the lsfd service is started by systemd, the LSF daemons therefore inherit their resource limits from systemd, and any jobs run on such hosts inherit *their* limits from sbatchd.

The upshot is that MPI jobs are unable to use RDMA, since they are unable to lock enough memory to allocate RDMA buffers, causing the jobs to fail.

For example:

fluent_mpi.17.0.0: Rank 0:1: MPI_Init: ibv_create_cq() failed 4
fluent_mpi.17.0.0: Rank 0:1: MPI_Init: Can't initialize RDMA device
fluent_mpi.17.0.0: Rank 0:1: MPI_Init: Internal Error: Cannot initialize RDMA protocol

You can test whether this is the cause of the problem by running 'ulimit -l' as an LSF job on the node in question.
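
As a complementary check on Linux, the limit actually in effect for a running daemon can be read from its /proc/<pid>/limits file. The sketch below is illustrative only: the helper name max_locked_memory is hypothetical and not part of LSF, and it assumes the standard Linux layout of that file.

```shell
#!/bin/sh
# Print the soft "Max locked memory" limit from a /proc/<pid>/limits-style file.
# max_locked_memory is a hypothetical helper name, not part of LSF.
max_locked_memory() {
    # Line format: "Max locked memory   <soft>   <hard>   bytes"
    awk '/^Max locked memory/ { print $4 }' "$1"
}

# Example (assumes sbatchd is running and pgrep is available):
#   max_locked_memory "/proc/$(pgrep -o sbatchd)/limits"
```

A value other than 'unlimited' for sbatchd suggests that jobs started on that host will inherit the same restricted limit.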

systemd provides a mechanism to rectify this weakness in its model. After adding the line 'LimitMEMLOCK=infinity' to the unit file (as shown below), re-running the 'ulimit -l' command shows 'unlimited' and MPI jobs proceed correctly.

[Unit]
Description=IBM Spectrum LSF
After=network.target nfs.service autofs.service gpfs.service

[Service]
LimitMEMLOCK=infinity
Type=forking
ExecStart=${LSF_SERVERDIR}/lsf_daemons start
ExecStop=${LSF_SERVERDIR}/lsf_daemons stop
KillMode=none

[Install]
WantedBy=multi-user.target
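
On a host where the unit file is already installed, the same setting could also be applied through a systemd drop-in rather than by editing the generated file. This is a sketch under assumptions: the unit name lsfd.service is inferred from the "lsfd service" mentioned above, and write_memlock_dropin and the file name memlock.conf are hypothetical.

```shell
#!/bin/sh
# Write a drop-in file that raises the locked-memory limit for a unit.
# $1 = drop-in directory, e.g. /etc/systemd/system/lsfd.service.d
#      (the unit name lsfd.service is an assumption)
write_memlock_dropin() {
    mkdir -p "$1" || return 1
    # Quoted delimiter: the contents are written literally, with no expansion.
    cat > "$1/memlock.conf" << 'EOF'
[Service]
LimitMEMLOCK=infinity
EOF
}

# Typical use as root, followed by a reload and a restart of the service:
#   write_memlock_dropin /etc/systemd/system/lsfd.service.d
#   systemctl daemon-reload
#   systemctl restart lsfd
```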

  • Guest
    Aug 30, 2017

    This should be treated as a defect and fixed as a bug; an RFE is not required.

  • Guest
    Jun 8, 2017

    Yes. I have let John file this RFE. This is a very good request for enhancement.
    From line number 1068 in the new hostsetup for LSF10.1:

    cat > $_tmp_service_file << EOF
    [Unit]
    Description=IBM Spectrum LSF
    After=network.target nfs.service autofs.service gpfs.service

    [Service]
    Type=forking
    ExecStart=${LSF_SERVERDIR}/lsf_daemons start
    ExecStop=${LSF_SERVERDIR}/lsf_daemons stop
    KillMode=none

    [Install]
    WantedBy=multi-user.target

    EOF


    John has suggested adding an option to the hostsetup script. When this option (e.g., --infiniband) is used, the code above would look like this:

    From line number 1068 in the new hostsetup for LSF10.1:

    cat > $_tmp_service_file << EOF
    [Unit]
    Description=IBM Spectrum LSF
    After=network.target nfs.service autofs.service gpfs.service

    [Service]
    Type=forking
    LimitMEMLOCK=infinity
    ExecStart=${LSF_SERVERDIR}/lsf_daemons start
    ExecStop=${LSF_SERVERDIR}/lsf_daemons stop
    KillMode=none

    [Install]
    WantedBy=multi-user.target

    EOF


    This is not difficult to implement. I suggest adopting it.
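
The suggestion above could be sketched roughly as follows. This is not the actual hostsetup code: the function name write_lsfd_unit and its "infiniband" argument are hypothetical stand-ins for the proposed --infiniband option, and only the unit file contents are taken from the excerpts above.

```shell
#!/bin/sh
# Emit the lsfd unit file; add LimitMEMLOCK only when $2 is "infiniband".
# $1 = output file, $2 = optional mode (hypothetical, mirrors --infiniband)
# LSF_SERVERDIR must be set; it is expanded when the file is generated.
write_lsfd_unit() {
    _memlock_line=""
    [ "${2:-}" = "infiniband" ] && _memlock_line="LimitMEMLOCK=infinity"
    {
        cat << EOF
[Unit]
Description=IBM Spectrum LSF
After=network.target nfs.service autofs.service gpfs.service

[Service]
Type=forking
EOF
        # Print the limit line only when requested, so no blank line is left behind.
        [ -n "$_memlock_line" ] && echo "$_memlock_line"
        cat << EOF
ExecStart=${LSF_SERVERDIR}/lsf_daemons start
ExecStop=${LSF_SERVERDIR}/lsf_daemons stop
KillMode=none

[Install]
WantedBy=multi-user.target
EOF
    } > "$1"
}
```

Keeping the limit behind an option leaves the generated unit file unchanged for hosts that do not need RDMA, while the second comment's view (treating the missing limit as a defect) would instead argue for emitting the line unconditionally.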