Spectrum LSF

Extend SessionScheduler to manage small parallel applications (size up to one node)

Today we have applications/software running in serial mode via session scheduler (runtime per task <<10min) with up to 10000 tasks. These applications/software will migrate to parallel usage soon and we want to keep the SessionScheduler func...
4 months ago in Spectrum LSF 1 Delivered

GPU Guaranteed Resource Reservation

Currently Guaranteed Resource Pools in LSF can fulfill SLAs based on slots, memory, licenses, or packages (slots + memory). With more demand for GPU resources in a cluster, it would be beneficial to add GPU resources to a Guaranteed Resource Pool....
4 months ago in Spectrum LSF 1 Future consideration

LSF blimits command new option

Add a blimits -o option that lists only the relevant resource and consumer columns, because many columns are irrelevant and show only the value "-".
3 months ago in Spectrum LSF 1 Future consideration

RTM Implementation Total GPU Wall time statistic.

RTM GPU Wall time statistic is limited to "only Finished Exclusive job has GPU wall time reported and it based on DCGM". We would like to see statistics for GPU wall time usage regardless of job exit status and GPU mode. Also because GPUs are occupied...
7 months ago in Spectrum LSF 1 Future consideration

The number of TCP connections exceeds the maximum of 2*200 defined in the LSF source code

We're getting the following message in the master lim.log: doAcceptConn: Can't maintain this big clientMap[400], Connection from dropped. I received the following explanation and advice through case number TS003933691. It looks ...
over 1 year ago in Spectrum LSF 3 Delivered

Modify cgroup memory limit of running job

With "bsub -M ..." LSF can use cgroups to limit the amount of memory a job can use. Unfortunately, bmod cannot change this limit for a running job, even though cgroups support modifying it.
4 months ago in Spectrum LSF 1 Planned for future release

LSF/GPFS monitoring and job submission

Nvidia would like a documented/supported methodology to ensure LSF and Scale/GPFS NSD nodes are in a healthy state prior to job submission.
4 months ago in Spectrum LSF 2 Delivered

Cluster level graph for used/total memory

Customer would like to have an RTM graph for cluster-level average Used/Total memory. Logged on behalf of customer. Please contact
6 months ago in Spectrum LSF 2 Delivered

LSF integration with latest DCGM

LSF provides integration with DCGM so that GPU usage data can be collected in the lsb.acct log file. However, the latest LSF only supports DCGM v1.7.2, which goes with CUDA10. Customers like us use CUDA11, which needs DCGM v2.1.4, and we would l...
8 months ago in Spectrum LSF 2 Delivered

Set a limit in the MC receive queue

TSMC is using MC to balance workload between clusters. When the submission cluster is busy, it forwards some jobs to execution clusters. The execution cluster has its own workload besides the forwarded jobs. To avoid the forwarded jobs occupy mos...
7 months ago in Spectrum LSF 0 Planned for future release