Improve Various Components of the Current Scheduler Algorithm
Currently, although the Scheduler supports threads, it does not process one bucket per thread. Instead, the threads are used together on a per-bucket basis and buckets are still handled serially, which has led us to Host Match times of around 400 seconds.
What makes more sense to me is a few optimizations that I think will help speed up host matching:
1) For boolean resources, the scheduler should create a shared memory table, indexed by a numeric hash that represents the hostname. This should be done once, at reconfig or restart time, rather than in every scheduling cycle, which would save resources. Accommodations would have to be made for dynamic hosts, of course. Because the table lives in shared memory, this could be done using multiple threads, thus speeding up the process.
2) For ELIMs and other variable resources, the scheduler should, in each scheduling interval, create a table per resource per host for host-based resources, using the same numeric hash that represents the matching hostnames. In our case, "healthy" is dynamic and is on every queue and every bucket.
3) Add short-circuit logic to sorting: once the sort reaches a field in the sort order whose values vary widely, such as r15s or mem, IGNORE all sort fields after it and stop sorting. This will save time in the SORT phase of matching. Metrics such as -slots have values between 0 and maxCpus, so the probability that two hosts share the same value is much higher, and it makes sense to go on to the next sort field. Metrics like free memory and r15s, however, have such a diverse range of values that it makes no sense to keep sorting on later fields once the sort on them has been performed.
4) Instead of having multiple threads work on one bucket at a time, change the algorithm to process X buckets at a time. Using the SHMEM API and the various tables within it, each thread would be able to access the memory tables described above for all the various resources, and intersect the resources based on their numeric hash rather than the string-based hostname, which would be light years faster.
5) At submission time, normalize the combinedResreq to remove "duplicate" resource requirements like "type == any && type == any" and to minimize any "extra bracketing". For example:
((type== any && health == ok) && type == LINUX64 && (((mem > 5000))))
If this is normalized first, the number of comparison operations can be reduced, thus improving performance. In the case above, the combinedResreq should be:
type == any && health == ok && type == LINUX64 && mem > 5000
In fact, type == any should be ignored, since it is assumed even when it is not given. LSF should report the "optimized" combinedResreq to the user too. It's so ugly the way it is today. Part of this is not IBM's fault, of course.
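To make point 5 concrete, here is a rough sketch in Python of what such a normalization pass could look like. This is only an illustration, not how LSF handles resource requirements internally: the function name normalize_resreq and the drop_type_any flag are made up for this example, and it assumes the expression is a pure "&&" conjunction of simple comparisons. Anything containing "||", a negated group, or function-call syntax is returned unchanged, because stripping brackets there could change the meaning.

```python
import re

# Matches one simple comparison such as "mem > 5000" or "type==any".
_TERM = re.compile(r"^(\w+)\s*(==|!=|>=|<=|>|<)\s*(\S+)$")


def normalize_resreq(expr: str, drop_type_any: bool = False) -> str:
    """Flatten a pure-AND expression, removing extra brackets and duplicates.

    Hypothetical helper for illustration only; not part of any LSF API.
    """
    # Brackets are only redundant in a pure conjunction. If the expression
    # has "||", a negated group "!(", or a call like "name(", leave it alone.
    if "||" in expr or re.search(r"[\w!]\s*\(", expr):
        return expr

    flat = expr.replace("(", " ").replace(")", " ")
    seen = set()
    terms = []
    for raw in flat.split("&&"):
        term = " ".join(raw.split())  # collapse runs of whitespace
        if not term:
            continue
        m = _TERM.match(term)
        if m:
            # Use one canonical spacing so "type== any" equals "type == any".
            term = f"{m.group(1)} {m.group(2)} {m.group(3)}"
        if drop_type_any and term == "type == any":
            continue  # optional: treat "type == any" as implied
        if term not in seen:  # drop exact duplicates, keep first occurrence
            seen.add(term)
            terms.append(term)
    # May be empty if every term was dropped.
    return " && ".join(terms)


if __name__ == "__main__":
    ugly = "((type== any && health == ok) && type == LINUX64 && (((mem > 5000))))"
    print(normalize_resreq(ugly))
    print(normalize_resreq(ugly, drop_type_any=True))
```

With the default settings, the example string above comes out as the flattened form shown earlier. Passing drop_type_any=True also removes the "type == any" term. A real implementation would need a proper parser to handle "||", functions, and the other resreq sections.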