Our client is requesting a feature enhancement on top of what we have in the product. The client's explanation of the requested scheduling policy follows.
Here is the description of the feature we would like for Symphony.
Symphony provides a few scheduling policies, including one called R_PriorityScheduling. This is the one we use the most, and we rely heavily on priorities between tasks/sessions.
R_PriorityScheduling has a major flaw that we would like to address by creating a new scheduling policy that we call R_StrictPriorityScheduling.
Let's start by describing the flaw of R_PriorityScheduling using an example.
Originally, only one type of task is executing and pending for a given application that uses R_PriorityScheduling:
• B: lower priority 1000, 4 slots per task.
B tasks occupy the entire cluster. There are no available slots. Then, many tasks of another type are submitted:
• A: highest priority 9999, 6 slots per task.
At some point, a B task completes, so 4 slots become available. The highest priority pending task (A) requires 6 slots, so it cannot be started at this point. Symphony then looks at other tasks and starts the next B task, which requires only 4 slots. This fills the 4 available slots and we're back to a fully occupied cluster.
In this situation, A tasks will not get executed until there are no more B tasks pending, even though A tasks have higher priority. In practice, a few A tasks do get to start when 2 or more B tasks end at the same time on the same host, leaving 8 or more available slots; if the timing is right, Symphony will get a chance to start an A task before it starts a B task. But this is far from respecting the priority scheme.
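The starvation described above can be reproduced with a small simulation. This is only a sketch of the behavior as we observe it, not Symphony's actual scheduler: the single 8-slot host, the `Task` class, and the functions `pick_first_fit` and `simulate_backfill` are hypothetical names introduced for illustration, and tasks are assumed to complete one at a time.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int
    slots: int


def pick_first_fit(pending, free_slots):
    """Return the highest-priority pending task that fits, or None.

    This mimics the observed behavior: if the top task does not fit,
    lower-priority tasks are considered next.
    """
    for task in sorted(pending, key=lambda t: -t.priority):
        if task.slots <= free_slots:
            return task
    return None


def simulate_backfill(total_slots=8, pending_a=3, pending_b=5):
    """Simulate one host; return task names in the order they were started."""
    pending = [Task("A", 9999, 6) for _ in range(pending_a)]
    pending += [Task("B", 1000, 4) for _ in range(pending_b)]
    # The host starts out fully occupied by two running B tasks.
    running = [Task("B", 1000, 4), Task("B", 1000, 4)]
    started = []
    while pending and running:
        running.pop(0)  # assumption: exactly one task completes at a time
        free = total_slots - sum(t.slots for t in running)
        while True:
            task = pick_first_fit(pending, free)
            if task is None:
                break
            pending.remove(task)
            running.append(task)
            started.append(task.name)
            free -= task.slots
    return started


if __name__ == "__main__":
    print(simulate_backfill())
```

In this model every freed block of 4 slots is immediately refilled by a B task, so no A task starts until the pending B tasks are exhausted.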
This may seem like a degenerate case, but it is actually quite easy to come across. It happens often enough that we simply cannot use different serviceToSlotRatios in an application that uses R_PriorityScheduling, because the priorities (which are essential to us) will not be respected. Using different ratios causes deadlocks: many B tasks are waiting for A tasks to complete, but that never happens because the A tasks never start.
In order to eliminate this flaw, we would like a new scheduling policy called R_StrictPriorityScheduling. It's essentially the same as the current R_PriorityScheduling, but priorities are strictly respected, regardless of slot availability.
So in the example above, when 4 slots become available following the completion of a B task, Symphony will not start a B task when it realizes that it cannot start the higher priority A task because it requires 6 slots. It will leave the 4 slots unused until there are 6 slots (or more) available to start the highest priority A task.
When another B task completes on a host where there are already 4 available slots, there will then be 8 slots available on that host, and Symphony will start an A task using 6 slots, leaving 2 unused slots which will remain unused until, once again, there are enough slots available for the next highest priority task.
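The requested behavior can be sketched the same way. Again, this is an illustrative model under stated assumptions, not a proposed implementation: a single 8-slot host, tasks completing one at a time, and hypothetical names (`Task`, `pick_strict`, `simulate_strict`). The only rule it encodes is that the highest priority pending task is the sole candidate, and slots stay idle if it does not fit.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int
    slots: int


def pick_strict(pending, free_slots):
    """Return the highest-priority pending task only if it fits, else None.

    Lower-priority tasks are never considered while a higher-priority
    task is still waiting for slots.
    """
    if not pending:
        return None
    head = max(pending, key=lambda t: t.priority)
    return head if head.slots <= free_slots else None


def simulate_strict(total_slots=8, pending_a=3, pending_b=5):
    """Return (start order, idle slots left after each scheduling pass)."""
    pending = [Task("A", 9999, 6) for _ in range(pending_a)]
    pending += [Task("B", 1000, 4) for _ in range(pending_b)]
    # The host starts out fully occupied by two running B tasks.
    running = [Task("B", 1000, 4), Task("B", 1000, 4)]
    started, idle = [], []
    while pending and running:
        running.pop(0)  # assumption: exactly one task completes at a time
        free = total_slots - sum(t.slots for t in running)
        while True:
            task = pick_strict(pending, free)
            if task is None:
                break
            pending.remove(task)
            running.append(task)
            started.append(task.name)
            free -= task.slots
        idle.append(free)  # slots deliberately left unused after this pass
    return started, idle


if __name__ == "__main__":
    order, idle = simulate_strict()
    print(order)
    print(idle)
```

In this model the first completion leaves 4 slots idle, the second allows an A task to start and leaves 2 idle, and B tasks resume only after all A tasks have started.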
The same logic must also apply to resource plan maximums (e.g. limiting an application to at most N slots in a given resource group) and to any other feature that restricts how slots are used. The general rule is that if there are not enough slots to execute the pending task with the highest priority, the slots must remain unused until there are enough available slots to execute that task.
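As a rough illustration of how the general rule could combine with a resource plan maximum, the sketch below treats the cap as one more limit on the slots the highest priority task may claim. The names `usable_slots`, `next_task_strict`, `app_used`, and `app_max` are hypothetical and do not correspond to Symphony configuration parameters; tasks are plain `(name, priority, slots)` tuples.

```python
def usable_slots(free_slots, app_used, app_max):
    """Slots the application may still claim, bounded by free slots and its cap."""
    return max(0, min(free_slots, app_max - app_used))


def next_task_strict(pending, free_slots, app_used, app_max):
    """Return the highest-priority task if it fits within every limit, else None.

    pending is a list of (name, priority, slots) tuples. A lower-priority
    task is never returned while the highest-priority task is blocked.
    """
    if not pending:
        return None
    head = max(pending, key=lambda t: t[1])
    if head[2] <= usable_slots(free_slots, app_used, app_max):
        return head
    return None


if __name__ == "__main__":
    pending = [("A", 9999, 6), ("B", 1000, 4)]
    # 8 slots are free, but the application may only grow by 4 more slots:
    # A does not fit under the cap, so nothing starts (B is not backfilled).
    print(next_task_strict(pending, free_slots=8, app_used=10, app_max=14))
    # With 6 slots of headroom under the cap, A can start.
    print(next_task_strict(pending, free_slots=8, app_used=8, app_max=14))
```

The point of the sketch is that the cap and the free-slot count are checked together against the highest priority task only, so a cap never becomes a way for lower priority tasks to jump ahead.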
Of course, this may leave unused slots on several different hosts. A few scattered "wasted slots" are inherent to R_StrictPriorityScheduling. The important characteristic for our workload is to respect the priorities, not to use every single available slot as soon as possible.
The client thinks that, in the above case, preemption might not help under the regular R_PriorityScheduling policy.
Please let us know whether this could be filed as an RFE and reviewed.
Regards
Andrew Wang