The job scheduler (SLURM) on Falcon assigns a priority to each job to determine which jobs to schedule and when. Priority determines a job’s position in the pending queue relative to other jobs, and therefore the order in which pending jobs will run. It is an integer value calculated from a number of factors.
Priority Calculation
On the Falcon cluster, the job priority is calculated as a weighted sum of the following factors:
- Age: The length of time your job has been pending in the queue and eligible to be scheduled. Job priority increases with the job’s age.
- Size: The size of your job in terms of resources requested (CPU, memory, GPUs). At present, this is not used to calculate priority.
- QoS: Priority based on the job’s requested run time: debug, short, medium or long. At present, this is not used to calculate priority.
- Fairshare: Based on your historical usage (explained in the next section). Job priority decreases as your resource usage increases.
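The weighted sum described above can be illustrated with a minimal sketch. The weight values and the function name `job_priority` are hypothetical and chosen only for illustration; the actual weights are set by the Falcon administrators and are not given here. The sketch assumes each factor is normalized to the range 0 to 1, and sets the Size and QoS weights to zero to reflect that they are currently unused.

```python
# Hypothetical weights for illustration only; the real values are
# configured by the cluster administrators.
WEIGHTS = {"age": 1000, "size": 0, "qos": 0, "fairshare": 10000}


def job_priority(factors, weights=WEIGHTS):
    """Return an integer priority as a weighted sum of factors in [0, 1].

    `factors` maps a factor name to a value between 0 and 1.
    Factors missing from `factors` are treated as 0.
    """
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"factor {name!r} must be between 0 and 1")
    total = sum(weights[name] * factors.get(name, 0.0) for name in weights)
    return round(total)


# Two jobs of the same age: the light user (higher fair-share value)
# ends up ahead of the heavy user in the pending queue.
light_user = job_priority({"age": 0.5, "fairshare": 0.75})
heavy_user = job_priority({"age": 0.5, "fairshare": 0.25})
print(light_user, heavy_user)
```

Because the Size and QoS weights are zero in this sketch, changing those factors has no effect on the result, which matches the behaviour described in the list above.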
Fair Share
We use the concept of “fair-share” to promote balanced resource usage among users. The scheduler deprioritizes the jobs of users with heavy past resource usage. This ensures that users who haven’t used the cluster as much get higher priority for their jobs, while users who have used the cluster a lot cannot monopolize it.
A fractional number between 0 and 1 is assigned to each user based on their past usage. This number changes over time with your usage and with the total number of users in the system. The job priority calculation uses this number as one of its factors.
If you’ve been using a lot of resources, your fair-share number will keep going down, and the job priority for your subsequent jobs will drop.
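To make the behaviour of the fair-share number concrete, here is a small sketch of one way such a number could be computed. It is an assumption for illustration, not the formula Falcon necessarily uses: it gives every user an equal share of the cluster and applies an exponential decay of the form 2^(-usage/share), similar in spirit to Slurm's classic fair-share calculation. The function name `fair_share` and its parameters are hypothetical.

```python
def fair_share(user_usage, total_usage, n_users):
    """Return an illustrative fair-share value in the range (0, 1].

    Assumes every user is entitled to an equal share of the cluster.
    A user with no usage gets 1.0; a user who has consumed exactly
    their share gets 0.5; heavier users get progressively less.
    """
    if n_users <= 0:
        raise ValueError("n_users must be positive")
    if total_usage <= 0:
        return 1.0  # nobody has used the cluster yet
    usage_fraction = user_usage / total_usage  # this user's part of all usage
    share = 1.0 / n_users                      # equal entitlement per user
    return 2.0 ** (-(usage_fraction / share))


# Four users; usage is in arbitrary units (for example, CPU-hours).
for used in (0, 25, 50):
    print(used, fair_share(used, total_usage=100, n_users=4))
```

In this sketch the value depends both on your own usage and on the number of users, and it only ever decreases as your portion of the total usage grows, which is the behaviour described above.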