Implement Starving Jobs
Track the amount of time a job has been waiting to run and then mark the job as starving if this time has passed a specified limit.
Starving Jobs
Overview of starving jobs including parameters.
PBS can keep track of the amount of time a job has been waiting to run, and then mark the job as starving if this time has passed a specified limit. You can use this starving status in calculating both execution and preemption priority.
You enable tracking of starving jobs with the Help Starving Jobs parameter. It is a primetime option, meaning that you can configure it separately for primetime and non-primetime, or specify it for all of the time.
You specify the amount of time required for a job to be considered starving in the Job Starving Time parameter. The default for this parameter is 24 hours.
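As an illustration of how these two parameters fit together, the sketch below models them as a small Python configuration object. It is a hypothetical model, not a PBS interface: the class `StarvingConfig`, its field names, and the helper methods are assumptions chosen for readability. It shows Help Starving Jobs as a primetime option with separate primetime and non-primetime values, and Job Starving Time defaulting to 24 hours.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class StarvingConfig:
    """Hypothetical model of the starving-job parameters (not a PBS API)."""
    # Help Starving Jobs is a primetime option, so it has one value for
    # primetime and one for non-primetime.
    help_starving_jobs_prime: bool
    help_starving_jobs_non_prime: bool
    # Job Starving Time defaults to 24 hours.
    job_starving_time: timedelta = timedelta(hours=24)

    @classmethod
    def for_all_time(cls, enabled: bool,
                     starving_time: timedelta = timedelta(hours=24)) -> "StarvingConfig":
        """Apply one Help Starving Jobs value to both primetime and non-primetime."""
        return cls(enabled, enabled, starving_time)

    def tracking_enabled(self, is_primetime: bool) -> bool:
        """Return whether starving status is tracked during the current period."""
        if is_primetime:
            return self.help_starving_jobs_prime
        return self.help_starving_jobs_non_prime
```

For example, `StarvingConfig(True, False)` tracks starving jobs only during primetime, while `StarvingConfig.for_all_time(True)` tracks them at all times.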
PBS can use one of the following kinds of time to determine whether a job is starving:
- The job’s eligible wait time.
- The amount of time the job has been queued.
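The decision itself can be sketched in a few lines of Python. This is a minimal model under stated assumptions: the function names and arguments are hypothetical, and the use of a strict "greater than" comparison at the boundary is an assumption, since the text says only that the wait time has "passed" the limit. The flag `starve_by_eligible_time` stands in for the Jobs Starve by Eligible Time parameter, which selects between the two kinds of time listed above.

```python
from datetime import timedelta


def starving_wait_time(eligible_time: timedelta, queued_time: timedelta,
                       starve_by_eligible_time: bool) -> timedelta:
    """Choose which wait time counts toward starving (hypothetical helper)."""
    return eligible_time if starve_by_eligible_time else queued_time


def is_starving(wait_time: timedelta, help_starving_jobs: bool,
                job_starving_time: timedelta = timedelta(hours=24)) -> bool:
    """Return True once the wait time has passed Job Starving Time.

    Job Starving Time is consulted only when Help Starving Jobs is enabled.
    The strict '>' at the boundary is an assumption of this sketch.
    """
    if not help_starving_jobs:
        return False
    return wait_time > job_starving_time
```

A job with 10 hours of eligible wait time that has been queued for 30 hours is starving when queued time is used, but not when eligible time is used.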
Starving Jobs Parameters
- Help Starving Jobs
- Setting this option enables starving job support. Once jobs have waited for the amount of time given by Job Starving Time, they are considered starving. If a job is starving, no lower-priority jobs run until the starving job can run, unless backfilling is also specified. To use this option, you must also set the Job Starving Time parameter.
- Job Starving Time
- The amount of time before a job is considered starving. The default is 24 hours. This parameter is used only if Help Starving Jobs is enabled.
- Jobs Starve by Eligible Time
- Controls starving behavior. When enabled, each job’s eligible time value is used as its wait time for starving. If disabled, the amount of time the job has been queued is used as its wait time for starving.
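The effect described under Help Starving Jobs, where a starving job holds back lower-priority jobs, can be pictured with the following simplified Python sketch. It covers only the case without backfilling, and everything in it is hypothetical: the `Job` class, the single `ncpus` resource, and the rule that a non-starving job that does not fit is simply skipped are illustrative assumptions, not a description of the PBS scheduler's internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    ncpus: int      # hypothetical single resource, for illustration only
    starving: bool


def select_jobs(jobs_by_priority: List[Job], free_ncpus: int) -> List[str]:
    """Start jobs in priority order without backfilling.

    A non-starving job that does not fit is skipped, but a starving job
    that does not fit stops the pass, so no lower-priority job starts.
    """
    started = []
    for job in jobs_by_priority:
        if job.ncpus <= free_ncpus:
            free_ncpus -= job.ncpus
            started.append(job.name)
        elif job.starving:
            break  # hold all lower-priority jobs until this one can run
    return started
```

With 8 free CPUs and jobs A (4 CPUs), B (16 CPUs), and C (2 CPUs) in priority order, only A starts when B is starving; when B is not starving, both A and C start.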
Using a Job’s Eligible Wait Time to Determine Whether a Job Is Starving
PBS provides a method for tracking how long a job that is eligible to run has been waiting to run. By “eligible to run”, we mean that the job could run if the required resources were available. The time that a job waits while it is not running can be classified as “eligible” or “ineligible”. Roughly speaking, a job accrues eligible wait time when it is blocked due to a resource shortage, and accrues ineligible wait time when it is blocked due to project, user, or group limits.
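The rough classification above can be modeled by summing only the waiting intervals in which the job was blocked by a resource shortage. The Python sketch below is a hypothetical illustration: the `BlockReason` enumeration and the interval list are assumptions, and the model follows only the "roughly speaking" rule given in the text, not every case PBS distinguishes.

```python
from datetime import timedelta
from enum import Enum
from typing import Iterable, Tuple


class BlockReason(Enum):
    RESOURCE_SHORTAGE = "resource shortage"  # accrues eligible wait time
    PROJECT_LIMIT = "project limit"          # accrues ineligible wait time
    USER_LIMIT = "user limit"                # accrues ineligible wait time
    GROUP_LIMIT = "group limit"              # accrues ineligible wait time


def eligible_wait_time(intervals: Iterable[Tuple[timedelta, BlockReason]]) -> timedelta:
    """Sum the waiting intervals in which the job was blocked only by resources."""
    total = timedelta()
    for duration, reason in intervals:
        if reason is BlockReason.RESOURCE_SHORTAGE:
            total += duration
    return total
```

A job that waits 3 hours for resources, 5 hours on a user limit, and then 2 more hours for resources has accrued 5 hours of eligible wait time out of 10 hours spent waiting.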
PBS can use a job’s eligible wait time to determine whether the job is starving. A starving job is one whose wait time has exceeded a configurable maximum, set by the Job Starving Time parameter.
When Jobs Starve by Eligible Time is enabled, each job’s eligible time value is used as its wait time for starving. If Jobs Starve by Eligible Time is disabled, the amount of time the job has been queued is used as its wait time for starving.
When Jobs Starve by Eligible Time is disabled:
- The amount of time the job has been queued is used as its wait time for starving.
- Jobs lose their queue wait time whenever they are requeued, as with the qrerun command. This includes jobs that are checkpointed or requeued (but not suspended) during preemption.
- Suspended jobs do not lose their queue wait time. However, when a job is suspended, the time since it was submitted is counted towards its queue wait time. For example, if a job is submitted, remains queued for 1 hour, runs for 26 hours, and is then suspended, its queue wait time becomes 27 hours; if Job Starving Time is 24 hours, the job becomes starving.
When Jobs Starve by Eligible Time is enabled:
- The job’s eligible time value is used as its wait time for starving.
- Jobs do not lose their eligible time when they are requeued.
- Jobs do not lose their eligible time when they are suspended.
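The requeue and suspension rules for the two kinds of wait time can be traced with the hypothetical Python bookkeeping class below. The class `WaitClock` and its methods are assumptions made for illustration. The text does not say how suspension after an earlier requeue is handled, so this sketch simply measures "time since submission" from the original submission.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class WaitClock:
    """Hypothetical per-job bookkeeping for both kinds of starving wait time."""
    queue_wait: timedelta = timedelta()
    eligible_wait: timedelta = timedelta()
    since_submit: timedelta = timedelta()  # measured from the original submission

    def wait(self, duration: timedelta, eligible: bool = True) -> None:
        """Job sits queued; eligible time accrues only while it is eligible."""
        self.queue_wait += duration
        self.since_submit += duration
        if eligible:
            self.eligible_wait += duration

    def run(self, duration: timedelta) -> None:
        """Job runs; neither wait clock advances."""
        self.since_submit += duration

    def requeue(self) -> None:
        """Requeue (for example qrerun): queue wait is lost, eligible time is kept."""
        self.queue_wait = timedelta()

    def suspend(self) -> None:
        """Suspension keeps eligible time; queue wait becomes time since submission."""
        self.queue_wait = self.since_submit
```

Replaying the example from the list above, a job that waits 1 hour, runs for 26 hours, and is then suspended ends with 27 hours of queue wait time, which is past a 24-hour Job Starving Time, while its eligible time is still 1 hour. A later requeue would reset the queue wait time to zero but leave the eligible time at 1 hour.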
Enable Starving Jobs
Enable tracking of whether jobs are starving.