The scheduler on Condo is Torque, a follow-on of OpenPBS (Open Portable Batch System). It is an open source product. Jobs are scheduled by Torque and Maui together with a local metascheduler, which differentiates scheduling priority among groups according to their account balances. In particular, it gives more resources to groups with a positive account balance than to groups with a negative account balance. Users can see their account balance by issuing the command:

    /usr/local/bin/account_balance

Details of how this is calculated can be found at:
Allocation Policies For groups with a negative balance
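Queue Limits (columns marked "Positive" apply to groups with a positive account balance; columns marked "Negative" apply to groups with a negative account balance)

| Queue Name | Max Time/Job (hours) | Max Nodes/Job | Positive: Max Nodes/Queue | Positive: Max Jobs/Queue | Positive: Max Jobs/User | Positive: Max Jobs/Group | Negative: Max Nodes/Queue | Negative: Max Jobs/Queue | Negative: Max Jobs/User | Negative: Max Jobs/Group |
|---|---|---|---|---|---|---|---|---|---|---|
| long_large | 168 | 48 | 48 | 3 | 1 | 1 | 32 | 1 | 1 | 1 |
| long_medium | 168 | 8 | 16 | 3 | 1 | 2 | 8 | 2 | 1 | 2 |
| long_1node | 504 | 1 | 8 | 8 | 3 | 6 | 4 | 2 | 1 | 1 |
| short_large | 1 | 48 | 48 | 1 | 2 | 3 | 48 | 1 | 1 | 1 |
| short_medium | 24 | 8 | 24 | 4 | 1 | 4 | 12 | 2 | 1 | 1 |
| short_1node | 96 | 1 | 8 | 6 | 4 | 6 | 4 | 2 | 1 | 1 |
| small | 4 | 8 | 32 | 6 | 2 | 4 | 16 | 8 | 1 | 1 |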
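As a rough aid for reading the limits, the sketch below lists the queues whose per-job limits (Max Time/Job and Max Nodes/Job) would admit a job of a given size. The helper name `eligible_queues` and the dictionary `PER_JOB_LIMITS` are hypothetical and not part of any Condo tool. The sketch checks upper limits only and ignores the per-queue, per-user, and per-group limits, so it is an illustration rather than a statement of how the scheduler routes jobs.

```python
# Per-job limits copied from the queue limits table:
# queue name -> (max hours per job, max nodes per job).
PER_JOB_LIMITS = {
    "long_large": (168, 48),
    "long_medium": (168, 8),
    "long_1node": (504, 1),
    "short_large": (1, 48),
    "short_medium": (24, 8),
    "short_1node": (96, 1),
    "small": (4, 8),
}


def eligible_queues(nodes, hours):
    """Return queues whose per-job limits admit this job (hypothetical helper).

    Only the upper limits on wall time and node count are checked.
    """
    return [
        name
        for name, (max_hours, max_nodes) in PER_JOB_LIMITS.items()
        if hours <= max_hours and nodes <= max_nodes
    ]


if __name__ == "__main__":
    # An 8-node, 4-hour job fits several queues under these limits.
    print(eligible_queues(nodes=8, hours=4))
```

A job that fits no queue (for example, 2 nodes for 200 hours) yields an empty list, which signals that the request has to be reduced in size or duration.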
To maximize system utilization, short-duration jobs are allowed to run beyond the above limits when the system is relatively idle. For now, this is accomplished as follows. Define High Priority Groups (HPGs) as those with a positive allocation. Every 5 minutes, the following checks are run, and one job per user may be started under these rules:

IF jobs from HPGs are waiting for compute nodes THEN the following jobs are run:
    jobs which can run and leave at least 32 nodes free
ELSE (no HPG jobs are waiting)
    jobs of 2 hours or less which can run and leave at least 46 nodes free
    jobs of 24 hours or less which can run and leave at least 70 nodes free
    jobs of 48 hours or less which can run and leave at least 132 nodes free
ENDIF

After those checks, we also check whether any jobs are waiting in the short_large queue:

IF there are none THEN
    jobs of 2 hours or less which can run and leave at least 36 nodes free
    jobs of 24 hours or less which can run and leave at least 116 nodes free
ENDIF

These rules do get tweaked as the overall load changes.
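The idle-time rules can be sketched as a single decision function. This is a minimal illustration, not the metascheduler's actual code: the function `may_start`, its parameters, and the constant names are hypothetical, and "leave at least N nodes free" is assumed to mean that the free node count after the job starts is at least N. The one-job-per-user restriction per 5-minute pass is not modeled.

```python
# Thresholds copied from the rules above:
# (max job length in hours, nodes that must remain free after the job starts).
HPG_WAITING_MIN_FREE = 32
NO_HPG_WAITING = [(2, 46), (24, 70), (48, 132)]
NO_SHORT_LARGE_WAITING = [(2, 36), (24, 116)]


def leaves_enough(job_nodes, free_nodes, min_free):
    """True if the job fits in the free nodes and leaves at least min_free idle."""
    return job_nodes <= free_nodes and free_nodes - job_nodes >= min_free


def may_start(job_hours, job_nodes, free_nodes, hpg_waiting, short_large_waiting):
    """Decide whether a job may start beyond the queue limits (hypothetical sketch)."""
    # First check: depends on whether High Priority Group jobs are waiting.
    if hpg_waiting:
        if leaves_enough(job_nodes, free_nodes, HPG_WAITING_MIN_FREE):
            return True
    else:
        for max_hours, min_free in NO_HPG_WAITING:
            if job_hours <= max_hours and leaves_enough(job_nodes, free_nodes, min_free):
                return True

    # Second check: applies only when nothing is waiting in short_large.
    if not short_large_waiting:
        for max_hours, min_free in NO_SHORT_LARGE_WAITING:
            if job_hours <= max_hours and leaves_enough(job_nodes, free_nodes, min_free):
                return True

    return False


if __name__ == "__main__":
    # A 2-hour, 4-node job with 50 nodes free and no HPG jobs waiting.
    print(may_start(2, 4, 50, hpg_waiting=False, short_large_waiting=True))
```

Keeping the thresholds in small tables rather than in the branch logic reflects the note that the rules are tweaked as load changes: only the numbers would need editing.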