Partitions

The Falcon cluster provides several CPU and GPU architectures, as well as separate queues for jobs with different priorities and limits.

We use two mechanisms to specify queues and resources:

  1. Partitions
  2. Quality of Service (QoS)

There are currently three partitions; peregrine-cpu is the default:

Partition        Job type                 CPU memory / GPU memory
peregrine-cpu    single and multi-core    1 TB / NA
peregrine-gpu    GPU                      1 TB / 640 GB
kestrel-gpu      GPU                      360 GB / 288 GB
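To script partition selection, the table can be captured as data. The sketch below is hypothetical: `Partition`, `PARTITIONS`, `pick_partition`, and the "smallest partition that fits" policy are illustrative assumptions, not part of the cluster's tooling. It also assumes 1 TB means 1024 GB and treats the memory figures as the most a single job can request, which this page does not confirm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Partition:
    name: str
    job_type: str
    cpu_mem_gb: int
    gpu_mem_gb: Optional[int]  # None for CPU-only partitions ("NA" in the table)


# Values transcribed from the partition table; 1 TB is assumed to mean 1024 GB.
PARTITIONS = [
    Partition("peregrine-cpu", "single and multi-core", 1024, None),
    Partition("peregrine-gpu", "GPU", 1024, 640),
    Partition("kestrel-gpu", "GPU", 360, 288),
]


def pick_partition(needs_gpu: bool, cpu_mem_gb: int = 0, gpu_mem_gb: int = 0) -> str:
    """Hypothetical helper: return the smallest partition that fits the request."""
    # Keep only partitions of the right kind (GPU vs CPU-only).
    candidates = [p for p in PARTITIONS if (p.gpu_mem_gb is not None) == needs_gpu]
    # Drop partitions whose memory figures are below the request.
    fitting = [
        p for p in candidates
        if p.cpu_mem_gb >= cpu_mem_gb and (p.gpu_mem_gb or 0) >= gpu_mem_gb
    ]
    if not fitting:
        raise ValueError("no partition satisfies the request")
    # Assumed policy: prefer the least GPU memory, then the least CPU memory.
    return min(fitting, key=lambda p: (p.gpu_mem_gb or 0, p.cpu_mem_gb)).name
```

For example, `pick_partition(needs_gpu=True, gpu_mem_gb=500)` returns `"peregrine-gpu"`, because kestrel-gpu lists only 288 GB of GPU memory.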

QoS

We use QoS to classify jobs by their limits. Each partition has a default QoS, and each QoS has its own limits. The time limits are:

Partition                     QoS           Time limit
peregrine-cpu                 cpu_debug     30 minutes
peregrine-cpu                 cpu_short     24 hours
peregrine-cpu                 cpu_medium    3 days
peregrine-cpu                 cpu_long      10 days
peregrine-gpu, kestrel-gpu    gpu_debug     30 minutes
peregrine-gpu, kestrel-gpu    gpu_short     24 hours
peregrine-gpu, kestrel-gpu    gpu_medium    3 days
peregrine-gpu, kestrel-gpu    gpu_long      10 days
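Assuming the cluster is managed by Slurm (the partition and QoS terminology suggests it, but this page does not say so), a job selects its partition, QoS, and wall time with the standard `sbatch` options `--partition`, `--qos`, and `--time`. The sketch below picks the shortest QoS whose time limit covers a requested wall time and builds the matching `#SBATCH` header lines. The helper names (`pick_qos`, `slurm_time`, `sbatch_header`) are hypothetical, and "shortest sufficient QoS" is an assumed policy that ignores any limits other than time.

```python
from datetime import timedelta

# QoS tiers and time limits from the table above (same tiers for CPU and GPU).
QOS_LIMITS = {
    "debug": timedelta(minutes=30),
    "short": timedelta(hours=24),
    "medium": timedelta(days=3),
    "long": timedelta(days=10),
}
GPU_PARTITIONS = {"peregrine-gpu", "kestrel-gpu"}


def qos_prefix(partition: str) -> str:
    """Map a partition to its QoS name prefix ("cpu" or "gpu")."""
    if partition == "peregrine-cpu":
        return "cpu"
    if partition in GPU_PARTITIONS:
        return "gpu"
    raise ValueError(f"unknown partition: {partition}")


def pick_qos(partition: str, walltime: timedelta) -> str:
    """Hypothetical helper: shortest QoS whose time limit covers walltime."""
    prefix = qos_prefix(partition)
    for tier, limit in sorted(QOS_LIMITS.items(), key=lambda item: item[1]):
        if walltime <= limit:
            return f"{prefix}_{tier}"
    raise ValueError(f"{walltime} exceeds the longest QoS time limit")


def slurm_time(walltime: timedelta) -> str:
    """Format a duration as Slurm's days-hours:minutes:seconds."""
    days, rem = divmod(int(walltime.total_seconds()), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


def sbatch_header(partition: str, walltime: timedelta) -> list[str]:
    """Build #SBATCH lines for the partition, the chosen QoS, and the time limit."""
    return [
        f"#SBATCH --partition={partition}",
        f"#SBATCH --qos={pick_qos(partition, walltime)}",
        f"#SBATCH --time={slurm_time(walltime)}",
    ]


if __name__ == "__main__":
    print("\n".join(sbatch_header("kestrel-gpu", timedelta(hours=36))))
```

A 36-hour job on kestrel-gpu exceeds gpu_short (24 hours), so the sketch selects gpu_medium and formats the time limit as `1-12:00:00`.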