The Falcon cluster provides several CPU and GPU architectures. It also offers separate queues for jobs with different priorities and limits.
Queues and resources are selected with two mechanisms:
- Partitions
- Quality of Service (QoS)
There are currently three partitions; peregrine-cpu is the default:
Partition | Job type | CPU memory / GPU memory |
---|---|---|
peregrine-cpu | CPU (single- and multi-core) | 1 TB / NA |
peregrine-gpu | GPU | 1 TB / 640 GB |
kestrel-gpu | GPU | 360 GB / 288 GB |
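The partition table can be captured as data in job-submission scripts. The sketch below assumes the scheduler is Slurm (this page does not say so), where `sbatch --partition=NAME` selects a partition. Requesting GPUs with `--gres=gpu:N` is an assumption about how Falcon names its GPU resources, and `partition_args` is a hypothetical helper name, not a documented Falcon tool.

```python
from typing import List, Optional

# Partitions from the table above, mapped to whether they provide GPUs.
PARTITIONS = {
    "peregrine-cpu": False,
    "peregrine-gpu": True,
    "kestrel-gpu": True,
}
DEFAULT_PARTITION = "peregrine-cpu"


def partition_args(partition: Optional[str] = None, gpus: int = 0) -> List[str]:
    """Build sbatch arguments that select a partition (assumes Slurm).

    Falls back to the default partition when none is given. The
    --gres=gpu:N form is an assumption about this cluster's GPU naming.
    """
    name = partition or DEFAULT_PARTITION
    if name not in PARTITIONS:
        raise ValueError(f"unknown partition: {name}")
    if gpus and not PARTITIONS[name]:
        raise ValueError(f"{name} does not provide GPUs")
    args = [f"--partition={name}"]
    if gpus:
        args.append(f"--gres=gpu:{gpus}")
    return args


print(partition_args())                       # ['--partition=peregrine-cpu']
print(partition_args("kestrel-gpu", gpus=1))  # ['--partition=kestrel-gpu', '--gres=gpu:1']
```

Rejecting a GPU request on peregrine-cpu catches a common mistake before the job is submitted rather than after it sits in the queue.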
QoS
QoS levels classify jobs by their limits. Each partition has a default QoS, and each QoS sets its own maximum run time:
Partition | QoS | Time Limit |
---|---|---|
peregrine-cpu | cpu_debug | 30 min |
peregrine-cpu | cpu_short | 24 hours |
peregrine-cpu | cpu_medium | 3 days |
peregrine-cpu | cpu_long | 10 days |
peregrine-gpu, kestrel-gpu | gpu_debug | 30 min |
peregrine-gpu, kestrel-gpu | gpu_short | 24 hours |
peregrine-gpu, kestrel-gpu | gpu_medium | 3 days |
peregrine-gpu, kestrel-gpu | gpu_long | 10 days |
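Choosing a QoS amounts to finding a tier whose time limit covers the job's expected run time. The sketch below picks the tightest tier that fits; preferring the tightest tier is a common convention, not a rule stated on this page. It also formats the run time in Slurm's `days-hours:minutes:seconds` form for `--time`, assuming the scheduler is Slurm. `pick_qos` and `slurm_time` are hypothetical helper names.

```python
from datetime import timedelta

# QoS tiers and maximum run times from the table above; the CPU (cpu_*) and
# GPU (gpu_*) families share the same limits. Ordered shortest to longest.
QOS_TIERS = [
    ("debug", timedelta(minutes=30)),
    ("short", timedelta(hours=24)),
    ("medium", timedelta(days=3)),
    ("long", timedelta(days=10)),
]

# Which QoS family each partition uses.
QOS_PREFIX = {
    "peregrine-cpu": "cpu",
    "peregrine-gpu": "gpu",
    "kestrel-gpu": "gpu",
}


def pick_qos(partition: str, walltime: timedelta) -> str:
    """Return the QoS with the shortest time limit that still covers walltime."""
    if partition not in QOS_PREFIX:
        raise ValueError(f"unknown partition: {partition}")
    for tier, limit in QOS_TIERS:
        if walltime <= limit:
            return f"{QOS_PREFIX[partition]}_{tier}"
    raise ValueError(f"{walltime} exceeds the longest QoS limit (10 days)")


def slurm_time(walltime: timedelta) -> str:
    """Format walltime as days-hours:minutes:seconds (Slurm --time syntax)."""
    total = int(walltime.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


job = timedelta(hours=36)
print(pick_qos("kestrel-gpu", job), slurm_time(job))  # gpu_medium 1-12:00:00
```

For a 36-hour job on kestrel-gpu, the result would be passed to sbatch as `--partition=kestrel-gpu --qos=gpu_medium --time=1-12:00:00`. A job longer than 10 days does not fit any QoS and is rejected.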