GPUs are available on the peregrine and kestrel nodes through the peregrine-gpu and kestrel-gpu partitions.
There are three types of GPUs on these nodes:
peregrine-gpu
- Nvidia A100 80GB - 6 available
- Nvidia A100 40GB - 4 available
kestrel-gpu
- Nvidia GeForce RTX 3090 24GB - 12 available
How to use GPUs
To use GPUs in your SLURM job:
- Add an additional SBATCH statement to your job script:
  #SBATCH --gres=gpu:<type>:<number_of_gpus>
  - For A100 80GB, use
    #SBATCH --gres=gpu:a100-sxm4-80gb:1
  - For A100 40GB, use
    #SBATCH --gres=gpu:nvidia_a100_3g.39gb:1
  - For RTX 3090 24GB, use
    #SBATCH --gres=gpu:3090:1
- Submit to the peregrine-gpu partition for A100s or the kestrel-gpu partition for the 3090s.
- Note that the number at the end of the SBATCH statement is the quantity of GPUs. Each of the statements above requests 1 GPU.
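Putting the steps above together, a minimal job script might look like the sketch below. The job name, time limit, and the nvidia-smi check are illustrative assumptions rather than site requirements; replace them with values and commands that suit your workload.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-example          # hypothetical job name
#SBATCH --partition=peregrine-gpu       # A100s are in the peregrine-gpu partition
#SBATCH --gres=gpu:a100-sxm4-80gb:1     # request 1 A100 80GB GPU
#SBATCH --time=01:00:00                 # hypothetical time limit

# Show the GPU(s) allocated to this job, if nvidia-smi is available on the node.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
else
    echo "nvidia-smi not found on this node"
fi

# Launch your GPU-enabled program here.
```

Save the script under a name of your choice (for example, gpu-job.sh) and submit it with sbatch gpu-job.sh.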
Adding the --gres option to a SLURM script for a CPU-only code WILL NOT speed up your code.
Only software/code that has been explicitly written to run on GPUs can benefit from GPUs.
Requesting a GPU for a CPU-only code will waste resources and may also lower the priority of your future jobs.
The GPU type must be specified in the SLURM script.
It is not possible to mix and match GPU types in a single job.
Do not ask for multiple GPUs if your code is only written to use a single GPU.
Doing so will waste resources and may also lower the priority of your future jobs.
CPU-GPU ratio
On the peregrine nodes, the ratio of CPUs to GPUs is 6:1, so your job can request 6 CPU cores for each GPU.
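As a quick illustration of the ratio, the sketch below computes a matching core count for a given number of GPUs and prints the corresponding SBATCH lines. The variable names are hypothetical, and using --cpus-per-task (rather than, say, --ntasks) is an assumption that depends on how your application uses CPU cores.

```shell
# Number of GPUs the job will request (example value)
GPUS=2
# CPU cores per GPU on the peregrine nodes, from the 6:1 ratio
CPUS_PER_GPU=6
# Total CPU cores that match the requested GPUs
CPUS=$(( GPUS * CPUS_PER_GPU ))

# Print the matching SBATCH lines for the job script
echo "#SBATCH --gres=gpu:a100-sxm4-80gb:${GPUS}"
echo "#SBATCH --cpus-per-task=${CPUS}"
```

With GPUS=2 this yields a request for 12 CPU cores.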
Monitor GPU Usage
After you submit your GPU job with the sbatch command, you can monitor GPU usage to check the memory usage of one or more GPUs in your job.
Use the following command to get GPU usage of your job:
sgpu <your-jobid-here>
The above command runs within your job’s resource allocation. The resources it requires are small and should not impact your job's performance, but it is recommended to use it on an “as needed” basis rather than in a script that runs it in a loop.