GPU Jobs

GPUs are available on the peregrine and kestrel nodes through the peregrine-gpu and kestrel-gpu partitions. There are three types of GPUs on these nodes:

peregrine-gpu

  • Nvidia A100 80GB - 6 available
  • Nvidia A100 40GB - 4 available

kestrel-gpu

  • Nvidia GeForce RTX 3090 24GB - 12 available

How to use GPUs

To use GPUs in your Slurm job:

  • Add an SBATCH directive of the form #SBATCH --gres=gpu:<type>:<number_of_gpus> to your job script.

    • For A100 80GB, use
      #SBATCH --gres=gpu:a100-sxm4-80gb:1
      
    • For A100 40GB, use
      #SBATCH --gres=gpu:nvidia_a100_3g.39gb:1
      
    • For RTX 3090 24GB, use
      #SBATCH --gres=gpu:3090:1
      
  • Submit to the peregrine-gpu partition for A100s or kestrel-gpu partition for the 3090s.

  • The number at the end of the SBATCH directive is the number of GPUs requested. Each of the directives above requests 1 GPU.
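Putting these steps together, a minimal job script for one A100 80GB GPU might look like the sketch below. The job name, the wall time, and the program name ./my_gpu_program are placeholders, and the --cpus-per-task=6 line assumes the 6:1 CPU-GPU ratio stated for the peregrine nodes. Any module loads or environment setup your software needs are site- and software-specific and are not shown.

```bash
#!/bin/bash
#SBATCH --job-name=gpu-example        # placeholder job name
#SBATCH --partition=peregrine-gpu     # A100s are on this partition
#SBATCH --gres=gpu:a100-sxm4-80gb:1   # one A100 80GB GPU
#SBATCH --cpus-per-task=6             # 6 cores for 1 GPU (peregrine ratio)
#SBATCH --time=01:00:00               # placeholder wall-time limit

# List the GPU(s) assigned to this job (nvidia-smi ships with the Nvidia driver)
nvidia-smi

# Replace with your GPU-enabled program; this name is a placeholder
./my_gpu_program
```

Save it under any name, for example gpu_job.sh, and submit it with sbatch gpu_job.sh. For the 3090s, change the partition to kestrel-gpu and the --gres line to gpu:3090:1.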

Adding the --gres option to a Slurm script for a CPU-only code WILL NOT speed up your code.
Only software that has been explicitly written to run on GPUs can benefit from them.
Requesting a GPU for a CPU-only code wastes resources and may also lower the priority of your future jobs.

The GPU type must be specified in the Slurm script.
It is not possible to mix and match GPU types in a single job.

Do not request multiple GPUs if your code is written to use only a single GPU.
Doing so wastes resources and may also lower the priority of your future jobs.
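One way to catch an oversized request is a small check near the top of the job script. The sketch below is a hypothetical helper, not a site-provided tool, and it assumes that Slurm exports CUDA_VISIBLE_DEVICES (a comma-separated list of GPU IDs) for --gres=gpu jobs, which is common but depends on how the cluster is configured.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a GPU job script. They assume Slurm exports
# CUDA_VISIBLE_DEVICES as a comma-separated list of GPU IDs (e.g. "0,1").

# Print how many GPUs are visible to this job (0 if the variable is unset or empty).
count_visible_gpus() {
  local devices="${CUDA_VISIBLE_DEVICES:-}"
  if [[ -z "$devices" ]]; then
    echo 0
    return 0
  fi
  local -a ids
  IFS=',' read -r -a ids <<< "$devices"
  echo "${#ids[@]}"
}

# Warn on stderr and return 1 if more GPUs are visible than the code will use.
# Argument: the number of GPUs your program actually uses.
check_gpu_request() {
  local gpus_used_by_code="$1"
  local visible
  visible=$(count_visible_gpus)
  if (( visible > gpus_used_by_code )); then
    echo "WARNING: ${visible} GPU(s) allocated, but the code uses only ${gpus_used_by_code}" >&2
    return 1
  fi
  return 0
}
```

In a single-GPU job, calling check_gpu_request 1 after defining these functions would print a warning if the --gres line asked for more than one GPU.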

CPU-GPU ratio

On the peregrine nodes, the ratio of CPUs to GPUs is 6:1, so your job can request 6 CPU cores for each GPU.
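The ratio scales with the number of GPUs. The hypothetical helper below only spells out that arithmetic for the peregrine nodes (the page does not state a ratio for kestrel), and the comment shows the matching directives for a 2-GPU job. The directive values are an illustration, not a recommended configuration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of CPU cores that matches a given
# number of GPUs on a peregrine node, using the 6:1 CPU-to-GPU ratio.
cpus_for_gpus() {
  local ngpus="$1"
  local cpus_per_gpu=6   # peregrine ratio from this page
  echo $(( ngpus * cpus_per_gpu ))
}

# A job that really uses 2 GPUs therefore pairs with 12 cores, e.g.:
#   #SBATCH --partition=peregrine-gpu
#   #SBATCH --gres=gpu:a100-sxm4-80gb:2
#   #SBATCH --cpus-per-task=12
cpus_for_gpus 2
```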

Monitor GPU Usage

After submitting your GPU job with the sbatch command, you can monitor its GPU usage, for example to check the memory usage of one or more GPUs allocated to the job. Use the following command:

sgpu <your-jobid-here>

The above command runs within your job’s resource allocation. Its resource needs are small and should not affect your job’s performance, but it is recommended to run it on an “as needed” basis rather than from a script that calls it in a loop.
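A possible workflow, assuming the job script is saved as gpu_job.sh (a placeholder name): capture the job ID at submission time with sbatch's --parsable flag, then run sgpu once, after the job has started, instead of polling it.

```bash
#!/bin/bash
# Submit the job; --parsable makes sbatch print only the job ID
# (followed by ";<cluster>" on multi-cluster setups, which is stripped below).
jobid=$(sbatch --parsable gpu_job.sh)   # gpu_job.sh is a placeholder name
jobid=${jobid%%;*}

# Later, check that the job is running (state R) ...
squeue -j "$jobid"

# ... and take a single snapshot of its GPU usage
sgpu "$jobid"
```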