In this example, we’ll use the PyTorch MNIST example.
Get the source code from
and save the Python code as
We will now use the following SLURM script
to run the code:
#!/bin/bash
#SBATCH --job-name="PyTorch-GPU-Demo" # job name
#SBATCH --partition=peregrine-gpu # partition to which job should be submitted
#SBATCH --qos=gpu_debug # qos type
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes
#SBATCH --cpus-per-task=1 # cpu-cores per task
#SBATCH --mem=4G # total memory per node
#SBATCH --gres=gpu:nvidia_a100_3g.39gb:1 # request 1 GPU (A100 MIG instance, 3g.39gb profile)
#SBATCH --time=00:05:00 # wall time
module purge
module load python/anaconda
python --epochs=3
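Before training starts, it can be useful to confirm that the job actually received the resources requested in the script above. The following is a minimal sketch, not part of the PyTorch example: the helper name `describe_allocation` is hypothetical, and it assumes that Slurm on this cluster exports `SLURM_JOB_ID`, `SLURM_CPUS_PER_TASK`, and `CUDA_VISIBLE_DEVICES` to the job, which is common but depends on the site configuration.

```python
import os


def describe_allocation(env=None):
    """Summarize the Slurm allocation visible to this process.

    `env` defaults to the real environment; a plain dict can be passed
    instead, which makes the function easy to try outside a job.
    """
    env = os.environ if env is None else env
    # Assumption: Slurm sets CUDA_VISIBLE_DEVICES for jobs that request --gres=gpu.
    gpus = env.get("CUDA_VISIBLE_DEVICES", "")
    return {
        "job_id": env.get("SLURM_JOB_ID"),
        "cpus_per_task": int(env.get("SLURM_CPUS_PER_TASK", "1")),
        # Kept as strings because MIG devices may be reported as UUIDs.
        "gpu_ids": [g for g in gpus.split(",") if g],
    }


if __name__ == "__main__":
    info = describe_allocation()
    print(info)
    if not info["gpu_ids"]:
        print("No GPU visible to this job; check the --gres request.")
```

Calling this at the top of the training script (or as a separate `python` step in the batch script) records the allocation in the job output, so a missing GPU shows up immediately rather than as unexpectedly slow training.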
Submit the job as
The result will be saved in a file named slurm-####.out, where #### is the job ID,
and should look like
Train Epoch: 1 [0/60000 (0%)] Loss: 2.299824
Train Epoch: 1 [640/60000 (1%)] Loss: 1.733667
Train Epoch: 1 [1280/60000 (2%)] Loss: 0.933156
Train Epoch: 1 [1920/60000 (3%)] Loss: 0.623502
Train Epoch: 1 [2560/60000 (4%)] Loss: 0.357575
Train Epoch: 1 [3200/60000 (5%)] Loss: 0.315663
...
Train Epoch: 3 [55680/60000 (93%)] Loss: 0.009016
Train Epoch: 3 [56320/60000 (94%)] Loss: 0.241464
Train Epoch: 3 [56960/60000 (95%)] Loss: 0.004863
Train Epoch: 3 [57600/60000 (96%)] Loss: 0.004337
Train Epoch: 3 [58240/60000 (97%)] Loss: 0.109445
Train Epoch: 3 [58880/60000 (98%)] Loss: 0.038164
Train Epoch: 3 [59520/60000 (99%)] Loss: 0.014446
Test set: Average loss: 0.0333, Accuracy: 9887/10000 (99%)
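To check a run without reading the whole log, the loss and accuracy lines can be pulled out of the output file. The sketch below is an illustration only: `parse_log` is a hypothetical helper, and the regular expressions assume the log lines have exactly the format shown above.

```python
import re

# Patterns matching the two kinds of lines shown in the sample output.
TRAIN_RE = re.compile(
    r"Train Epoch: (\d+) \[(\d+)/(\d+) \((\d+)%\)\]\s+Loss: ([\d.]+)"
)
TEST_RE = re.compile(
    r"Test set: Average loss: ([\d.]+), Accuracy: (\d+)/(\d+)"
)


def parse_log(text):
    """Return (train, test) lists extracted from the job output.

    train: (epoch, samples_seen, loss) tuples
    test:  (average_loss, accuracy_fraction) tuples
    """
    train, test = [], []
    for line in text.splitlines():
        m = TRAIN_RE.search(line)
        if m:
            train.append((int(m.group(1)), int(m.group(2)), float(m.group(5))))
            continue
        m = TEST_RE.search(line)
        if m:
            test.append((float(m.group(1)), int(m.group(2)) / int(m.group(3))))
    return train, test
```

For example, after reading the contents of the slurm-####.out file into a string `text`, `parse_log(text)` returns the training losses in order and one test entry per reported test pass, which makes it straightforward to confirm that the loss is decreasing.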