Anatomy of a Job script
A job script commonly consists of two parts:
- Scheduler-specific options to manage the resources and configure the job environment
- Job-specific shell commands (configuring software environment, specifying your binary/executable)
Here is a simple example of what a job script looks like:
#!/bin/bash
#
# Scheduler specific section
# --------------------------
#SBATCH --job-name="Hello World" # a name for your job
#SBATCH --partition=peregrine-cpu # partition to which job should be submitted
#SBATCH --qos=cpu_debug # qos type
#SBATCH --nodes=1 # node count
#SBATCH --ntasks=1 # total number of tasks across all nodes
#SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=2G # memory per cpu-core
#SBATCH --time=00:01:00 # total run time limit (HH:MM:SS)
#SBATCH --output=my-job.out # output log file
#SBATCH --error=my-job.err # error file
#SBATCH --mail-type=begin # send email when job begins
#SBATCH --mail-type=end # send email when job ends
#SBATCH --mail-user=<First.Last>@colostate.edu
#
# Job specific section
# -----------------------
module load python/anaconda
srun python3 hello_world.py
The first line of a Slurm script specifies the Unix shell to be used.
Then in the scheduler section, one specifies a series of #SBATCH directives which set the resource requirements and other parameters of the job.
The above example is a short-running CPU job, as it is submitted to the peregrine-cpu partition and the requested QoS is cpu_debug.
Specifying a QoS is mandatory.
The script above requests 1 CPU-core (via --cpus-per-task=1), 2 GB of memory (--mem-per-cpu=2G), and a wall time of 1 minute (--time=00:01:00).
Then we specify where the output and error messages get written, using the --output= and --error= directives. In this example, we write the output to a file named my-job.out and the error messages to my-job.err. If you do not specify the output files, Slurm writes both stdout and stderr to a single file named slurm-XXXX.out, where XXXX is the job id.
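Slurm also supports filename patterns in these directives; for example, %j expands to the job id, which keeps logs from successive runs from overwriting each other. A small sketch (the filenames here are illustrative, not part of the example above):

```shell
#SBATCH --output=my-job-%j.out   # stdout; %j is replaced with the job id
#SBATCH --error=my-job-%j.err    # stderr; %j is replaced with the job id
```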
You can specify an email address (--mail-user=) and the events for which you would like to be notified. In the above example, we have asked to be notified at the beginning and the end of the job.
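The two --mail-type lines can also be combined, since Slurm accepts a comma-separated list of events (and the special value ALL). A sketch of the equivalent directives:

```shell
#SBATCH --mail-type=begin,end    # equivalent to the two separate lines above
#SBATCH --mail-user=<First.Last>@colostate.edu
```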
In the second section, we have the job specific commands.
Any environment modules needed by your software should be loaded at this stage.
Just like on other CS machines, we use environment modules to access software installed under /usr/local.
More information on using environment modules can be found on the Environment Modules page.
Here we are loading the Anaconda Python module, to make use of Python.
Note that you may use other modules which provide Python as well.
And finally, the actual work to be done, which in this example is the execution of a Python script, is specified in the final line.
The executable (here the Python interpreter) is usually launched with the srun Slurm command.
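The contents of hello_world.py are not shown on this page; a minimal stand-in (any Python script would do here) might look like this:

```python
# hello_world.py -- a minimal stand-in for the script run by the job above.
# It builds a greeting naming the host it runs on, then prints it.
import socket

def greeting() -> str:
    return f"Hello World from {socket.gethostname()}"

if __name__ == "__main__":
    print(greeting())
```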
Samples
The sub-sections provide example scripts for the following types of jobs:
- Serial Jobs
- Multithreaded Jobs
- MPI Jobs
- Hybrid Jobs
- GPU Jobs
- Interactive Jobs