Certain applications require direct user input via a terminal. For these, you can use interactive jobs in SLURM, which let you run applications/commands in a shell on a compute node.
SLURM offers two ways to run interactive jobs: the srun command and the salloc command.
Interactive jobs are intended for very short-running and very specific applications/commands.
Do not use interactive jobs for long-running jobs or for regular applications.
Please use the sbatch command to submit those jobs instead.
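For comparison, a non-interactive job could be submitted roughly as shown here. This is only a minimal sketch: the file name job.sh and the hostname payload are placeholders, and the resource values are borrowed from the interactive examples on this page rather than being recommendations.

```shell
# Write a minimal batch script (job.sh is a placeholder file name).
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --time=00:05:00

# Placeholder payload: replace with your own application/command.
hostname
EOF

# Submit the script only where sbatch is available (i.e. on the cluster).
if command -v sbatch >/dev/null 2>&1; then
    sbatch job.sh
else
    echo "sbatch not found; run this on the cluster login node" >&2
fi
```

The job then runs without a terminal attached, so it keeps going after you log out.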
SRUN
Using the srun command, interactive jobs can be run within a scheduled shell.
Here is an example:
srun --nodes=1 --ntasks=1 --mem=4G --time=00:05:00 --pty /bin/bash
Notice how the prompt changes, indicating that a new shell has been spawned on one of the compute nodes:
peregrine0:~$
You can now run your interactive application/command. When you are done, type exit at the command prompt to quit the shell and end the SLURM job.
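If the prompt alone does not make it obvious where you are, a quick check along these lines can help. It is a small sketch that relies on the SLURM_JOB_ID environment variable, which SLURM sets inside a job; the function name in_slurm_job is made up for illustration.

```shell
# Succeeds only when the current shell is running inside a SLURM job.
# in_slurm_job is a hypothetical helper name, not a SLURM command.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
    echo "Inside SLURM job ${SLURM_JOB_ID} on $(hostname)"
else
    echo "Not inside a SLURM job (still on $(hostname))"
fi
```

Running it before and after typing exit should show the job id disappearing once the interactive shell has ended.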
SALLOC
For situations where you would like to come back to your interactive session (after disconnecting from it), you can use SLURM’s salloc command to allocate resources up front and keep the job running.
The process looks like this:
- Use salloc to create the resource allocation up front.
- Use srun to connect to it, as many times as needed during the job time frame.
Run the command below to allocate resources:
salloc --nodes=1 --ntasks=1 --mem=4G --time=00:20:00
Here we are allocating 4 GB of memory and one CPU on a node for 20 minutes. The command displays a job id number. Make a note of it, as you will need it to connect to the interactive shell.
salloc: Granted job allocation 235
salloc: Waiting for resource configuration
salloc: Nodes peregrine0 are ready for job
Notice that this time the prompt did not change.
salloc only allocates resources to your job; it does not start a shell on the compute node.
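Rather than copying the job id by hand, it can be pulled out of the "Granted job allocation" line. The sketch below works on the sample message shown above; the function name parse_job_id is hypothetical, and in a real session you would first need to capture salloc's messages yourself.

```shell
# Extract the job id from a "salloc: Granted job allocation <id>" line.
# parse_job_id is a hypothetical helper; it reads salloc's messages on stdin
# and prints nothing for lines that do not match.
parse_job_id() {
    sed -n 's/^salloc: Granted job allocation \([0-9][0-9]*\).*$/\1/p'
}

# Demonstration on the sample message from this page.
jobid=$(printf '%s\n' 'salloc: Granted job allocation 235' | parse_job_id)
echo "$jobid"
```

The extracted value can then be passed to srun with the --jobid option.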
To connect to an interactive shell on your job, use the srun command and specify the job id (which you noted in the salloc step):
srun --jobid=235 --pty /bin/bash
You will now land on a compute node in an interactive shell:
peregrine0:~$
Now you can exit from the shell and connect again later, using the srun command with the same job id number.
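If you have lost track of the job id by the time you come back, squeue can list your jobs. The two functions below are a sketch with made-up names (list_my_jobs, reconnect), and the numeric check assumes plain job ids like the one in this example.

```shell
# List your own jobs as: job id, state, elapsed time, node list.
# -h suppresses the header line; -o selects the output columns.
list_my_jobs() {
    squeue -u "${USER:-$(id -un)}" -h -o '%i %T %M %N'
}

# Hypothetical helper: open a shell in an existing allocation by job id.
reconnect() {
    case "${1:-}" in
        ''|*[!0-9]*)
            echo "usage: reconnect <numeric job id>" >&2
            return 2
            ;;
    esac
    srun --jobid="$1" --pty /bin/bash
}
```

On the cluster you would run list_my_jobs to find the id, then something like reconnect 235.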
To finally delete your job, use the scancel command:
scancel 235
salloc: Job allocation 235 has been revoked.
Hangup