One can combine multithreading and multinode parallelism using a hybrid OpenMP/MPI approach. Consider the following C++ code, which uses both MPI and OpenMP:
#include <iostream>
#include <mpi.h>
#include <omp.h>

int main(int argc, char** argv) {
    using namespace std;

    // Start up MPI and find out how many processes there are and which one we are
    MPI_Init(&argc, &argv);
    int world_size, world_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the node this MPI process is running on
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Each MPI process spawns its own team of OpenMP threads
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthrds = omp_get_num_threads();
        cout << "Hello from thread " << id << " of " << nthrds
             << " on MPI process " << world_rank << " of " << world_size
             << " on node " << processor_name << endl;
    }

    MPI_Finalize();
    return 0;
}
Save it as hybrid.cpp
and compile it with the following commands:
module load compilers/mpi/openmpi-slurm
mpicxx -fopenmp -o hybrid hybrid.cpp
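If the build fails, it can help to see exactly which compiler and flags the MPI wrapper invokes. Assuming the module provides Open MPI's wrapper compilers (as the module name suggests), they accept a --showme option:
mpicxx --showme        # print the underlying compiler command line without compiling anything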
Below is a SLURM job script for our code:
#!/bin/bash
#
#SBATCH --job-name="Hybrid Demo" # a name for your job
#SBATCH --partition=peregrine-cpu # partition to which job should be submitted
#SBATCH --qos=cpu_debug # qos type
#SBATCH --nodes=2 # node count
#SBATCH --ntasks-per-node=2 # total number of tasks per node
#SBATCH --cpus-per-task=4 # cpu-cores per task
#SBATCH --mem-per-cpu=1G # memory per cpu-core
#SBATCH --time=00:01:00 # total run time limit (HH:MM:SS)
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
module purge
module load compilers/mpi/openmpi-slurm
srun ./hybrid
Notice that we ask for two nodes, 2 tasks per node, and 4 cpus-per-task: srun therefore launches 2 × 2 = 4 MPI processes, and each process runs 4 OpenMP threads (OMP_NUM_THREADS is set from SLURM_CPUS_PER_TASK), so the code uses 16 cores spread over two nodes.
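If you want to double-check what the allocation actually gives you, an optional sanity check is to add a few lines like these to the job script just before the srun ./hybrid line (SLURM_JOB_NODELIST and SLURM_CPUS_PER_TASK are standard variables that Slurm sets inside a job):
echo "Nodes allocated: $SLURM_JOB_NODELIST"
echo "CPUs per task:   $SLURM_CPUS_PER_TASK"
srun hostname    # prints one line per MPI task, so you can see how tasks map to nodes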
Save the script as hybrid.sh
and submit it with
sbatch hybrid.sh
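While the job is waiting in the queue or running, you can check its state with Slurm's squeue command, for example:
squeue -u $USER    # list your own jobs and their current state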
The output will be saved in a file named slurm-####.out (where #### is the job ID) and should look something like the listing below. Note that the lines are jumbled: all 16 threads write to std::cout at the same time without any synchronization, so output from different threads can interleave.
Hello from thread Hello from thread 0 of 4 on MPI process 1 of 4 on node peregrine03 of 4 on MPI process 1 of 4 on node peregrine0
Hello from thread 2 of 4 on MPI process 1 of 4 on node peregrine0
Hello from thread Hello from thread 3 of 4 on MPI process 3 of 4 on node peregrine1Hello from thread 1 of 4 on MPI process 3 of 4 on node 2peregrine1 of 4 on MPI process 3 of 4 on node peregrine1
Hello from thread 1 of 4 on MPI process 0 of 4 on node peregrine0
Hello from thread 1 of 4 on MPI process 1 of 4 on node peregrine0
Hello from thread 2 of 4 on MPI process 0 of Hello from thread 4 on node peregrine0
Hello from thread 0 of 4 on MPI process 0 of 4 on node peregrine0
3 of 4 on MPI process 0 of 4 on node peregrine0
Hello from thread Hello from thread 3 of 4 on MPI process 2 of 4 on node peregrine11 of 4 on MPI process 2 of 4 on node peregrine1
Hello from thread 2 of 4 on MPI process 2 of 4 on node peregrine1
Hello from thread 0 of 4 on MPI process 3 of 4 on node peregrine1
Hello from thread 0 of 4 on MPI process 2 of 4 on node peregrine1
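The interleaving is harmless for a demo, but if you prefer each line to come out intact, one common fix is to build the whole message first and guard the actual print with an OpenMP critical section, so that only one thread per process writes at a time. A minimal sketch of the modified parallel region (add #include <sstream> at the top of hybrid.cpp; the rest of the program stays the same):
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthrds = omp_get_num_threads();
        // Assemble the full message before printing it
        ostringstream msg;
        msg << "Hello from thread " << id << " of " << nthrds
            << " on MPI process " << world_rank << " of " << world_size
            << " on node " << processor_name << "\n";
        #pragma omp critical   // only one thread of this process prints at a time
        cout << msg.str();
    }
Output from different MPI processes can still arrive in any order, but in practice each line now comes out in one piece.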