Hybrid MPI
Hybrid MPI parallel applications use both MPI (Cray MPICH) and OpenMP. MPI message passing between processes is combined with OpenMP shared-memory access within each process. Because fewer MPI ranks are needed per node, a hybrid application can consume less memory than a pure MPI workload.
Hybrid Example
The example below shows the structure of a hybrid MPI program. The application uses functions from both the mpi.h and omp.h header files and combines the two programming models to achieve the best possible performance.
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int iam = 0, np = 1;

    /* Initialize MPI and query the rank, the number of processes and the node name. */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);

    /* Each MPI process starts an OpenMP thread team; every thread reports itself. */
    #pragma omp parallel default(shared) private(iam, np)
    {
        np = omp_get_num_threads();
        iam = omp_get_thread_num();
        printf("Hello from thread %d out of %d from process %d out of %d on %s\n",
               iam, np, rank, numprocs, processor_name);
    }

    MPI_Finalize();
    return 0;
}
Hybrid OpenMP presentation: https://www.openmp.org/wp-content/uploads/HybridPP_Slides.pdf
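The example above only demonstrates thread and rank identification. The sketch below (illustrative code, not part of the original example) shows how the two models are typically combined: each rank sums its local data with an OpenMP parallel loop, and the per-rank results are then combined across processes with MPI_Allreduce.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Local data owned by this rank (size chosen for illustration). */
    int n = 1000000;
    double *data = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++)
        data[i] = 1.0;

    /* OpenMP shared-memory parallelism inside the process. */
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < n; i++)
        local_sum += data[i];

    /* MPI data transfer between the processes. */
    double global_sum = 0.0;
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %.1f\n", global_sum);

    free(data);
    MPI_Finalize();
    return 0;
}
The OpenMP reduction keeps the intra-node work in shared memory, so only a single double per rank has to cross the network.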
Hybrid MPI CPU
Compiling the application for CPU with the CCE compiler:
cc hybrid.c -fopenmp
The executable produced by the CCE compiler is dynamically linked against the Cray MPICH and CCE OpenMP (libcraymp) runtime libraries:
$ ldd a.out |grep mp
libmpi_cray.so.12 => /opt/cray/pe/lib64/libmpi_cray.so.12 (0x00007fa7b7e6a000)
libcraymp.so.1 => /opt/cray/pe/cce/16.0.1/cce/x86_64/lib/libcraymp.so.1 (0x00007fa7b7286000)
Compiling the application for CPU with the GNU compiler:
module swap PrgEnv-cray PrgEnv-gnu
cc hybrid.c -fopenmp
The executable produced by the GNU compiler is dynamically linked against the Cray MPICH and GNU OpenMP (libgomp) runtime libraries:
$ ldd a.out | grep mp
libmpi_gnu_103.so.12 => /opt/cray/pe/lib64/libmpi_gnu_103.so.12 (0x00007f11b0e54000)
libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f11b0c1c000)
Hybrid MPI GPU
Compiling the application for GPU usage with the CCE compiler:
module load craype-accel-nvidia80
export CRAY_ACCEL_TARGET=nvidia80
cc hybrid.c -fopenmp
The executable built with the accelerator target is dynamically linked against the MPI, CUDA GTL (GPU Transport Layer) and OpenMP runtime libraries:
$ ldd a.out | grep mp
libmpi_gnu_103.so.12 => /opt/cray/pe/lib64/libmpi_gnu_103.so.12 (0x00007f350e4d5000)
libmpi_gtl_cuda.so.0 => /opt/cray/pe/lib64/libmpi_gtl_cuda.so.0 (0x00007f350e28f000)
libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f350e057000)
Compiling the application for GPU usage with the GNU compiler using the Nvidia HPC SDK:
module swap PrgEnv-cray PrgEnv-gnu
module load nvhpc
cc hybrid.c -fopenmp
The executable produced by the GNU compiler is dynamically linked against the Cray MPICH library and the libgomp runtime shipped with the Nvidia HPC SDK:
$ ldd a.out | grep mp
libmpi_gnu_103.so.12 => /opt/cray/pe/lib64/libmpi_gnu_103.so.12 (0x00007f4ae6c22000)
libgomp.so.1 => /opt/software/packages/nvhpc/Linux_x86_64/23.11/compilers/lib/libgomp.so.1 (0x00007f4ae5c21000)
Hybrid MPI Nvidia
Compiling the application for GPU usage with the Nvidia compiler:
module swap PrgEnv-cray PrgEnv-nvhpc
cc hybrid.c -mp=gpu -gpu=cc80
The executable produced by the Nvidia compiler is dynamically linked against the Nvidia build of the Cray MPICH library:
$ ldd a.out | grep libmp
libmpi_nvidia.so.12 => /opt/cray/pe/lib64/libmpi_nvidia.so.12 (0x00007f6cce0c8000)
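Note that hybrid.c contains only host-side OpenMP, so the GPU builds above still run all computation on the CPU unless the source contains OpenMP target regions. The sketch below (illustrative code, not part of the original example) offloads a simple vector update to the GPU and can be built with the GPU compile commands shown above.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int n = 1000000;
    double a = 2.0;
    double *x = malloc(n * sizeof(double));
    double *y = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) {
        x[i] = 1.0;
        y[i] = (double)rank;
    }

    /* This loop runs on the GPU; x and y are mapped to device memory. */
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];

    printf("process %d: y[0] = %.1f\n", rank, y[0]);

    free(x);
    free(y);
    MPI_Finalize();
    return 0;
}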
Hybrid MPI CPU Batch Job
The batch job below runs the application with 4 MPI tasks and 16 CPUs (OpenMP threads) per task.
#!/bin/bash
#SBATCH -A hpcteszt
#SBATCH --partition=cpu
#SBATCH --job-name=hybrid-cpu
#SBATCH --output=hybrid-cpu.out
#SBATCH --time=06:00:00
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=16
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./hybrid-cpu
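Assuming the executable has been built with -o hybrid-cpu (or a.out has been renamed) and the script is saved as hybrid-cpu.sh (the file name is only an example), the job is submitted with sbatch:
$ sbatch hybrid-cpu.sh
Each of the 4 ranks then starts 16 OpenMP threads, so hybrid-cpu.out should contain 64 lines in the format printed by the example, e.g. (node name is a placeholder):
Hello from thread 0 out of 16 from process 0 out of 4 on <node name>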
Hybrid MPI GPU Batch Job
For GPU offload, the OMP_TARGET_OFFLOAD environment variable must be set to MANDATORY, which forces OpenMP target regions to run on the GPU.
#!/bin/bash
#SBATCH -A hpcteszt
#SBATCH --partition=gpu
#SBATCH --job-name=hybrid-gpu
#SBATCH --output=hybrid-gpu.out
#SBATCH --time=06:00:00
#SBATCH --ntasks=4
#SBATCH --gres=gpu:1
export OMP_TARGET_OFFLOAD=MANDATORY
srun ./hybrid-gpu
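The GPU job is submitted the same way, assuming the script is saved as hybrid-gpu.sh and the GPU-enabled executable is named hybrid-gpu (both names are only examples):
$ sbatch hybrid-gpu.sh
With OMP_TARGET_OFFLOAD=MANDATORY, the run aborts with an error if the OpenMP runtime cannot execute target regions on a GPU, rather than silently falling back to the host.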