OpenMP
OpenMP is a parallel programming model that is portable across shared memory architectures from Cray and other vendors.
CCE provides full support for OpenMP 5.0 and partial support for OpenMP 5.1 and 5.2.
More information:
https://www.openmp.org/
https://cpe.ext.hpe.com/docs/guides/CCE/topics/fortran_new/OpenMP_Overview.html
OpenMP Example
The following simple program contains a standard OpenMP directive. The OpenMP runtime functions it uses, such as omp_get_thread_num(), are declared in the omp.h header file.
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    // Beginning of parallel region
    #pragma omp parallel
    {
        printf("Hello World... from thread = %d\n",
               omp_get_thread_num());
    }
    // End of parallel region
    return 0;
}
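When run with, for example, four threads, each thread prints one line; the order varies from run to run (illustrative output, not taken from the original documentation):
Hello World... from thread = 0
Hello World... from thread = 2
Hello World... from thread = 3
Hello World... from thread = 1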
Introduction to OpenMP: https://cpe.ext.hpe.com/docs/cce/man7/intro_openmp.7.html
OpenMP CPU
OpenMP is disabled by default in CCE and must be explicitly enabled with the -fopenmp compiler command-line option. The application can be compiled either with the CCE or with the GNU compiler.
Compiling the code with the CCE compiler:
cc openmp.c -fopenmp
The resulting executable is dynamically linked against the CCE OpenMP runtime library:
ldd a.out | grep mp
libcraymp.so.1 => /opt/cray/pe/cce/16.0.1/cce/x86_64/lib/libcraymp.so.1 (0x00007f1da924f000)
The GNU compiler can be loaded by switching the module from PrgEnv-cray to PrgEnv-gnu.
module swap PrgEnv-cray PrgEnv-gnu
cc openmp.c -fopenmp
The resulting executable is dynamically linked against the GNU OpenMP runtime library:
ldd a.out | grep mp
libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f7576184000)
OpenMP GPU
To enable GPU offload with the CCE compiler, the GPU acceleration module for the Nvidia A100 card must be loaded and the accelerator target set:
module load craype-accel-nvidia80
export CRAY_ACCEL_TARGET=nvidia80
cc openmp.c -fopenmp
The resulting executable is dynamically linked against the CCE OpenMP runtime and GPU support libraries:
ldd a.out | grep mp
libmpi_gtl_cuda.so.0 => /opt/cray/pe/lib64/libmpi_gtl_cuda.so.0 (0x00007f2f904cb000)
libcraymp.so.1 => /opt/cray/pe/cce/16.0.1/cce/x86_64/lib/libcraymp.so.1 (0x00007f2f8f8e7000)
The Nvidia HPC SDK toolkit can also be used for GPU offload.
module swap PrgEnv-cray PrgEnv-gnu
module load nvhpc
cc openmp.c -fopenmp
The resulting executable is dynamically linked against the Nvidia HPC SDK OpenMP runtime:
ldd a.out | grep nv
libgomp.so.1 => /opt/software/packages/nvhpc/Linux_x86_64/23.11/compilers/lib/libgomp.so.1 (0x00007ff3e423e000)
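Note that the hello-world program above contains no target constructs, so it runs entirely on the host even when compiled with an accelerator target. The sketch below (a hypothetical openmp-target.c, not part of the original documentation) shows a minimal target region that offloads a vector addition to the default device and reports whether it actually ran there; it is compiled and linked in the same way as shown above.
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N], b[N], c[N];
    int on_host = 1;

    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2.0 * i;
    }

    // Check whether target regions really execute on a device.
    #pragma omp target map(from: on_host)
    on_host = omp_is_initial_device();

    // Offload the vector addition to the default device.
    #pragma omp target teams distribute parallel for map(to: a, b) map(from: c)
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("devices available: %d\n", omp_get_num_devices());
    printf("target region ran on %s\n", on_host ? "the host" : "a device");
    printf("c[N-1] = %f\n", c[N - 1]);
    return 0;
}
When offload is active, the second printf should report a device; setting OMP_TARGET_OFFLOAD=MANDATORY (see the GPU batch job below) turns a failed offload into a runtime error instead of a silent fallback to the host.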
OpenMP CPU Batch Job
At most one node can be reserved for standard OpenMP parallel applications, since OpenMP threads share memory within a single node. The number of OpenMP threads must be specified with the OMP_NUM_THREADS environment variable, which can either be set on the command line directly before the application or exported beforehand.
In the following example, we assign 16 CPU cores to the task. The SLURM_CPUS_PER_TASK variable holds this core count and is used to set the number of OpenMP threads.
#!/bin/bash
#SBATCH -A hpcteszt
#SBATCH --partition=cpu
#SBATCH --job-name=openmp-cpu
#SBATCH --output=openmp-cpu.out
#SBATCH --time=06:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./openmp-cpu
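To confirm that the thread count requested through SLURM_CPUS_PER_TASK actually takes effect, a short check program can be used; this is a sketch using standard OpenMP runtime calls (omp_get_max_threads, omp_get_num_threads), not part of the original example:
#include <omp.h>
#include <stdio.h>

int main(void)
{
    // Maximum number of threads the runtime will use (taken from OMP_NUM_THREADS).
    printf("max threads: %d\n", omp_get_max_threads());

    #pragma omp parallel
    {
        // Printed once, by a single thread of the team.
        #pragma omp single
        printf("parallel region is using %d threads\n", omp_get_num_threads());
    }
    return 0;
}
With the job script above, both lines should report 16.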
OpenMP GPU Batch Job
For GPU offload, the OMP_TARGET_OFFLOAD environment variable must be set to MANDATORY. With this setting, target regions are required to run on the GPU, and the program aborts with an error instead of silently falling back to the host if no device is available.
#!/bin/bash
#SBATCH -A hpcteszt
#SBATCH --partition=gpu
#SBATCH --job-name=openmp-gpu
#SBATCH --output=openmp-gpu.out
#SBATCH --time=06:00:00
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
OMP_TARGET_OFFLOAD=MANDATORY ./openmp-gpu