OpenMP

OpenMP is a parallel programming model that is portable across shared memory architectures from Cray and other vendors.

CCE fully supports OpenMP 5.0 and partially supports OpenMP 5.1 and 5.2.

More information: https://www.openmp.org/ https://cpe.ext.hpe.com/docs/guides/CCE/topics/fortran_new/OpenMP_Overview.html

OpenMP Example

The following simple program contains a standard OpenMP directive. OpenMP runtime functions such as omp_get_thread_num() are declared in the omp.h header file.

#include <omp.h>
#include <stdio.h>

int main(void)
{
    // Beginning of parallel region: each thread prints its ID
    #pragma omp parallel
    {
        printf("Hello World... from thread = %d\n",
               omp_get_thread_num());
    }
    // End of parallel region
    return 0;
}

Introduction to OpenMP: https://cpe.ext.hpe.com/docs/cce/man7/intro_openmp.7.html

OpenMP CPU

By default, OpenMP is disabled in CCE and must be explicitly enabled with the -fopenmp compiler command line option. The application can be compiled either with the CCE compiler or with the GNU compiler.

Compiling the code with the CCE compiler:

cc openmp.c -fopenmp

The resulting executable is dynamically linked against the CCE OpenMP runtime:

ldd a.out | grep mp
     libcraymp.so.1 => /opt/cray/pe/cce/16.0.1/cce/x86_64/lib/libcraymp.so.1 (0x00007f1da924f000)

The GNU compiler can be loaded by switching the module from PrgEnv-cray to PrgEnv-gnu.

module swap PrgEnv-cray PrgEnv-gnu
cc openmp.c -fopenmp

The resulting executable is dynamically linked against the GNU OpenMP runtime (libgomp):

ldd a.out | grep mp
     libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f7576184000)

OpenMP GPU

To enable GPU offload with the CCE compiler, the GPU acceleration module for the Nvidia A100 card must be loaded and the accelerator target set:

module load craype-accel-nvidia80
export CRAY_ACCEL_TARGET=nvidia80
cc openmp.c -fopenmp

The resulting executable is dynamically linked against the CCE OpenMP and GTL runtimes:

$ ldd a.out | grep mp
     libmpi_gtl_cuda.so.0 => /opt/cray/pe/lib64/libmpi_gtl_cuda.so.0 (0x00007f2f904cb000)
     libcraymp.so.1 => /opt/cray/pe/cce/16.0.1/cce/x86_64/lib/libcraymp.so.1 (0x00007f2f8f8e7000)

The Nvidia HPC SDK toolkit can also be used for GPU offload.

module swap PrgEnv-cray PrgEnv-gnu
module load nvhpc
cc openmp.c -fopenmp

The resulting executable is dynamically linked against the Nvidia HPC SDK OpenMP runtime:

$ ldd a.out | grep nv
     libgomp.so.1 => /opt/software/packages/nvhpc/Linux_x86_64/23.11/compilers/lib/libgomp.so.1 (0x00007ff3e423e000)

OpenMP CPU Batch Job

Standard OpenMP applications use shared memory, so at most one node can be reserved. The number of OpenMP threads must be specified with the OMP_NUM_THREADS environment variable, which must either be set on the command line immediately before the application or be exported beforehand.

In the following example, 16 CPU cores are assigned to a single task. Slurm stores this value in the SLURM_CPUS_PER_TASK variable, which is used here to set the number of OpenMP threads.

#!/bin/bash
#SBATCH -A hpcteszt
#SBATCH --partition=cpu
#SBATCH --job-name=openmp-cpu
#SBATCH --output=openmp-cpu.out
#SBATCH --time=06:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./openmp-cpu

OpenMP GPU Batch Job

For GPU offload, the OMP_TARGET_OFFLOAD environment variable must be set to MANDATORY, which makes the runtime fail with an error rather than silently fall back to the CPU when no GPU is available.

#!/bin/bash
#SBATCH -A hpcteszt
#SBATCH --partition=gpu
#SBATCH --job-name=openmp-gpu
#SBATCH --output=openmp-gpu.out
#SBATCH --time=06:00:00
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
OMP_TARGET_OFFLOAD=MANDATORY ./openmp-gpu