GPU partition
The GPU partition consists of 29 HPE Cray EX235n Compute Blades, each blade holding 2 nodes with 4 GPUs per node. The GPU partition contains a total of 232 GPUs.
Processor:
AMD EPYC 7763 64-Core Processor (2.45GHz)
Max. Boost Clock: Up to 3.5GHz
1 CPU per node
2 CPUs per blade
Memory:
DDR4 3200MHz 16GB
8 DIMM modules per socket
128 GB RAM per socket/node
256 GB memory per blade
GPU:
NVIDIA A100 Tensor Core GPU
40 GB VRAM
4 GPUs per node
8 GPUs per blade
Network:
HPE Slingshot 200GbE
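The resources Slurm reports for these nodes can be checked with sinfo (a sketch assuming the partition is named gpu, as in the srun examples below; the format specifiers print the node name, CPU count, memory and GRES/GPU configuration):
# List GPU partition nodes with CPU count, memory (MB) and GRES (GPU) configuration
sinfo --partition=gpu -N -o "%N %c %m %G"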
GPU blade and node naming convention in the system:
GPU nodes are located in the x1001 cabinet.
For example: x1001c0s0b0n0
c - Chassis (0-7)
s - Slot (0-7)
b - Board (Blade) (0-1)
n - Node (0-1)
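As an illustration of the naming convention (a sketch assuming standard Slurm tooling; the node name is the example given above), the configuration of a single GPU node can be inspected with scontrol:
# Show the CPU, memory and GPU (Gres) configuration of one GPU node
scontrol show node x1001c0s0b0n0 | grep -E 'NodeName|CPUTot|RealMemory|Gres'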
How to use GPU nodes
Example of a GPU interactive job:
The run_script.sh script file contains the following:
module load singularity
singularity exec --nv ubuntu_CUDA_ai.sif python env_test.py
These commands load the singularity module and run the env_test.py script in the ubuntu_CUDA_ai.sif container; the --nv option sets up the environment to use the NVIDIA GPUs.
The run_script.sh script can be run with the following command:
srun --partition=gpu --cpus-per-gpu=32 --mem-per-cpu=2000 --gres=gpu:1 bash run_script.sh
With this command, we run our run_script.sh job interactively on 1 GPU node, using 1 GPU. We also reserve 32 CPU cores for each reserved GPU and 2000 MB of memory for each reserved CPU core.
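Before running the actual workload, it can be useful to verify that a GPU is visible inside the container. A minimal check, reusing the same srun options and container as above together with the standard nvidia-smi utility:
# Request 1 GPU interactively and list the GPUs visible inside the container
srun --partition=gpu --cpus-per-gpu=32 --mem-per-cpu=2000 --gres=gpu:1 \
    singularity exec --nv ubuntu_CUDA_ai.sif nvidia-smi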
Hint
Although GPU-heavy jobs usually do not rely heavily on system memory, it is necessary to reserve enough memory for the container. However, at most 2000 MB of memory can be allocated per CPU core; this limit can only be bypassed by reserving more CPUs.
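As a worked example of the hint: with --cpus-per-gpu=32 and --mem-per-cpu=2000, a single-GPU job gets 32 x 2000 MB = 64000 MB (about 64 GB) of system memory. If more is needed, the CPU count has to be raised, for example (an illustrative variant of the earlier command):
# 48 CPU cores per GPU -> 48 x 2000 MB = 96000 MB of system memory for the job
srun --partition=gpu --cpus-per-gpu=48 --mem-per-cpu=2000 --gres=gpu:1 bash run_script.sh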
How to run a GPU batch job:
The previous interactive job can also be run as a batch job. In this case, the content of batch_script.sh will be the following:
#!/bin/bash
#SBATCH -A ACCOUNT
#SBATCH --partition=gpu
#SBATCH --job-name=jobname
#SBATCH --cpus-per-gpu=32
#SBATCH --mem-per-cpu=2000
#SBATCH --gres=gpu:1
module load singularity
singularity exec --nv ubuntu_CUDA_ai.sif python env_test.py
This script can be queued with the following command:
sbatch batch_script.sh
According to our batch script, we will run our job on 1 GPU node, using 1 GPU. We also reserve 32 CPU cores for each reserved GPU and 2000 MB of memory for each reserved CPU core, just as in the interactive job.
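After submission, the job can be followed with the usual Slurm commands (JOBID is a placeholder for the job ID printed by sbatch):
# List our queued and running jobs
squeue -u $USER
# Show the state and accounting information of a job after it has finished
sacct -j JOBID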
The GPU nodes are suitable for:
small jobs capable of utilizing only 1-4 GPUs in one node,
massively parallel jobs utilizing more than 32 GPUs across multiple nodes (a multi-node batch sketch follows below).
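For the multi-node case, a minimal batch sketch could look like the following. The node and task counts are illustrative only (8 nodes x 4 GPUs = 32 GPUs), and my_parallel_app is a hypothetical placeholder for an MPI-enabled application such as the parallel packages listed further below:
#!/bin/bash
#SBATCH -A ACCOUNT
#SBATCH --partition=gpu
#SBATCH --job-name=multinode-jobname
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=2000
#SBATCH --gres=gpu:4

# 8 nodes x 4 GPUs per node = 32 GPUs in total, one MPI task per GPU.
# my_parallel_app is a placeholder for the actual MPI-enabled executable.
srun my_parallel_app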
Installed single-node software:
Alphafold - https://github.com/google-deepmind/alphafold
Installed software for parallel jobs:
Amber - https://ambermd.org/doc12/Amber22.pdf
GROMACS - https://manual.gromacs.org/current/index.html
Q-chem - https://manual.q-chem.com/latest/
TeraChem - http://www.petachem.com/doc/userguide.pdf
NAMD - https://www.ks.uiuc.edu/Research/namd/3.0/ug/
Software suitable for single-node jobs, currently available in the container environment:
Tensorflow - https://www.tensorflow.org/guide
PyTorch - https://pytorch.org/docs/stable/index.html
Our containers are available on Komondor at:
/opt/software/packages/containers/
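The available container images can be listed directly from this directory:
# List the container images provided on Komondor
ls /opt/software/packages/containers/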
Further information about the hardware: