AI partition

The AI partition consists of four HPE Apollo 6500 Gen10 Plus blades. Each blade contains one node with 8 GPUs, so the whole partition contains a total of 32 GPUs.

GPU:

  • NVIDIA A100 Tensor Core GPU

  • 40 GB VRAM

8 GPUs per node

Processor:

  • AMD EPYC 7763 64-Core Processor (2.45 GHz base clock)

  • Max boost clock: 3.5 GHz

2 CPUs per node

Memory:

  • 16 GB DDR4 3200 MHz modules

  • 16 DIMM modules per socket

256 GB per CPU, 512 GB of memory per node.

Network:

  • HPE Slingshot 200GbE

AI blade and node naming convention in the system:

For example: cn01

  • c - Chassis

  • n - Node (01-04)
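
The nodes of the partition and their current state can be listed with a standard Slurm command (a minimal example; the exact node list and states shown depend on the current configuration):

sinfo --partition=ai --Node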

How to use AI nodes

Example of an AI interactive job:

The run_script.sh script file contains the following:

module load singularity
singularity exec --nv ubuntu_CUDA_ai.sif python env_test.py
  • the first command loads the singularity module,

  • the --nv option sets up the environment to use the NVIDIA GPUs,

  • the second command runs the env_test.py script in the ubuntu_CUDA_ai.sif container.
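
As an optional quick check (a sketch using the same container as above), the GPUs made visible by the --nv option can also be listed with nvidia-smi inside the container:

module load singularity
singularity exec --nv ubuntu_CUDA_ai.sif nvidia-smi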

The run_script.sh script can be run with the following command:

srun --partition=ai --cpus-per-gpu=32 --mem-per-cpu=2000 --gres=gpu:1 bash run_script.sh

With this command, we run our run_script.sh job on 1 AI node with 1 GPU. We also reserve 32 CPU cores for each reserved GPU and 2000 MB of memory for each reserved CPU core for our interactive job.
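
If a job can make use of a whole node, all 8 GPUs can be requested in a single allocation. Since the node has 128 CPU cores in total, at most 16 cores can be reserved per GPU in this case (a sketch reusing run_script.sh from above):

srun --partition=ai --cpus-per-gpu=16 --mem-per-cpu=2000 --gres=gpu:8 bash run_script.sh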

Hint

Although GPU-heavy jobs usually do not rely heavily on system memory, it is still necessary to reserve enough memory for the container. However, only 2000 MB of memory can be allocated per CPU core; this limit can only be bypassed by reserving more CPUs.
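
For example, if the container needs roughly 128 GB of system memory for a single-GPU job, it can be obtained by reserving 64 CPU cores (64 x 2000 MB = 128000 MB); a sketch based on the limit described above:

srun --partition=ai --cpus-per-gpu=64 --mem-per-cpu=2000 --gres=gpu:1 bash run_script.sh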

How to run an AI batch job:

The previous interactive job can also be run as a batch job. In this case, the content of batch_script.sh will be the following:

#!/bin/bash
#SBATCH -A ACCOUNT
#SBATCH --partition=ai
#SBATCH --job-name=jobname
#SBATCH --cpus-per-gpu=32
#SBATCH --mem-per-cpu=2000
#SBATCH --gres=gpu:1
module load singularity
singularity exec --nv ubuntu_CUDA_ai.sif python env_test.py

This script can be queued with the following command:

sbatch batch_script.sh

According to our batch script, we will run our job on 1 AI node with 1 GPU. We also reserve 32 CPU cores for each reserved GPU and 2000 MB of memory for each reserved CPU core, just as in the interactive job.
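
After submission, the job can be followed with the usual Slurm commands; by default, sbatch writes the job output to a slurm-<jobid>.out file in the submission directory, where <jobid> is the ID printed by sbatch:

squeue -u $USER
cat slurm-<jobid>.out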


GPU nodes are suitable for:

  • large jobs capable of utilizing up to 8 GPUs in one node

  • smaller parallel jobs capable of utilizing up to 32 GPUs across multiple nodes (see the sketch below).
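
A multi-node batch job spanning all 4 nodes and 32 GPUs might use a header similar to the following (a hedged sketch only; the actual launch line depends on the application, and application_command is just a placeholder):

#!/bin/bash
#SBATCH -A ACCOUNT
#SBATCH --partition=ai
#SBATCH --job-name=jobname
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=2000
#SBATCH --gres=gpu:8
srun application_command    # placeholder, replace with the real application command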

Installed software for parallel jobs:

Amber - https://ambermd.org/doc12/Amber22.pdf

GROMACS - https://manual.gromacs.org/current/index.html

Q-chem - https://manual.q-chem.com/latest/

TeraChem - http://www.petachem.com/doc/userguide.pdf

NAMD - https://www.ks.uiuc.edu/Research/namd/3.0/ug/

Software suitable for single-node jobs, currently available in the container environment:

Tensorflow - https://www.tensorflow.org/guide

PyTorch - https://pytorch.org/docs/stable/index.html

Our containers are available on Komondor at:

/opt/software/packages/containers/
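
The images in this directory can be listed directly, and a framework such as PyTorch can be tried with a command similar to the interactive example above (the container file name below is only a placeholder; use one that actually exists in the directory):

ls /opt/software/packages/containers/
module load singularity
singularity exec --nv /opt/software/packages/containers/<pytorch_container>.sif python -c "import torch; print(torch.cuda.is_available())"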

Further information about the hardware:

Cray Exascale Supercomputer

HPE Cray EX Liquid-Cooled Cabinet

AMD CPU

NVIDIA A100