Overview

Login Node

When you log in to Komondor, you arrive at the login node (v01). Apart from being the gateway to the supercomputer system, the login node is where you manage your projects. Management tasks include (a brief command-line sketch follows the list):

  • uploading programs and data to the storage;

  • compiling and installing software;

  • preparing your application to be run on the supercomputer;

  • submitting computation jobs to the compute nodes using the Slurm scheduler;

  • monitoring and handling submitted jobs;

  • checking job efficiency to improve subsequent jobs;

  • organizing, downloading, and backing up computation results.
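
As a minimal sketch of these tasks from the command line: the hostname, user name, paths, script name, and job ID below are placeholders, not actual Komondor values.

    # Upload programs and data to the storage (run from your own machine;
    # the hostname is a placeholder)
    scp -r ./inputs username@login-host:~/project/

    # Compile software on the login node
    gcc -O2 -o my_application main.c

    # Submit a computation job to the compute nodes via Slurm
    sbatch job.sh

    # Monitor your submitted jobs
    squeue -u $USER

    # Check the efficiency of a finished job (seff is a Slurm contrib
    # tool; its availability may vary)
    seff <jobid>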

Important

Small management tasks that do not require significant resources may be run on the login node, but resource-intensive computation tasks executed there are subject to interruption without prior notice.

Compute Nodes

Actual computation tasks must be run on the compute nodes; these are the nodes that provide the computing power of the supercomputer. Based on the amount and type of their resources, Komondor has four different types of compute nodes, arranged in four units called partitions (or queues, in the terminology of other queuing systems). Each partition serves a different type of computing requirement.
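
To inspect how the nodes are arranged into partitions on the running system, Slurm itself can be queried; a brief sketch (the output format string is just one possible choice):

    # Summarize the partitions and the state of their nodes
    sinfo -s

    # Show partition name, time limit, node count, CPUs and memory per node
    sinfo -o "%P %l %D %c %m"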

Job Queues (Partitions)

Four non-overlapping job queues (partitions) are available on the Komondor supercomputer: “CPU”, “GPU”, “AI”, and “BigData”. There is no testing partition; all four partitions are intended for actual computations.

Partition          Compute nodes   CPUs / node   CPU cores / node   GPUs / node   Memory / node
“CPU” (default)    184             2 × AMD       128                N/A           256 GB
“GPU”              58              1 × AMD       64                 4 × A100      256 GB
“AI”               4               2 × AMD       128                8 × A100      512 GB
“BigData”          1               16 × Intel    288                N/A           12 TB
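
The target partition is chosen at submission time; a short sketch, assuming the partitions are addressed by lowercase names (verify the exact names with sinfo):

    # Submit to the default (“CPU”) partition
    sbatch job.sh

    # Request the GPU partition and four GPUs for the job
    # (the gres specification depends on the site configuration)
    sbatch --partition=gpu --gres=gpu:4 job.sh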

You can instruct the Slurm scheduler to allocate the necessary resources and launch your tasks on the compute nodes using Slurm commands and special directives in your job script. Slurm queues all submitted jobs (from all users) according to their calculated priority and starts them as the required resources become available.
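
A minimal job script sketch with such directives; the job name, partition, resource amounts, and application name are illustrative placeholders:

    #!/bin/bash
    #SBATCH --job-name=myjob         # name shown in the queue
    #SBATCH --partition=cpu          # target partition (placeholder name)
    #SBATCH --nodes=1                # number of compute nodes
    #SBATCH --ntasks=128             # number of tasks (e.g. MPI ranks)
    #SBATCH --time=01:00:00          # wall-clock time limit
    #SBATCH --output=slurm-%j.out    # output file (%j expands to the job ID)

    # Launch the application on the allocated resources
    srun ./my_application

Submitted with sbatch job.sh, the script is queued and started by Slurm once the requested resources are free.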