Overview
Login Node
When you log in to Komondor, you arrive at the login node (v01). Besides serving as the gateway to the supercomputer system, the login node can be used to manage your projects on the supercomputer. Management tasks include:
uploading programs and data to the storage;
compiling and installing software;
preparing your application to be run on the supercomputer;
submitting computation jobs to the compute nodes using the Slurm scheduler;
monitoring and handling submitted jobs;
checking job efficiency to improve subsequent jobs;
arranging, downloading and backing up computation results.
Important
Small management tasks that do not require significant resources can be executed on the login node. Resource-intensive computation tasks, however, are subject to interruption without prior notice.
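As a minimal sketch of this workflow, the commands below upload a project to the cluster storage and open a shell on the login node. The hostname, username, and paths are hypothetical placeholders, not Komondor's actual values; consult the access documentation for the real ones.

```bash
# Upload a program and its input data to your storage area
# (hostname and paths are hypothetical placeholders)
scp -r ./my_project username@login.komondor.example:/home/username/

# Log in to the login node to compile, prepare, and submit jobs
ssh username@login.komondor.example
```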
Compute Nodes
Actual computation tasks must be run on the compute nodes; these are the nodes that provide the computing power of the supercomputer. Based on the amount and type of their resources, Komondor has four different types of compute nodes, arranged in four units called partitions (or queues, in the terminology of other queuing systems). Each partition serves a different type of computing requirement.
Job Queues (Partitions)
Four non-overlapping job queues (partitions) are available on the Komondor supercomputer: “CPU”, “GPU”, “AI” and “BigData”. There is no testing partition; all four partitions are for actual computations.
Partition | Compute nodes | CPUs / node | CPU cores / node | GPUs / node | Memory / node
---|---|---|---|---|---
“CPU” (default) | 184 x | 2 AMD CPUs | 128 CPU cores | N/A | 256 GB RAM
“GPU” | 58 x | 1 AMD CPU | 64 CPU cores | 4 A100 GPUs | 256 GB RAM
“AI” | 4 x | 2 AMD CPUs | 128 CPU cores | 8 A100 GPUs | 512 GB RAM
“BigData” | 1 x | 16 Intel CPUs | 288 CPU cores | N/A | 12 TB RAM
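You can inspect the partitions and their current state from the login node with Slurm's standard sinfo command. The lowercase partition name used below is an assumption about how the queues are named on Komondor, not a confirmed value.

```bash
# Summarize all partitions (node counts, state, time limits)
sinfo -s

# Show per-node detail for one partition; the name "gpu" is assumed here
sinfo -p gpu -N -l
```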
You can instruct the Slurm scheduler to allocate the necessary resources and launch your tasks on the compute nodes using Slurm commands and special directives in your job script. Slurm queues all submitted jobs (from all users) according to their calculated priority and starts them on a schedule based on available resources.
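A minimal job script might look like the sketch below. The partition name, resource values, and the program being launched are illustrative assumptions; only the #SBATCH directive syntax itself is standard Slurm.

```bash
#!/bin/bash
#SBATCH --job-name=example          # name shown in the queue
#SBATCH --partition=cpu             # target partition (assumed name)
#SBATCH --nodes=1                   # number of compute nodes
#SBATCH --ntasks-per-node=128      # one task per core on a "CPU" node
#SBATCH --time=01:00:00             # wall-clock limit (HH:MM:SS)
#SBATCH --output=%x-%j.out          # log file named from job name and ID

srun ./my_program                   # launch the (placeholder) program
```

Submit the script with `sbatch job.sh` and monitor it with `squeue -u $USER`; both are standard Slurm commands.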