Preparing Job Scripts (sbatch)
#SBATCH Directives
Applications have to be run in batch mode on the supercomputer.
This means that a job script must be prepared for each run,
which contains the description of the required resources and the commands required for the run.
The script can be submitted to the Slurm scheduler with the sbatch command.
Parameters for the scheduler (resource requirements) can be provided at the top of the script using #SBATCH directives.
(Note that sbatch stops processing #SBATCH directives once it reaches the first non-comment, non-whitespace line in the script.)
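As a minimal sketch of the rule in the note above, the following script (with purely illustrative values) places one directive after the first command. Under that rule, sbatch would read the first two directives and treat the last one as an ordinary comment.

```shell
#!/bin/bash
# Read by sbatch: these lines come before the first command.
#SBATCH --job-name=demo
#SBATCH --time=30

# First non-comment, non-whitespace line: directive processing stops here.
workdir="$(pwd)"
echo "Running in ${workdir}"

# NOT read by sbatch: this line comes after the first command, so it is
# only a shell comment and the default for --cpus-per-task applies.
#SBATCH --cpus-per-task=4
```

Keeping all directives in one block directly under the shebang line avoids this pitfall.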
Basic Options
The most basic options that can be provided using sbatch directives are demonstrated in the following sbatch script:
#!/bin/bash
#SBATCH --account=ACCOUNT
#SBATCH --job-name=NAME
#SBATCH --partition=PARTITION
#SBATCH --time=TIME
#SBATCH --cpus-per-task=NCPUS
#SBATCH --mem-per-cpu=SIZE
EXECUTABLE COMMANDS
...
Important
If an option is not set, Slurm applies the default value for that option. Always make sure that you either set each option correctly or that its default is suitable for your job; otherwise your job may not run as expected (e.g. it may get low priority, run out of memory, fail to scale up properly, or even stay pending forever).
Description of the options above (and their defaults):
--account (-A)
: ACCOUNT is the name of the project account to be debited (accessible accounts can be listed using the sbalance command).
Default: The account (project) associated with the owner of the job (on Komondor, each user is associated with exactly one account for each project in which the user is participating).
--job-name (-J)
: NAME is the short name of the job. Default: The name of the batch script or the sender application.
--partition (-p)
: PARTITION is the partition that is requested for the resource allocation. Default: cpu
--time (-t)
: TIME is the maximum running time (walltime) allowed for the job. The following time formats can be used:
“MINUTES”, “MINUTES:SECONDS”, “HOURS:MINUTES:SECONDS”, “DAYS-HOURS”, “DAYS-HOURS:MINUTES” and “DAYS-HOURS:MINUTES:SECONDS”.
Currently the default is 2 days and the maximum is 7 days on all partitions of Komondor.
--cpus-per-task (-c)
: NCPUS is the number of processors that is to be allocated per task within the job. The default is one CPU core per task.
--mem-per-cpu
: SIZE is the minimum memory required per usable allocated CPU.
Default units are megabytes.
On Komondor, the default is 1000 MB per CPU core.
The maximum memory that can be allocated per CPU core varies per partition:
“CPU” - 2000 MB, “GPU” - 4000 MB, “AI” - 4000 MB, “BigData” - 42000 MB.
(If more memory is requested, the job is billed for the number of CPUs that can provide the required amount of memory.)
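The billing remark above can be made concrete with a little arithmetic. The sketch below assumes that the billed core count is the larger of the requested cores and the cores needed to cover the requested memory at the partition's per-core maximum. This formula is one reading of the remark, not a confirmed description of Komondor's accounting, and billed_cpus is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm or Komondor tool).
# Assumption: billed cores = max(requested cores,
#                                cores needed to provide the requested memory).
# Arguments: NCPUS  MEM_PER_CPU_MB  PARTITION_MAX_MEM_PER_CPU_MB
billed_cpus() {
    local ncpus="$1" mem_per_cpu_mb="$2" max_mem_per_cpu_mb="$3"
    local total_mb=$(( ncpus * mem_per_cpu_mb ))
    # Ceiling division: cores needed to cover total_mb at the partition limit.
    local needed=$(( (total_mb + max_mem_per_cpu_mb - 1) / max_mem_per_cpu_mb ))
    if (( needed > ncpus )); then
        echo "$needed"
    else
        echo "$ncpus"
    fi
}

billed_cpus 4 2000 2000   # 4 cores at the "CPU" partition limit of 2000 MB
billed_cpus 4 4000 2000   # 4 cores asking for twice the per-core limit
```

Under this assumption, the second request would be billed like an 8-core job, because 16000 MB needs 8 cores at 2000 MB each.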
For a full list of sbatch options, see the description of sbatch in the Slurm documentation or type man sbatch in the terminal.
Time Limit
Running jobs have a time limit for which they are allowed to run. When the time limit expires, the job is canceled by the scheduler.
The default time limit on Komondor is 2 days. You can explicitly set the time limit for your job with the --time option. The maximum time limit that can be requested is 7 days (if you request more, your job will be left in a PENDING state, possibly indefinitely).
#SBATCH --time=TIME
TIME is the maximum running time (walltime) allowed for the job. The following time formats can be used: “MINUTES”, “MINUTES:SECONDS”, “HOURS:MINUTES:SECONDS”, “DAYS-HOURS”, “DAYS-HOURS:MINUTES” and “DAYS-HOURS:MINUTES:SECONDS”.
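To show how the accepted formats are read, the following sketch converts a --time value into seconds. The function slurm_time_to_seconds is a hypothetical helper written for this illustration; it is not part of Slurm and does not validate its input beyond counting fields.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): convert a --time value to seconds.
slurm_time_to_seconds() {
    local spec="$1" days=0 hours=0 minutes=0 seconds=0 rest
    local -a parts
    if [[ "$spec" == *-* ]]; then
        # DAYS-HOURS[:MINUTES[:SECONDS]]
        days="${spec%%-*}"
        rest="${spec#*-}"
        IFS=: read -r -a parts <<< "$rest"
        hours="${parts[0]:-0}"
        minutes="${parts[1]:-0}"
        seconds="${parts[2]:-0}"
    else
        IFS=: read -r -a parts <<< "$spec"
        case "${#parts[@]}" in
            1) minutes="${parts[0]}" ;;                        # MINUTES
            2) minutes="${parts[0]}"; seconds="${parts[1]}" ;; # MINUTES:SECONDS
            3) hours="${parts[0]}"; minutes="${parts[1]}"; seconds="${parts[2]}" ;;
            *) echo "unsupported time format: $spec" >&2; return 1 ;;
        esac
    fi
    # 10# forces base 10 so that values such as "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$hours * 3600 + 10#$minutes * 60 + 10#$seconds ))
}

slurm_time_to_seconds "90"          # 90 minutes
slurm_time_to_seconds "1-12"        # 1 day and 12 hours
slurm_time_to_seconds "02:30:00"    # 2 hours and 30 minutes
```

Note that a bare number is interpreted as minutes, not seconds or hours, which is an easy detail to get wrong when setting --time.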
The current time limits set for the partitions can be queried with the following command:
sinfo --Format partition,defaulttime,time
Note
Shorter jobs can get higher priority, so it’s a good idea to set the job’s time limit as accurately as you can estimate it. Setting the time limit properly also helps Slurm schedule jobs more efficiently.
CPU Allocation
By default, one CPU core is allocated per task within your job. You can ask for more cores with the --cpus-per-task option:
#SBATCH --cpus-per-task=NCPUS
NCPUS is the number of processors that is to be allocated per task within the job.
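Inside the job, the allocated core count usually has to be passed on to the application. A minimal sketch, assuming an OpenMP-threaded program (an assumption made for illustration; the text above does not name any particular application): Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given, and the script falls back to 1, the default stated above, when the variable is unset.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=8

# SLURM_CPUS_PER_TASK is exported by Slurm when --cpus-per-task is given.
# Fall back to 1 core (the default per task) when it is not set,
# e.g. when the script is run outside of Slurm for testing.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

echo "Using ${OMP_NUM_THREADS} thread(s) per task"
```

Deriving the thread count from the Slurm variable keeps the script consistent if you later change the --cpus-per-task value.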
Memory Allocation
By default, 1000 MB of memory is assigned to each allocated CPU core. You can request more with the --mem-per-cpu option:
#SBATCH --mem-per-cpu=SIZE
SIZE is given in MB (megabytes) by default. You can specify the unit explicitly: K for KB (kilobytes), M for MB (megabytes), G for GB (gigabytes), T for TB (terabytes).
The maximum memory that can be allocated per CPU core varies per partition:
“CPU”: 2000 MB / core
“GPU”: 4000 MB / core
“AI”: 4000 MB / core
“BigData”: 42000 MB / core
You can use the --mem-per-gpu option to specify the amount of memory required per allocated GPU:
#SBATCH --mem-per-gpu=SIZE
The --mem option sets the required memory per allocated node:
#SBATCH --mem=SIZE
Here, setting SIZE to 0 requests all of the memory on each allocated node.
Note: The --mem-per-cpu, --mem-per-gpu and --mem options are mutually exclusive.
GPU Allocation
GPUs can be allocated using the --gres option:
#SBATCH --gres=gpu:N
N sets the number of required GPUs per node, which can be 1-4 on the “GPU” partition or 1-8 on the “AI” partition (nodes on the “CPU” and “BigData” partitions don’t have GPUs).
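The per-node GPU limits above can be checked before submission. The sketch below uses two hypothetical helpers that are not Komondor tools; the lowercase partition names gpu and ai are assumptions modelled on the default partition name cpu.

```shell
#!/bin/bash
# Hypothetical helper: largest valid N in --gres=gpu:N for a partition.
# Assumption: partition names are lowercase, like the default "cpu".
max_gpus_per_node() {
    case "$1" in
        gpu) echo 4 ;;
        ai)  echo 8 ;;
        *)   echo 0 ;;   # nodes on other partitions have no GPUs
    esac
}

# Succeeds only if the requested GPU count fits on one node of the partition.
gpu_request_ok() {
    local partition="$1" requested="$2"
    local limit
    limit="$(max_gpus_per_node "$partition")"
    [ "$requested" -ge 1 ] && [ "$requested" -le "$limit" ]
}

if gpu_request_ok ai 8; then
    echo "ai: 8 GPUs per node is a valid request"
fi
if ! gpu_request_ok gpu 8; then
    echo "gpu: 8 GPUs per node exceeds the limit"
fi
```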
Node Allocation
Slurm will allocate enough nodes to satisfy the requested resources. However, you can explicitly specify the number of nodes you want to assign to your job with the --nodes option:
#SBATCH --nodes=N
N sets the number of required nodes for the job. You can also give this option as a range (“MINIMUM-MAXIMUM”) or as a comma-separated list of node counts.
Setting the number of required nodes does not mean you will get all resources of the allocated nodes. It just means that the tasks of your job can use (can be distributed over) that many nodes. If you want to allocate all the CPUs and GRES (generic resources, e.g. GPUs) on the nodes for your job, you can use the --exclusive option. Note that this does not mean that you also get all the memory on the allocated nodes, so the memory is requested explicitly with --mem:
#SBATCH --nodes=N
#SBATCH --exclusive
#SBATCH --mem=SIZE_PER_NODE
Non-restartable Jobs
For jobs that are not restartable or should not be restarted, the --no-requeue option can be set to prevent requeuing (e.g. after a node failure):
#SBATCH --no-requeue
This setting is needed because the default behaviour on Komondor is to requeue jobs.
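As an extra safeguard inside the script itself, you can check whether Slurm reports that the job has been restarted. A minimal sketch, assuming the standard SLURM_RESTART_COUNT variable (which Slurm sets when a job has been restarted or requeued); was_requeued is a hypothetical helper name.

```shell
#!/bin/bash
#SBATCH --no-requeue

# True if Slurm reports that this job has been restarted at least once.
# SLURM_RESTART_COUNT is unset on a first run, so it defaults to 0 here.
was_requeued() {
    [ "${SLURM_RESTART_COUNT:-0}" -gt 0 ]
}

if was_requeued; then
    echo "Restart detected: not repeating non-restartable work." >&2
else
    echo "First run: starting work."
fi
```

With --no-requeue set, the restart branch should not normally be reached; the check only guards against the option being removed or overridden later.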
Quality of Service
Each job submitted to Slurm is assigned a Quality of Service (QOS), which affects how the job is executed (e.g. priority, preemption, interruption, resource billing). The default QOS is “normal”: the job cannot be interrupted, and as much CPU time is billed as was used.
For a more detailed description and information about the available QOSs, see the “Basic Usage / Quality of Service (QOS)” chapter.
You can set a QOS other than the default for your job using the --qos option. For example, here is how you set the QOS to “lowpri”:
#SBATCH --qos=lowpri
Low-priority jobs may be interrupted and later resumed at any time. It is advised to test how your software behaves by artificially terminating it (more explanation will follow…).
Email Notification
You can instruct Slurm to send an email when the state of your job changes (e.g. starts, ends or gives an error):
#SBATCH --mail-type=ALL
#SBATCH --mail-user=EMAIL
You can set the triggering events using the --mail-type option (you can find the full list in the sbatch description in the Slurm documentation, or type man sbatch in the terminal).
The --mail-user option sets the email address to which the notification should be sent.
Slurm Environment Variables
Many of the settings specified (explicitly or implicitly) in the sbatch script are available as environment variables after submission. Some examples:
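| SLURM_JOB_ID | Unique ID for the job. |
| SLURM_JOB_NAME | Name of your job (set with --job-name). |
| SLURM_CPUS_PER_TASK | Number of CPU cores per task (set with --cpus-per-task). |
| SLURM_MEM_PER_CPU | Amount of memory per CPU core (set with --mem-per-cpu). |
| SLURM_MEM_PER_NODE | Amount of memory per node (set with --mem). |
| SLURM_NTASKS | Number of tasks (set with --ntasks). |
| SLURM_NTASKS_PER_NODE | Number of tasks per node (set with --ntasks-per-node). |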
You can find a full list of Slurm output variables in the Slurm documentation.
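A quick way to see which of these variables are set for a given job is to print them from the job script. The sketch below uses standard Slurm output variable names (SLURM_JOB_ID, SLURM_JOB_NAME, SLURM_CPUS_PER_TASK, SLURM_MEM_PER_CPU, SLURM_NTASKS); show_var is a hypothetical helper, and the directive values are illustrative only.

```shell
#!/bin/bash
#SBATCH --job-name=envdemo
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=1000

# Print NAME=value, or NAME=(not set) when the variable is unset or empty.
# ${!name} is bash indirect expansion: the value of the variable named by $name.
show_var() {
    local name="$1"
    echo "${name}=${!name:-(not set)}"
}

for var in SLURM_JOB_ID SLURM_JOB_NAME SLURM_CPUS_PER_TASK \
           SLURM_MEM_PER_CPU SLURM_NTASKS; do
    show_var "$var"
done
```

Some variables are only exported when the corresponding option was given, so seeing "(not set)" for a variable is not necessarily an error.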