Advanced Job Definitions
Job Array
Job arrays can be used to run multiple instances of an application (usually with different datasets) simultaneously.
For each instance (task), the scheduler stores a unique ID in the SLURM_ARRAY_TASK_ID environment variable, so the tasks of the
job array can be distinguished by querying this variable. The output of each task is written to a separate file, by default named
“slurm-<jobid>_<taskid>.out”, where <taskid> is the value of SLURM_ARRAY_TASK_ID. The range of the task IDs can be specified
with the --array option.
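If you prefer custom output file names, the %A (array master job ID) and %a (array task ID) filename patterns can be used in the --output option; a minimal example (the file name itself is just an illustration):
#SBATCH --output=array_example_%A_%a.out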
For example, suppose we have a script that we want to run with six different input values:
$ cat dummy.sh
#!/bin/bash
echo "Processing input value: $1"
With the --array option, you can specify the range of (non-negative integer) task IDs you want to use. The example sbatch script below starts six concurrent tasks with task IDs from 0 to 5 and uses those IDs as indices to select the input value passed to the dummy.sh script:
#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --account=my_project
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2000
#SBATCH --array=0-5
INPUT_DATA=(0.1 0.2 0.3 0.4 0.5 0.6)
srun dummy.sh ${INPUT_DATA[$SLURM_ARRAY_TASK_ID]}
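Assuming the batch script above is saved as array_example.sh (the file name is only illustrative), the whole array is submitted with a single sbatch call, and each task writes its own output file:
$ sbatch array_example.sh
$ squeue -u $USER            # pending tasks of the array are listed as <jobid>_[0-5]
$ cat slurm-<jobid>_3.out    # output of the task with SLURM_ARRAY_TASK_ID=3
Processing input value: 0.4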
The number of concurrently running tasks within a job array can also be limited. The following option instructs Slurm to create a total of six tasks but run at most two of them at a time:
#SBATCH --array=0-5%2
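Besides simple ranges, the --array option also accepts comma-separated ID lists and step values, and these can be combined with the % throttle; a few illustrative forms:
#SBATCH --array=1,3,5,7      # run only these task IDs
#SBATCH --array=0-20:4       # task IDs 0, 4, 8, 12, 16, 20 (step of 4)
#SBATCH --array=0-99%10      # 100 tasks, at most 10 running at the same time
The throttle of an already submitted array can also be modified later with scontrol update JobId=<jobid> ArrayTaskThrottle=<n>.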
You can find more information on the “Job array” page of the Slurm documentation (https://slurm.schedmd.com/job_array.html).
Packed Jobs
(srun --exclusive)
…
Multiple Tasks
(srun --multi-prog)
…