Advanced Job Definitions
Job Array
Job arrays can be used to run multiple instances of an application (usually with different datasets) simultaneously.
For each instance (task), the scheduler stores a unique ID in the SLURM_ARRAY_TASK_ID environment variable, so the tasks of the
job array can be distinguished by querying this variable. The output of each task is written to a separate file, by default named
“slurm-<jobid>_<taskid>.out”, where <taskid> is the value of SLURM_ARRAY_TASK_ID. The range of the task IDs can be specified
with the --array option.
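If you prefer custom output file names, the %A (array master job ID) and %a (array task ID) filename patterns can be used in the --output option; a minimal example (the file name itself is just an illustration):
#SBATCH --output=array_example_%A_%a.out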
For example, suppose we have a script that we want to run with six different input values:
$ cat dummy.sh
#!/bin/bash
echo "Processing input value: $1"
With the --array option, you can specify the range of (non-negative integer) task IDs you want to use. The example sbatch script below starts six concurrent tasks with task IDs from 0 to 5 and uses those IDs as indices to select the input value passed to the dummy.sh script:
#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --account=my_project
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2000
#SBATCH --array=0-5
INPUT_DATA=(0.1 0.2 0.3 0.4 0.5 0.6)
srun dummy.sh ${INPUT_DATA[$SLURM_ARRAY_TASK_ID]}
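Assuming the batch script above is saved as array_example.sh (the file name is only illustrative), the whole array is submitted with a single sbatch call, and each task writes its own output file:
$ sbatch array_example.sh
$ squeue -u $USER            # pending tasks of the array are listed as <jobid>_[0-5]
$ cat slurm-<jobid>_3.out    # output of the task with SLURM_ARRAY_TASK_ID=3
Processing input value: 0.4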
The number of concurrently running tasks within a job array can also be limited. The following option instructs Slurm to create a total of six tasks but run at most two of them at a time:
#SBATCH --array=0-5%2
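Besides simple ranges, the --array option also accepts comma-separated ID lists and step values, and these can be combined with the % throttle; a few illustrative forms:
#SBATCH --array=1,3,5,7      # run only these task IDs
#SBATCH --array=0-20:4       # task IDs 0, 4, 8, 12, 16, 20 (step of 4)
#SBATCH --array=0-99%10      # 100 tasks, at most 10 running at the same time
The throttle of an already submitted array can also be modified later with scontrol update JobId=<jobid> ArrayTaskThrottle=<n>.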
You can find more information on the “Job array” page of the Slurm documentation (https://slurm.schedmd.com/job_array.html).
Packed Jobs
(srun --exclusive)
…
Multiple Tasks
(srun --multi-prog)
…