Job Arrays
A job array lets you submit one job that expands into many identical tasks (array elements).
Each element runs the same script but with a different array index, making it perfect for embarrassingly-parallel workloads such as file-by-file processing.
Example
#!/bin/bash
#PBS -J 1-10 # Create a job array with 10 tasks
#PBS -l select=1:ncpus=2:mem=8gb # Request resources
#PBS -l walltime=01:00:00 # Set a maximum walltime of 1 hour
module load mysoftware # Load any necessary modules
# Start from the directory where qsub was run, so relative paths resolve.
cd "$PBS_O_WORKDIR"
# Run the application, using the current PBS_ARRAY_INDEX
# to select a specific input file. Each array task runs
# independently of the others and processes a different file.
./my_program "mydata/${PBS_ARRAY_INDEX}/myfile.dat"
This example script creates an array of 10 tasks, each requesting 2 CPU cores and 8 GB of memory for up to 1 hour. If all tasks run at the same time, the array occupies up to 20 CPU cores and 80 GB of memory across the cluster. Each task processes a different input file, selected by its array index.
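When the input files are not laid out in numbered directories, a common alternative is to list them in a text file and let each element pick the line that matches its index. The following is a minimal sketch under that assumption. The helper `nth_input`, the sample file names, and the fallback index of 2 are hypothetical and exist only so the snippet also runs outside PBS.

```shell
#!/bin/bash
# Hypothetical helper: print line N (1-based) of a file list.
nth_input() {
  sed -n "${1}p" "$2"
}

# Demo: build a throwaway list of made-up file names.
list=$(mktemp)
printf '%s\n' sample_a.dat sample_b.dat sample_c.dat > "$list"

# Inside a job array PBS sets PBS_ARRAY_INDEX; default to 2 elsewhere.
index=${PBS_ARRAY_INDEX:-2}
input=$(nth_input "$index" "$list")
echo "element $index would process $input"

rm -f "$list"
```

In a real job script the list would be a file you maintain yourself, and the array range would need to match its number of lines.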
Environment Variables
When PBS launches each array element, it automatically sets a handful of environment variables that your script can query.
| Variable | What it holds |
|---|---|
| PBS_ARRAY_INDEX | The element’s numeric index. |
| PBS_JOBID / PBS_JOBID_ARRAY | Job ID of the parent array; element IDs look like 1234[7]. |
| PBS_NODEFILE | Hostfile listing the node(s) allocated to this element. |
| PBS_O_WORKDIR | Directory where you ran qsub. |
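A script can read these variables like any other shell variable. The sketch below prints a one-line summary of the current element. The function name `describe_element` and the fallback values are assumptions made for illustration, so that the snippet also runs outside a PBS job where the variables are unset.

```shell
#!/bin/bash
# Hypothetical helper: summarise the PBS environment of this element.
describe_element() {
  local index jobid workdir nodes
  index=${PBS_ARRAY_INDEX:-unset}
  jobid=${PBS_JOBID:-unset}
  workdir=${PBS_O_WORKDIR:-$PWD}
  nodes=0
  # PBS_NODEFILE may list a host more than once; count unique hosts.
  if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    nodes=$(sort -u "$PBS_NODEFILE" | wc -l | tr -d ' ')
  fi
  echo "index=$index jobid=$jobid workdir=$workdir nodes=$nodes"
}

describe_element
```

Writing this line at the top of each element's output can make it easier to tell which element produced which log.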
Array Directive
The -J option is what turns a single job submission into a job array.
| Directive | Task indices created |
|---|---|
| -J 1-10 | 1 2 3 4 5 6 7 8 9 10 |
| -J 0-99 | 0 1 2 … 99 |
| -J 1-100:5 | 1 6 11 … 96 (step = 5) |
| -J 1-5,20-25 | 1 2 3 4 5 20 21 22 23 24 25 |
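To check which indices a range will produce before submitting, the expansion can be reproduced in plain shell. This is an illustrative sketch only, not how PBS itself parses the option. The function `expand_range` is hypothetical and handles just the `start-end` and `start-end:step` forms with non-negative integers.

```shell
#!/bin/bash
# Hypothetical helper: expand "start-end" or "start-end:step" into indices.
expand_range() {
  local spec range step first last out i
  spec=$1
  range=${spec%%:*}          # part before the optional ":step"
  step=1
  case $spec in
    *:*) step=${spec##*:} ;; # part after ":" when a step is given
  esac
  first=${range%%-*}
  last=${range##*-}
  out=
  i=$first
  while [ "$i" -le "$last" ]; do
    out="${out:+$out }$i"    # append, space-separated
    i=$((i + step))
  done
  echo "$out"
}

expand_range 1-10
expand_range 1-100:5
```

Note that with a step, the last index is the largest value not exceeding the upper bound, which is why `1-100:5` ends at 96 rather than 100.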