Strange issue... a user is running an array job and directing error output via:

#SBATCH --error=./jobout/wrench_%x_%A-%a.err

In an output file named "wrench_w352_41004292-999.err", he is seeing the message:

"JOB 41005626 ON compute-7-39 CANCELLED"

The question is: why is "JOB 41005626" being reported in a file that corresponds to job "41004292" (per the name of the error log)?
(In reply to Jeff Haferman from comment #0)
> Strange issue... a user is running an array job, and directing error output
> via:
>
> #SBATCH --error=./jobout/wrench_%x_%A-%a.err
>
> In an output file named: "wrench_w352_41004292-999.err"
> He is seeing a message: "JOB 41005626 ON compute-7-39 CANCELLED"
>
> The question is why is "JOB 41005626" being reported in a file that
> corresponds to job "41004292" (per the name of the error log)?

An array job carries a few more details regarding job IDs. When an array job is submitted, a single job record with its own job ID is created, and that record holds extra information such as the number of jobs in the array and the job ID of the array. Whenever a job in the array is scheduled, a new job record is created for that job. The new job record has a new job ID, although it still keeps track of the array job ID and its index in the array.

In this case:

* The array job ID is 41004292.
* The job ID is 41005626, not 41004292.
* The job's index in the array is 999.

All of these values can be accessed via different environment variables, which can be found in the sbatch man page (https://slurm.schedmd.com/sbatch.html).

SLURM_ARRAY_TASK_COUNT   Total number of tasks in a job array.
SLURM_ARRAY_TASK_ID      Job array ID (index) number.
SLURM_ARRAY_TASK_MAX     Job array's maximum ID (index) number.
SLURM_ARRAY_TASK_MIN     Job array's minimum ID (index) number.
SLURM_ARRAY_TASK_STEP    Job array's index step size.
SLURM_ARRAY_JOB_ID       Job array's master job ID number.

The meaning of the different "%" options in a filename can be found under the "filename pattern" section of the sbatch man page.

%A   Job array's master job allocation number.
%a   Job array ID (index) number.
%x   Job name.

Does that make sense?
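To see the mapping directly, the batch script can print these identifiers to stderr so they land in the same error log. The following is only a sketch: the job name and array range are assumptions chosen to match the file name in this ticket, and describe_ids is a hypothetical helper, not part of Slurm. SLURM_JOB_ID is the standard variable holding the per-task job ID.

```shell
#!/bin/bash
#SBATCH --job-name=w352          # assumed; matches %x in the reported file name
#SBATCH --array=0-999            # assumed range; the real one is not in this ticket
#SBATCH --error=./jobout/wrench_%x_%A-%a.err

# Hypothetical helper: print the array job ID (%A), the array index (%a)
# and the per-task job ID on one line.
# Outside of Slurm these variables are unset, so "n/a" is printed instead.
describe_ids() {
    printf 'array_job_id=%s task_index=%s job_id=%s\n' \
        "${SLURM_ARRAY_JOB_ID:-n/a}" \
        "${SLURM_ARRAY_TASK_ID:-n/a}" \
        "${SLURM_JOB_ID:-n/a}"
}

# Send the line to stderr so it is written to the --error file.
describe_ids >&2
```

With the values from this ticket, the line in wrench_w352_41004292-999.err would read array_job_id=41004292 task_index=999 job_id=41005626, which ties the ID in the CANCELLED message back to the array task. If the per-task job ID is wanted in the file name itself, the filename pattern section of the sbatch man page also lists %j for that.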
Marshall - Thank you, I figured the explanation would be something like this, but I couldn't quite find it in the documentation. Appreciate it!
You're welcome. Closing as infogiven.