Ticket 12295

Summary: job scheduled to wrong nodes with different constraint
Product: Slurm Reporter: Michael Hebenstreit <michael.hebenstreit>
Component: Scheduling Assignee: Dominik Bartkiewicz <bart>
Status: RESOLVED FIXED
Severity: 3 - Medium Impact    
Priority: --- CC: dwightman, fabecassis, lyeager
Version: 20.11.7   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=8019
https://bugs.schedmd.com/show_bug.cgi?id=10707
Site: Intel CRT
Version Fixed: 21.08.2
Attachments: slurm.conf
slurmctl.log
sched.log

Description Michael Hebenstreit 2021-08-17 11:50:27 MDT
We just found a new error: a job was scheduled to completely wrong nodes. The requested constraint was leaf113, but the job was scheduled to leaf213 nodes (bhosts is a script that calls sinfo with a different output format to show node features; a sketch of that call follows the output below):

[root@eslurm1 crtdc]# bhosts -p idealq -n eii[217-232]
NODELIST      NODES CPUS(A/I/O/T)    PARTITION        STATE      AVAIL_FEATURES
eii[219-232]     14 1008/0/0/1008    idealq           alloc      reconfig,leaf213,icx8360Y,icx8360Yf3,icx8360Yopa,corenode,IBSTACK=mlnx-5.1-2.5.8.0_240.22.1.3_2.12.5
eii[217-218]      2 0/144/0/144      idealq           idle       reconfig,leaf213,icx8360Y,icx8360Yf3,icx8360Yopa,corenode,IBSTACK=mlnx-5.1-2.5.8.0_240.22.1.3_2.12.5
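
(For reference, the bhosts wrapper boils down to roughly the following sinfo call; this is a sketch, not the actual script, and the exact format string is an assumption:)

#!/bin/bash
# bhosts: wrap sinfo so the AVAIL_FEATURES column is shown alongside the usual fields
# %N=NODELIST  %D=NODES  %C=CPUS(A/I/O/T)  %P=PARTITION  %t=STATE  %f=AVAIL_FEATURES
exec sinfo -o "%N %D %C %P %t %f" "$@"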

[root@eslurm1 crtdc]# sacct -j 174615 --format Constraints,NodeList
        Constraints        NodeList
------------------- ---------------
 [leaf113&reconfig]    eii[217-232]
                             eii217
                       eii[218-232]
                       eii[218-232]
                       eii[218-232]
[root@eslurm1 crtdc]# grep leaf213 slurm.conf
NodeName=eii[217-234]  Boards=1 SocketsPerBoard=2 CoresPerSocket=36 State=UNKNOWN Feature=reconfig,leaf213,icx8360Y,icx8360Yf3,icx8360Yopa,corenode,IBSTACK=mlnx-5.1-2.5.8.0_240.22.1.3_2.12.5
[root@eslurm1 crtdc]# grep leaf113 slurm.conf
NodeName=eia[073-077]  Boards=1 SocketsPerBoard=2 CoresPerSocket=36 State=UNKNOWN Feature=leaf113,icx8360Y,icx8360Yf2,corenode,IBSTACK=mlnx-5.1-2.5.8.0_240.22.1.3_2.12.5
NodeName=eia[078-080]  Boards=1 SocketsPerBoard=2 CoresPerSocket=36 State=UNKNOWN Feature=leaf113,icx8360Y,icx8360Yf2,corenode,IBSTACK=mlnx-5.1-2.5.8.0_240.22.1.3_2.12.5
NodeName=eia[081-084]  Boards=1 SocketsPerBoard=2 CoresPerSocket=36 State=UNKNOWN Feature=leaf113,icx8360Y,icx8360Yf2,corenode,IBSTACK=mlnx-5.1-2.5.8.0_240.22.1.3_2.12.5
NodeName=eia[085-093]  Boards=1 SocketsPerBoard=2 CoresPerSocket=36 State=UNKNOWN Feature=leaf113,icx8360Y,icx8360Yf2,corenode,IBSTACK=mlnx-5.1-2.5.8.0_240.22.1.3_2.12.5

[root@eslurm1 crtdc]# grep 174615 /opt/slurm/current/logs/slurm/slurmctl.log
[2021-08-16T23:15:42.523] _slurm_rpc_submit_batch_job: JobId=174615 InitPrio=4294726966 usec=762
[2021-08-16T23:15:45.809] sched: Allocate JobId=174615 NodeList=eii[217-232] #CPUs=1152 Partition=idealq
[2021-08-16T23:24:10.821] JobId=174615 boot complete for all 16 nodes
[2021-08-16T23:24:10.821] prolog_running_decr: Configuration for JobId=174615 is complete
[2021-08-16T23:24:49.968] _job_complete: JobId=174615 WEXITSTATUS 1
[2021-08-16T23:24:49.970] _job_complete: JobId=174615 done
[2021-08-16T23:26:03.364] cleanup_completing: JobId=174615 completion process took 74 seconds

[root@eslurm1 crtdc]# grep 174615 /opt/slurm/current/logs/slurm/sched.log
sched: [2021-08-16T23:15:42.523] JobId=174615 allocated resources: NodeList=(null)
sched: [2021-08-16T23:15:45.809] JobId=174615 initiated
sched: [2021-08-16T23:15:45.809] Allocate JobId=174615 NodeList=eii[217-232] #CPUs=1152 Partition=idealq
Comment 1 Michael Hebenstreit 2021-08-17 11:54:22 MDT
Created attachment 20862 [details]
slurm.conf
Comment 2 Michael Hebenstreit 2021-08-17 11:54:59 MDT
Created attachment 20863 [details]
slurmctl.log
Comment 3 Michael Hebenstreit 2021-08-17 11:59:25 MDT
Created attachment 20864 [details]
sched.log
Comment 5 Dominik Bartkiewicz 2021-08-18 05:16:53 MDT
Hi

Could you send me the sbatch script and the command line used to submit job 174615?
Did this happen just once, or are you able to reproduce this issue?

Dominik
Comment 6 Michael Hebenstreit 2021-08-18 06:52:48 MDT
So far it has only been detected once.
Comment 7 Michael Hebenstreit 2021-08-18 09:18:25 MDT
sbatch --exclusive "-C" "[leaf113&reconfig]" "-t" "60" "-N" "16" "-n" "1024" "-p" "idealq" "RUN-amg.slurm"

$ cat RUN-amg.slurm
#!/bin/bash -login
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=64
#SBATCH --threads-per-core=1
#SBATCH -J amg
#SBATCH --time=1:00:00
#SBATCH --exclusive
#SBATCH -d singleton

ulimit -s unlimited
module purge
source /opt/intel/oneAPI/2021.3.0.3219/setvars.sh
.....
Comment 9 Dominik Bartkiewicz 2021-08-20 03:49:32 MDT
Hi

[leaf113&reconfig] has incorrect syntax, and slurmctld should block it at submission; its behavior is currently undefined. I will let you know when the fix is in the repo.

sbatch man:
...
              Multiple Counts
                     Specific  counts  of  multiple  resources  may be specified by using the AND
                     operator and enclosing the options within square brackets.   For  example,
                     --constraint="[rack1*2&rack2*4]"  might  be  used  to specify that two nodes
                     must be allocated from nodes with the feature of "rack1" and four nodes must
                     be allocated from nodes with the feature "rack2".
...
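
For comparison, a valid multiple-counts request on this cluster could look like the line below (the per-feature node counts are only illustrative):

sbatch --exclusive -N 16 -p idealq -C "[leaf113*8&leaf213*8]" RUN-amg.slurm

That form asks for eight nodes with the leaf113 feature plus eight with leaf213, which is a different request than "every node must have both features".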

Dominik
Comment 10 Michael Hebenstreit 2021-08-20 06:25:59 MDT
But the point here is that the nodes must have both constraints. How should that be formulated correctly?
Comment 11 Dominik Bartkiewicz 2021-08-20 10:02:31 MDT
Without square brackets.
From sbatch man:
...
              AND    If only nodes with all of specified features will be used.  The ampersand is
                     used for an AND operator.  For example, --constraint="intel&gpu"
...
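
Before submitting, a quick way to see which nodes would satisfy such an AND request is something along these lines (the sinfo format string is an assumption):

sinfo -N -h -o "%N %f" | awk '/leaf113/ && /reconfig/'

This prints one line per node with its feature list and keeps only nodes that advertise both leaf113 and reconfig.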
Comment 12 Michael Hebenstreit 2021-08-20 10:10:02 MDT
oh, so the brackets should not have been there. User error then

submission should be: sbatch --exclusive "-C" "leaf113&reconfig" "-t" "60" "-N" "16" "-n" "1024" "-p" "idealq" "RUN-amg.slurm"

correct?
Comment 13 Dominik Bartkiewicz 2021-08-20 10:46:11 MDT
Yes exactly.
Comment 14 Luke Yeager 2021-08-25 08:57:48 MDT
FYI, there are some more parsing quirks to look out for, as listed in bug#10707.

I've added my site to the CC list because we have a vested interest in the way this field is parsed - see bug#12286 - and we want to track related changes.
Comment 21 Michael Hinton 2021-10-01 11:34:49 MDT
*** Ticket 8019 has been marked as a duplicate of this ticket. ***
Comment 22 Dominik Bartkiewicz 2021-10-04 02:58:36 MDT
Hi

These commits protect against using incorrect syntax in constraint expressions:
https://github.com/SchedMD/slurm/commit/27370b018
https://github.com/SchedMD/slurm/commit/2aa867638
https://github.com/SchedMD/slurm/commit/98bdb3f4e

We also improved documentation:
https://github.com/SchedMD/slurm/commit/1e5345842

The next 21.08 release will include those patches.
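
As a quick check after upgrading, the original request can be replayed with --test-only so that nothing actually runs; the expectation (not verified here) is that the bracketed expression is now rejected at submission time:

sbatch --test-only --exclusive -C "[leaf113&reconfig]" -t 60 -N 16 -n 1024 -p idealq RUN-amg.slurm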
Let me know if it is OK to close this ticket now.

Dominik
Comment 23 Michael Hebenstreit 2021-10-04 07:14:45 MDT
The problem was not an incorrect submission, but incorrect scheduling. How do those patches address the error?
Comment 24 Dominik Bartkiewicz 2021-10-04 07:49:18 MDT
Hi

I thought that we agreed that this was a user error (comment 12). The correct syntax for such requests should not contain square brackets.

Dominik
Comment 25 Michael Hebenstreit 2021-10-04 08:28:12 MDT
If you are sure that's the only problem, then close it.