Ticket 12295

Summary: job scheduled to wrong nodes with different constraint
Product: Slurm Reporter: Michael Hebenstreit <michael.hebenstreit>
Component: Scheduling Assignee: Dominik Bartkiewicz <bart>
Status: RESOLVED FIXED
Severity: 3 - Medium Impact    
Priority: --- CC: dwightman, fabecassis, lyeager
Version: 20.11.7   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=8019
https://bugs.schedmd.com/show_bug.cgi?id=10707
Site: Intel CRT
Version Fixed: 21.08.2
Attachments: slurm.conf
slurmctl.log
sched.log

Description Michael Hebenstreit 2021-08-17 11:50:27 MDT
We just found a new error: a job was scheduled to completely wrong nodes. The requested constraint was leaf113, but the job was scheduled to leaf213 nodes (bhosts is a script that calls sinfo with a different output format to show node features; a sketch of that call follows the output below):

[root@eslurm1 crtdc]# bhosts -p idealq -n eii[217-232]
NODELIST      NODES CPUS(A/I/O/T)    PARTITION        STATE      AVAIL_FEATURES
eii[219-232]     14 1008/0/0/1008    idealq           alloc      reconfig,leaf213,icx8360Y,icx8360Yf3,icx8360Yopa,corenode,IBSTACK=mlnx-5.1-2.5.8.0_240.22.1.3_2.12.5
eii[217-218]      2 0/144/0/144      idealq           idle       reconfig,leaf213,icx8360Y,icx8360Yf3,icx8360Yopa,corenode,IBSTACK=mlnx-5.1-2.5.8.0_240.22.1.3_2.12.5
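
(For reference, the bhosts wrapper boils down to roughly the following sinfo call; this is a sketch, not the actual script, and the exact format string is an assumption:)

#!/bin/bash
# bhosts: wrap sinfo so the AVAIL_FEATURES column is shown alongside the usual fields
# %N=NODELIST  %D=NODES  %C=CPUS(A/I/O/T)  %P=PARTITION  %t=STATE  %f=AVAIL_FEATURES
exec sinfo -o "%N %D %C %P %t %f" "$@"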

[root@eslurm1 crtdc]# sacct -j 174615 --format Constraints,NodeList
        Constraints        NodeList
------------------- ---------------
 [leaf113&reconfig]    eii[217-232]
                             eii217
                       eii[218-232]
                       eii[218-232]
                       eii[218-232]
[root@eslurm1 crtdc]# grep leaf213 slurm.conf
NodeName=eii[217-234]  Boards=1 SocketsPerBoard=2 CoresPerSocket=36 State=UNKNOWN Feature=reconfig,leaf213,icx8360Y,icx8360Yf3,icx8360Yopa,corenode,IBSTACK=mlnx-5.1-2.5.8.0_240.22.1.3_2.12.5
[root@eslurm1 crtdc]# grep leaf113 slurm.conf
NodeName=eia[073-077]  Boards=1 SocketsPerBoard=2 CoresPerSocket=36 State=UNKNOWN Feature=leaf113,icx8360Y,icx8360Yf2,corenode,IBSTACK=mlnx-5.1-2.5.8.0_240.22.1.3_2.12.5
NodeName=eia[078-080]  Boards=1 SocketsPerBoard=2 CoresPerSocket=36 State=UNKNOWN Feature=leaf113,icx8360Y,icx8360Yf2,corenode,IBSTACK=mlnx-5.1-2.5.8.0_240.22.1.3_2.12.5
NodeName=eia[081-084]  Boards=1 SocketsPerBoard=2 CoresPerSocket=36 State=UNKNOWN Feature=leaf113,icx8360Y,icx8360Yf2,corenode,IBSTACK=mlnx-5.1-2.5.8.0_240.22.1.3_2.12.5
NodeName=eia[085-093]  Boards=1 SocketsPerBoard=2 CoresPerSocket=36 State=UNKNOWN Feature=leaf113,icx8360Y,icx8360Yf2,corenode,IBSTACK=mlnx-5.1-2.5.8.0_240.22.1.3_2.12.5

[root@eslurm1 crtdc]# grep 174615 /opt/slurm/current/logs/slurm/slurmctl.log
[2021-08-16T23:15:42.523] _slurm_rpc_submit_batch_job: JobId=174615 InitPrio=4294726966 usec=762
[2021-08-16T23:15:45.809] sched: Allocate JobId=174615 NodeList=eii[217-232] #CPUs=1152 Partition=idealq
[2021-08-16T23:24:10.821] JobId=174615 boot complete for all 16 nodes
[2021-08-16T23:24:10.821] prolog_running_decr: Configuration for JobId=174615 is complete
[2021-08-16T23:24:49.968] _job_complete: JobId=174615 WEXITSTATUS 1
[2021-08-16T23:24:49.970] _job_complete: JobId=174615 done
[2021-08-16T23:26:03.364] cleanup_completing: JobId=174615 completion process took 74 seconds

[root@eslurm1 crtdc]# grep 174615 /opt/slurm/current/logs/slurm/sched.log
sched: [2021-08-16T23:15:42.523] JobId=174615 allocated resources: NodeList=(null)
sched: [2021-08-16T23:15:45.809] JobId=174615 initiated
sched: [2021-08-16T23:15:45.809] Allocate JobId=174615 NodeList=eii[217-232] #CPUs=1152 Partition=idealq
Comment 1 Michael Hebenstreit 2021-08-17 11:54:22 MDT
Created attachment 20862 [details]
slurm.conf
Comment 2 Michael Hebenstreit 2021-08-17 11:54:59 MDT
Created attachment 20863 [details]
slurmctl.log
Comment 3 Michael Hebenstreit 2021-08-17 11:59:25 MDT
Created attachment 20864 [details]
sched.log
Comment 5 Dominik Bartkiewicz 2021-08-18 05:16:53 MDT
Hi

Could you send me the sbatch script and the command line used to submit job 174615?
Did this happen just once, or are you able to reproduce this issue?

Dominik
Comment 6 Michael Hebenstreit 2021-08-18 06:52:48 MDT
So far it has only been detected once.
Comment 7 Michael Hebenstreit 2021-08-18 09:18:25 MDT
sbatch --exclusive "-C" "[leaf113&reconfig]" "-t" "60" "-N" "16" "-n" "1024" "-p" "idealq" "RUN-amg.slurm"

$ cat RUN-amg.slurm
#!/bin/bash -login
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=64
#SBATCH --threads-per-core=1
#SBATCH -J amg
#SBATCH --time=1:00:00
#SBATCH --exclusive
#SBATCH -d singleton

ulimit -s unlimited
module purge
source /opt/intel/oneAPI/2021.3.0.3219/setvars.sh
.....
Comment 9 Dominik Bartkiewicz 2021-08-20 03:49:32 MDT
Hi

[leaf113&reconfig] has incorrect syntax, and slurmctld should block it at submission; its behavior is currently undefined. I will let you know when the fix is in the repo.

sbatch man:
...
              Multiple Counts
                     Specific  counts  of  multiple  resources  may be specified by using the AND
                     operator and enclosing the options within square brackets.   For  example,
                     --constraint="[rack1*2&rack2*4]"  might  be  used  to specify that two nodes
                     must be allocated from nodes with the feature of "rack1" and four nodes must
                     be allocated from nodes with the feature "rack2".
...
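
For comparison, a valid multiple-counts request on this cluster could look like the line below (the per-feature node counts are only illustrative):

sbatch --exclusive -N 16 -p idealq -C "[leaf113*8&leaf213*8]" RUN-amg.slurm

That form asks for eight nodes with the leaf113 feature plus eight with leaf213, which is a different request than "every node must have both features".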

Dominik
Comment 10 Michael Hebenstreit 2021-08-20 06:25:59 MDT
But the point here is that the nodes must have both constraints. How should that be formulated correctly?
Comment 11 Dominik Bartkiewicz 2021-08-20 10:02:31 MDT
Without square brackets.
From sbatch man:
...
              AND    If only nodes with all of specified features will be used.  The ampersand is
                     used for an AND operator.  For example, --constraint="intel&gpu"
...
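
Before submitting, a quick way to see which nodes would satisfy such an AND request is something along these lines (the sinfo format string is an assumption):

sinfo -N -h -o "%N %f" | awk '/leaf113/ && /reconfig/'

This prints one line per node with its feature list and keeps only nodes that advertise both leaf113 and reconfig.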
Comment 12 Michael Hebenstreit 2021-08-20 10:10:02 MDT
oh, so the brackets should not have been there. User error then

submission should be: sbatch --exclusive "-C" "leaf113&reconfig" "-t" "60" "-N" "16" "-n" "1024" "-p" "idealq" "RUN-amg.slurm"

correct?
Comment 13 Dominik Bartkiewicz 2021-08-20 10:46:11 MDT
Yes exactly.
Comment 14 Luke Yeager 2021-08-25 08:57:48 MDT
FYI, there are some more parsing quirks to look out for, as listed in bug#10707.

I've added my site to the CC list because we have a vested interest in the way this field is parsed - see bug#12286 - and we want to track related changes.
Comment 21 Michael Hinton 2021-10-01 11:34:49 MDT
*** Ticket 8019 has been marked as a duplicate of this ticket. ***
Comment 22 Dominik Bartkiewicz 2021-10-04 02:58:36 MDT
Hi

These commits protect against using incorrect syntax in constraint expressions:
https://github.com/SchedMD/slurm/commit/27370b018
https://github.com/SchedMD/slurm/commit/2aa867638
https://github.com/SchedMD/slurm/commit/98bdb3f4e

We also improved documentation:
https://github.com/SchedMD/slurm/commit/1e5345842

The next 21.08 release will include those patches.
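
As a quick check after upgrading, the original request can be replayed with --test-only so that nothing actually runs; the expectation (not verified here) is that the bracketed expression is now rejected at submission time:

sbatch --test-only --exclusive -C "[leaf113&reconfig]" -t 60 -N 16 -n 1024 -p idealq RUN-amg.slurm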
Let me know if it is OK to close this ticket now.

Dominik
Comment 23 Michael Hebenstreit 2021-10-04 07:14:45 MDT
The problem was not an incorrect submission, but incorrect scheduling. How do those patches address the error?
Comment 24 Dominik Bartkiewicz 2021-10-04 07:49:18 MDT
Hi

I thought that we agreed that this was a user error (comment 12). The correct syntax for such requests should not contain square brackets.

Dominik
Comment 25 Michael Hebenstreit 2021-10-04 08:28:12 MDT
If you are sure that's the only problem, then close it.