Ticket 8957 - "matching or" constraints don't work as expected
Summary: "matching or" constraints don't work as expected
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 19.05.5
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Dominik Bartkiewicz
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2020-04-29 01:05 MDT by Ahmed Essam ElMazaty
Modified: 2023-03-13 15:47 MDT
CC List: 4 users

See Also:
Site: KAUST
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 20.02.5
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description Ahmed Essam ElMazaty 2020-04-29 01:05:43 MDT
Hello,
As stated in the documentation:
"If only one of a set of possible options should be used for all allocated nodes, then use the OR operator and enclose the options within square brackets"
So the allocated node must have one of the constraints inside the square brackets.
However, I see a different behaviour when using this: sometimes I get allocated a node which doesn't have any of the requested constraints.

An example:
[mazatyae@cn509-02-l ~]$ srun -t 1 --pty -n 40 -N 1  --constraint="[cpu_intel_e5_2680_v2|cpu_intel_e5_2670_v2]" bash -l
srun: job 10446359 queued and waiting for resources

# scontrol show job 10446359
JobId=10446359 JobName=bash
   UserId=mazatyae(167627) GroupId=g-mazatyae(1167627) MCS_label=N/A
   Priority=479 Nice=0 Account=default QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=00:01:00 TimeMin=N/A
   SubmitTime=2020-04-29T09:54:11 EligibleTime=2020-04-29T09:54:11
   AccrueTime=2020-04-29T09:54:11
   StartTime=2020-04-29T10:44:14 EndTime=2020-04-29T10:45:14 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-04-29T09:54:46
   Partition=batch AllocNode:Sid=cn509-02-l:111903
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null) SchedNodeList=cn605-13-r
   NumNodes=1-1 NumCPUs=40 NumTasks=40 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=40,mem=80G,node=1,billing=40
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=2G MinTmpDiskNode=0
   Features=[cpu_intel_e5_2680_v2|cpu_intel_e5_2670_v2]&nolmem&nogpu DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=bash
   WorkDir=/home/mazatyae
   Power=


However, node "cn605-13-r" doesn't have any of the requested constraints:
[mazatyae@cn509-02-l ~]$ scontrol show node cn605-13-r
NodeName=cn605-13-r Arch=x86_64 CoresPerSocket=20 
   CPUAlloc=40 CPUTot=40 CPULoad=40.51
   AvailableFeatures=dragon,cpu_intel_gold_6148,skylake,intel,ibex2018,nogpu,nolmem,local_200G,local_400G,local_500G
   ActiveFeatures=dragon,cpu_intel_gold_6148,skylake,intel,ibex2018,nogpu,nolmem,local_200G,local_400G,local_500G
   Gres=(null)
   NodeAddr=cn605-13-r NodeHostName=cn605-13-r Version=19.05.5
   OS=Linux 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019 
   RealMemory=375618 AllocMem=368640 FreeMem=52792 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=100 Owner=N/A MCS_label=N/A
   Partitions=batch 
   BootTime=2020-03-21T12:11:10 SlurmdStartTime=2020-03-21T12:29:05
   CfgTRES=cpu=40,mem=375618M,billing=40
   AllocTRES=cpu=40,mem=360G
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s


This doesn't happen to me when I use the normal "or" without the square brackets.
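For illustration, a minimal sketch of the intended difference between the two forms (same feature names as above; these commands are illustrative, not captured output):

Plain OR, where each allocated node may satisfy a different alternative:
$ srun -N 2 --constraint="cpu_intel_e5_2680_v2|cpu_intel_e5_2670_v2" hostname

Matching OR, where one alternative is chosen and required on all allocated nodes:
$ srun -N 2 --constraint="[cpu_intel_e5_2680_v2|cpu_intel_e5_2670_v2]" hostname

Which nodes carry each feature can be checked with:
$ sinfo -N -o "%N %f" | grep cpu_intel_e5_26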
Best regards,
Ahmed
Comment 1 Dominik Bartkiewicz 2020-04-29 07:54:52 MDT
Hi

I can reproduce this. I'll let you know when I have a fix.

Dominik
Comment 8 Dominik Bartkiewicz 2020-09-03 07:07:25 MDT
Hi

We have finally committed a fix for this issue:
https://github.com/SchedMD/slurm/commit/bb97ad45
It is in the 20.02 branch and will be included in the 20.02.5 release.
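A minimal verification sketch, assuming an upgrade to 20.02.5 or later (the node name is a placeholder):

$ srun -t 1 --pty -n 1 -N 1 --constraint="[cpu_intel_e5_2680_v2|cpu_intel_e5_2670_v2]" bash -l
$ scontrol show job $SLURM_JOB_ID | grep -E 'NodeList|Features'
$ scontrol show node <nodename> | grep ActiveFeatures

The allocated node's ActiveFeatures should now include one of the bracketed features.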

Sorry that this took so long.
Closing as resolved/fixed, please reopen if needed.

Dominik