Ticket 3778

Summary: --contiguous parameter not respected when --ntasks also requested
Product: Slurm Reporter: Alejandro Sanchez <alex>
Component: slurmctld Assignee: Alejandro Sanchez <alex>
Status: RESOLVED INVALID QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 17.02.2   
Hardware: Linux   
OS: Linux   
Site: SchedMD Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description Alejandro Sanchez 2017-05-05 08:11:12 MDT
Reproducer config:

SelectType              = select/cons_res
SelectTypeParameters    = CR_CORE_MEMORY
NodeName=compute[1-3] CPUs=4 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=7837 NodeHostname=ibiza State=UNKNOWN Port=61711-61713
PartitionName=debug Nodes=ALL Default=YES State=UP

Reproducer steps:

# compute[1-3] IDLE initially
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      3   idle compute[1-3]

# allocate 1 CPU from the node in the middle (compute2)
$ sbatch -w compute2 -n1 --wrap "sleep 9999"
Submitted batch job 20088
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1    mix compute2
debug*       up   infinite      2   idle compute[1,3]

# Request --ntasks=8
$ sbatch -n8 --contiguous --wrap "sleep 9999"
Submitted batch job 20089
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1    mix compute3
debug*       up   infinite      2  alloc compute[1-2]
$ squeue 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             20089     debug     wrap     alex  R       0:09      3 compute[1-3]
             20088     debug     wrap     alex  R       0:20      1 compute2

# The allocated nodes (compute[1-3]) don't respect a contiguous set within the partition. Coming from bug 3690.
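
The node states in the last sinfo output are consistent with a simple count of free cores. Below is a minimal sketch in Python, assuming each node exposes 4 allocatable cores (CPUs=4 in the reproducer config) and that cores are handed out node by node in order. The greedy fill is an illustrative assumption, not the actual select/cons_res logic, and `greedy_fill` is a hypothetical helper.

```python
def greedy_fill(free_cores, ntasks):
    """Assign ntasks cores node by node, in order (illustrative model only)."""
    allocation = {}
    remaining = ntasks
    for node, free in free_cores.items():
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            allocation[node] = take
            remaining -= take
    if remaining:
        raise ValueError("not enough free cores")
    return allocation


# Free cores after job 20088 took 1 core on compute2 (4 cores per node assumed).
free = {"compute1": 4, "compute2": 3, "compute3": 4}
alloc = greedy_fill(free, 8)
print(alloc)
```

Under this model, job 20089 takes 4 cores on compute1, 3 on compute2 and 1 on compute3. That leaves compute1 and compute2 fully allocated and compute3 partially allocated, matching the alloc/mix states reported by sinfo.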

Comment 1 Alejandro Sanchez 2017-11-17 04:51:43 MST
Taking another stab at this, I realized the report was invalid. --contiguous means the allocated _nodes_ must form a contiguous set. When I opened the bug, for whatever reason I thought the minimum allocatable resources (i.e. cores and/or threads) had to form a contiguous set, but that is not the case. After some tests, the contiguous-nodes requirement is indeed respected, so I'm closing the bug.
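
The node-level reading of --contiguous can be illustrated with a small check. This is a rough sketch in Python, assuming "contiguous" means the allocated nodes occupy consecutive positions in an ordered node list. The ordering used here (compute1, compute2, compute3) and the helper name `is_contiguous` are assumptions for illustration, not Slurm internals.

```python
def is_contiguous(ordered_nodes, allocated):
    """Return True if the allocated nodes sit at consecutive positions."""
    positions = sorted(ordered_nodes.index(node) for node in allocated)
    return all(b - a == 1 for a, b in zip(positions, positions[1:]))


# Assumed node order, taken from NodeName=compute[1-3] in the reproducer.
nodes = ["compute1", "compute2", "compute3"]

# Job 20089 was given compute[1-3]: a contiguous set of nodes.
print(is_contiguous(nodes, ["compute1", "compute2", "compute3"]))

# Skipping compute2 would have broken contiguity at the node level.
print(is_contiguous(nodes, ["compute1", "compute3"]))
```

Job 20089's allocation passes this check even though compute2 already had one core in use, because the check looks only at which nodes are allocated, not at which cores within them.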