Ticket 3778 - --contiguous parameter not respected when --ntasks also requested
Summary: --contiguous parameter not respected when --ntasks also requested
Status: RESOLVED INVALID
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 17.02.2
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Alejandro Sanchez
Reported: 2017-05-05 08:11 MDT by Alejandro Sanchez
Modified: 2017-11-17 04:51 MST
Site: SchedMD

Description Alejandro Sanchez 2017-05-05 08:11:12 MDT
Reproducer config:

SelectType              = select/cons_res
SelectTypeParameters    = CR_CORE_MEMORY
NodeName=compute[1-3] CPUs=4 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=7837 NodeHostname=ibiza State=UNKNOWN Port=61711-61713
PartitionName=debug Nodes=ALL Default=YES State=UP

Reproducer steps:

# compute[1-3] IDLE initially
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      3   idle compute[1-3]

# allocate 1 CPU from the node in the middle (compute2)
$ sbatch -w compute2 -n1 --wrap "sleep 9999"
Submitted batch job 20088
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1    mix compute2
debug*       up   infinite      2   idle compute[1,3]

# Request --ntasks=8
$ sbatch -n8 --contiguous --wrap "sleep 9999"
Submitted batch job 20089
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1    mix compute3
debug*       up   infinite      2  alloc compute[1-2]
$ squeue 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             20089     debug     wrap     alex  R       0:09      3 compute[1-3]
             20088     debug     wrap     alex  R       0:20      1 compute2

# The allocated nodes (compute[1-3]) do not seem to respect a contiguous set within the partition. This came up from bug 3690.
Comment 1 Alejandro Sanchez 2017-11-17 04:51:43 MST
Taking another stab at this, I realized the report was invalid. --contiguous means the allocated _nodes_ must form a contiguous set. When I opened the bug, for whatever reason I thought the minimum allocatable resources (i.e. cores and/or threads) should form a contiguous set, but that is not the case. After some tests, the contiguous node requirement is indeed respected, so I am closing the bug.
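
To illustrate the node-level meaning of --contiguous described above, here is a minimal Python sketch. It is not Slurm code: the helper name is_contiguous is hypothetical, and it assumes node order follows the numeric suffix in compute[1-3], whereas Slurm itself works on its own internal node ordering. It only shows why compute[1-3] counts as a contiguous node set even though compute2 was already partly in use.

```python
def is_contiguous(node_indices):
    """Return True if the node indices form one unbroken run, e.g. 1, 2, 3.

    Hypothetical helper for illustration only. It looks at whole nodes,
    not at the cores or threads allocated inside each node.
    """
    ordered = sorted(set(node_indices))
    if not ordered:
        return False
    # An unbroken run of n distinct integers spans exactly n values.
    return ordered[-1] - ordered[0] + 1 == len(ordered)


# Job 20089 ran on compute[1-3] according to the squeue output above.
print(is_contiguous([1, 2, 3]))  # True: no gap between the nodes
# A node set that skips compute2 would not be contiguous.
print(is_contiguous([1, 3]))     # False
```

Under this reading, job 20088 holding one CPU on compute2 does not break contiguity, because the constraint applies to the set of nodes and not to the CPUs used within them.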