| Summary: | --contiguous parameter not respected when --ntasks also requested | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Alejandro Sanchez <alex> |
| Component: | slurmctld | Assignee: | Alejandro Sanchez <alex> |
| Status: | RESOLVED INVALID | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 17.02.2 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | SchedMD | | |

Taking another stab at this, I realized the problem was invalid. `--contiguous` means the allocated _nodes_ must form a contiguous set. When I opened the bug, I mistakenly assumed that the minimum allocatable resources (i.e. cores and/or threads) also had to form a contiguous set, but that is not the case. After some testing, the contiguous node constraint is indeed respected, so I'm closing the bug.
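
To make the distinction concrete: because `--contiguous` constrains the node list, whether an allocation satisfies it can be judged from node names alone, without looking at which cores were picked on each node. The Python sketch below uses hypothetical helpers that are not part of Slurm. It assumes node names of the form `<prefix><number>`, that the partition orders its nodes in the listed order, and it only expands simple `prefix[a-b,c]` hostlists rather than the full Slurm hostlist syntax. With those assumptions, the allocation from the reproducer below (`compute[1-3]`) counts as contiguous, while a hypothetical `compute[1,3]` would not.

```python
import re


def expand_hostlist(hostlist: str) -> list[str]:
    """Expand a simple hostlist such as 'compute[1-3]' or 'compute[1,3]'.

    Hypothetical helper: handles one bracketed list of numbers/ranges only
    (no zero padding, no nested brackets), not the full Slurm syntax.
    """
    match = re.fullmatch(r"([^\[\]]+)\[([0-9,\-]+)\]", hostlist)
    if match is None:
        return [hostlist]  # a single plain node name
    prefix, ranges = match.groups()
    names = []
    for part in ranges.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            names.extend(f"{prefix}{i}" for i in range(int(lo), int(hi) + 1))
        else:
            names.append(f"{prefix}{part}")
    return names


def is_contiguous(allocated: list[str], partition: list[str]) -> bool:
    """True if the allocated nodes occupy consecutive positions in the
    partition's node order (assumed to be the order of `partition`).

    Raises ValueError if a node is not in the partition.
    """
    positions = sorted(partition.index(node) for node in allocated)
    if not positions:
        return False
    return positions == list(range(positions[0], positions[0] + len(positions)))


partition = expand_hostlist("compute[1-3]")
print(is_contiguous(expand_hostlist("compute[1-3]"), partition))  # True
print(is_contiguous(expand_hostlist("compute[1,3]"), partition))  # False
```

Note that nothing in the check refers to cores or threads, which is exactly the point of the resolution: core placement inside each node is outside the scope of `--contiguous`.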

Reproducer config:

    SelectType = select/cons_res
    SelectTypeParameters = CR_CORE_MEMORY
    NodeName=compute[1-3] CPUs=4 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=7837 NodeHostname=ibiza State=UNKNOWN Port=61711-61713
    PartitionName=debug Nodes=ALL Default=YES State=UP

Reproducer steps:

    # compute[1-3] IDLE initially
    $ sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    debug*       up   infinite      3   idle compute[1-3]

    # allocate 1 CPU from the node in the middle (compute2)
    $ sbatch -w compute2 -n1 --wrap "sleep 9999"
    Submitted batch job 20088
    $ sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    debug*       up   infinite      1    mix compute2
    debug*       up   infinite      2   idle compute[1,3]

    # Request --ntasks=8
    $ sbatch -n8 --contiguous --wrap "sleep 9999"
    Submitted batch job 20089
    $ sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    debug*       up   infinite      1    mix compute3
    debug*       up   infinite      2  alloc compute[1-2]
    $ squeue
    JOBID PARTITION  NAME  USER ST  TIME  NODES NODELIST(REASON)
    20089     debug  wrap  alex  R  0:09      3 compute[1-3]
    20088     debug  wrap  alex  R  0:20      1 compute2

The allocated nodes (compute[1-3]) don't respect a contiguous set within the partition.

Coming from bug 3690.
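
The node states reported by `sinfo` after job 20089 are also consistent with a plain count of free CPUs. Under the assumptions that each node offers 4 allocatable CPUs (taken from `CPUs=4` in the config), that job 20088 holds 1 CPU on compute2, and that each of the 8 tasks needs one CPU, no contiguous window smaller than all three nodes has enough room. The toy model below illustrates only that arithmetic; `first_contiguous_fit` is a hypothetical function, and it is not how `select/cons_res` actually chooses nodes or cores.

```python
from typing import Optional


def first_contiguous_fit(
    free_cpus: dict[str, int], order: list[str], ntasks: int
) -> Optional[list[tuple[str, int]]]:
    """Return (node, cpus_used) pairs for the first run of consecutive nodes,
    in partition order, whose free CPUs cover `ntasks`.

    Toy model (assumption): one CPU per task, each node filled as far as
    possible before moving to the next. Returns None if no contiguous
    window has enough free CPUs.
    """
    for start in range(len(order)):
        remaining = ntasks
        picked = []
        for node in order[start:]:
            if free_cpus[node] == 0:
                break  # a fully busy node ends the contiguous window
            take = min(free_cpus[node], remaining)
            picked.append((node, take))
            remaining -= take
            if remaining == 0:
                return picked
    return None


order = ["compute1", "compute2", "compute3"]
# Assumed free CPUs after job 20088 took 1 CPU on compute2.
free = {"compute1": 4, "compute2": 3, "compute3": 4}
print(first_contiguous_fit(free, order, 8))
# [('compute1', 4), ('compute2', 3), ('compute3', 1)]
```

In this model compute1 and compute2 end up full and compute3 only partially used, which matches the `alloc compute[1-2]` and `mix compute3` states shown above, and the chosen nodes form the contiguous set compute[1-3].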