| Summary: | srun shares resources without an explicit --share flag | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Yossi Cohen <yossic00> |
| Component: | Scheduling | Assignee: | David Bigagli <david> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 6 - No support contract | ||
| Priority: | --- | CC: | csc-slurm-tickets, da |
| Version: | 14.03.0 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | -Other- | Version Fixed: | 14.03.4 |
We noticed the same issue today. Our configuration was fine with Slurm 2.6, but after upgrading to 14.03.3 the behavior of partitions with Shared=Yes has changed: jobs now share cores without having requested --share. Note in the output below that node c310 reports CPUAlloc=31 against CPUTot=16.
SelectType=select/cons_res
SelectTypeParameters=CR_CORE_MEMORY
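For context, the two partitions involved (serial and longrun) are configured with Shared=Yes; a rough sketch of the relevant slurm.conf lines would look like the following (the node list is illustrative, and the MaxTime values simply match the job time limits shown below):

PartitionName=serial  Nodes=c[301-340] Shared=YES MaxTime=3-00:00:00
PartitionName=longrun Nodes=c[301-340] Shared=YES MaxTime=7-00:00:00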
[root@service01 log]# squeue -w c310
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2177434 longrun xxx user2 R 1-04:02:14 1 c310
2144988 serial xxxxxxxx user1 R 1-04:02:18 1 c310
2145071 serial xxxxxxxx user1 R 1-04:02:18 1 c310
[root@service01 log]# sj 2177434
JobId=2177434 Name=xxx
UserId=user2(17441) GroupId=user2(4747)
Priority=1049 Nice=0 Account=csc QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=1-04:02:19 TimeLimit=7-00:00:00 TimeMin=N/A
SubmitTime=2014-06-08T20:58:30 EligibleTime=2014-06-08T20:58:30
StartTime=2014-06-09T10:50:42 EndTime=2014-06-16T10:50:42
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=longrun AllocNode:Sid=login4:38591
ReqNodeList=(null) ExcNodeList=(null)
NodeList=c310
BatchHost=c310
NumNodes=1 NumCPUs=15 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=0
MinCPUsNode=1 MinMemoryCPU=8000M MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=1 Contiguous=0 Licenses=(null) Network=(null)
[root@service01 log]# sj 2144988
JobId=2144988 Name=xxxxx
UserId=user1(29012) GroupId=xxx(1187)
Priority=1234 Nice=0 Account=csc QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=1-04:02:29 TimeLimit=3-00:00:00 TimeMin=N/A
SubmitTime=2014-06-06T15:39:37 EligibleTime=2014-06-06T15:39:37
StartTime=2014-06-09T10:50:38 EndTime=2014-06-12T10:50:38
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=serial AllocNode:Sid=login4:26363
ReqNodeList=(null) ExcNodeList=(null)
NodeList=c310
BatchHost=c310
NumNodes=1 NumCPUs=16 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=0
MinCPUsNode=1 MinMemoryNode=65440M MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=1 Contiguous=0 Licenses=(null) Network=(null)
[root@service01 log]# sj 2145071
JobId=2145071 Name=xxxx
UserId=user1(29012) GroupId=xxxx(1187)
Priority=1234 Nice=0 Account=csc QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=1-04:02:34 TimeLimit=3-00:00:00 TimeMin=N/A
SubmitTime=2014-06-06T15:49:24 EligibleTime=2014-06-06T15:49:24
StartTime=2014-06-09T10:50:38 EndTime=2014-06-12T10:50:38
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=serial AllocNode:Sid=login4:26363
ReqNodeList=(null) ExcNodeList=(null)
NodeList=c310
BatchHost=c310
NumNodes=1 NumCPUs=16 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=0
MinCPUsNode=1 MinMemoryNode=65440M MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=1 Contiguous=0 Licenses=(null) Network=(null)
[root@service01 log]# scontrol show node c310
NodeName=c310 Arch=x86_64 CoresPerSocket=8
CPUAlloc=31 CPUErr=0 CPUTot=16 CPULoad=46.97 Features=bigmem
Gres=(null)
NodeAddr=c310 NodeHostName=c310 Version=14.03
OS=Linux RealMemory=258000 AllocMem=250880 Sockets=2 Boards=1
State=MIXED ThreadsPerCore=1 TmpDisk=1800000 Weight=50
BootTime=2014-06-09T09:56:43 SlurmdStartTime=2014-06-09T10:01:01
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
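With Shared misbehaving like this, a quick cluster-wide check for over-subscribed nodes is sinfo's CPU summary; %C prints allocated/idle/other/total CPUs per node (-N for node-oriented output, -h to drop the header):

[root@service01 log]# sinfo -N -h -o "%N %C"

Any node whose allocated count exceeds its total, as with c310's 31 vs. 16 above, has been over-subscribed.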
This is fixed in version 14.03.4, when released, by the commit below:
https://github.com/SchedMD/slurm/commit/c773b7503f1fd23332dca035fe398d88aadeea41
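After upgrading, one sanity check is to confirm that the partition's Shared setting and a job's share value line up again; both are visible via scontrol (partition and job IDs taken from this report):

[root@service01 log]# scontrol show partition serial | grep Shared
[root@service01 log]# scontrol show job 2177434 | grep Shared

With the fix, a job that did not request --share should presumably report Shared=0 and no longer be packed onto cores already allocated to a running job.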
I have a partition with 15 nodes. I tried to set Shared=YES in slurm.conf but got some unexpected behavior. The cluster is configured with:

SelectType=select/cons_res
SelectTypeParameters=CR_CORE,CR_ONE_TASK_PER_CORE
NodeName=DEFAULT Boards=1 SocketsPerBoard=1 CoresPerSocket=1 ThreadsPerCore=1 TmpDisk=1921 State=UNKNOWN
NodeName=vnc[1-15] RealMemory=12016
PartitionName=DEFAULT State=UP Shared=No
PartitionName=pvnc Nodes=vnc[1-15] Default=YES

With Shared=No, if I open two terminals and try to run jobs that use all nodes, I get the following (vnc5 was down during the experiment):

On terminal 1:
$ srun -l -N 14 -n 14 bash -c 'hostname; sleep 30'
00: vnc1
02: vnc3
08: vnc10
13: vnc15
12: vnc14
01: vnc2
10: vnc12
09: vnc11
06: vnc8
07: vnc9
03: vnc4
04: vnc6
11: vnc13
05: vnc7

On terminal 2:
$ srun -l -N 14 -n 14 bash -c 'hostname; sleep 30'
srun: job 328 queued and waiting for resources
srun: job 328 has been allocated resources
00: vnc1
05: vnc7
08: vnc10
10: vnc12
09: vnc11
01: vnc2
03: vnc4
06: vnc8
12: vnc14
13: vnc15
07: vnc9
11: vnc13
02: vnc3
04: vnc6

This is what I expected: all the resources are tied up, so the job on terminal 2 must wait for the job on terminal 1 to complete. I then changed slurm.conf to Shared=YES and reran the experiment:

On terminal 1:
$ srun -l -N 14 -n 14 bash -c 'hostname; sleep 30'
02: vnc3
13: vnc15
08: vnc10
01: vnc2
09: vnc11
10: vnc12
07: vnc9
04: vnc6
12: vnc14
05: vnc7
00: vnc1
03: vnc4
06: vnc8
11: vnc13

On terminal 2:
$ srun -l -N 14 -n 14 bash -c 'hostname; sleep 30'
10: vnc12
06: vnc8
03: vnc4
00: vnc1
12: vnc14
11: vnc13
08: vnc10
04: vnc6
02: vnc3
05: vnc7
01: vnc2
09: vnc11
13: vnc15
07: vnc9

This is unexpected: the job in terminal 2 was supposed to wait, since neither job was submitted with the --share flag, and according to the slurm.conf man page:

    YES  Makes all resources in the partition available for sharing upon request by the job. Resources will only be over-subscribed when explicitly requested by the user using the "--share" option on job submission...

I have also seen this behavior when using sbatch.
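Per the man page text quoted above, over-subscription under Shared=YES is supposed to be opt-in: the second job should only have been co-scheduled if it had been submitted with the flag explicitly, e.g.:

$ srun --share -l -N 14 -n 14 bash -c 'hostname; sleep 30'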
With Shared=No:

$ sbatch --array=0-29 ab
Submitted batch job 259
$ squeue
JOBID   PARTITION NAME ST TIME NODES NODELIST COMMENT
259_[14 pvnc      ab   PD 0:00 1     (Resourc (null)
259_0   pvnc      ab   R  0:04 1     vnc1     (null)
259_1   pvnc      ab   R  0:04 1     vnc2     (null)
259_2   pvnc      ab   R  0:04 1     vnc3     (null)
259_3   pvnc      ab   R  0:04 1     vnc4     (null)
259_4   pvnc      ab   R  0:04 1     vnc6     (null)
259_5   pvnc      ab   R  0:04 1     vnc7     (null)
259_6   pvnc      ab   R  0:04 1     vnc8     (null)
259_7   pvnc      ab   R  0:04 1     vnc9     (null)
259_8   pvnc      ab   R  0:04 1     vnc10    (null)
259_9   pvnc      ab   R  0:04 1     vnc11    (null)
259_10  pvnc      ab   R  0:04 1     vnc12    (null)
259_11  pvnc      ab   R  0:04 1     vnc13    (null)
259_12  pvnc      ab   R  0:04 1     vnc14    (null)
259_13  pvnc      ab   R  0:04 1     vnc15    (null)

With Shared=Yes (it should behave the same as Shared=No because there is no --share flag):

$ sbatch --array=0-29 ab
Submitted batch job 290
$ squeue
JOBID   PARTITION NAME ST TIME NODES NODELIST COMMENT
290_0   pvnc      ab   R  0:02 1     vnc1     (null)
290_1   pvnc      ab   R  0:02 1     vnc2     (null)
290_2   pvnc      ab   R  0:02 1     vnc3     (null)
290_3   pvnc      ab   R  0:02 1     vnc4     (null)
290_4   pvnc      ab   R  0:02 1     vnc6     (null)
290_5   pvnc      ab   R  0:02 1     vnc7     (null)
290_6   pvnc      ab   R  0:02 1     vnc8     (null)
290_7   pvnc      ab   R  0:02 1     vnc9     (null)
290_8   pvnc      ab   R  0:02 1     vnc10    (null)
290_9   pvnc      ab   R  0:02 1     vnc11    (null)
290_10  pvnc      ab   R  0:02 1     vnc12    (null)
290_11  pvnc      ab   R  0:02 1     vnc13    (null)
290_12  pvnc      ab   R  0:02 1     vnc14    (null)
290_13  pvnc      ab   R  0:02 1     vnc15    (null)
290_14  pvnc      ab   R  0:02 1     vnc1     (null)
290_15  pvnc      ab   R  0:02 1     vnc2     (null)
290_16  pvnc      ab   R  0:02 1     vnc3     (null)
290_17  pvnc      ab   R  0:02 1     vnc4     (null)
290_18  pvnc      ab   R  0:02 1     vnc6     (null)
290_19  pvnc      ab   R  0:02 1     vnc7     (null)
290_20  pvnc      ab   R  0:02 1     vnc8     (null)
290_21  pvnc      ab   R  0:02 1     vnc9     (null)
290_22  pvnc      ab   R  0:02 1     vnc10    (null)
290_23  pvnc      ab   R  0:02 1     vnc11    (null)
290_24  pvnc      ab   R  0:02 1     vnc12    (null)
290_25  pvnc      ab   R  0:02 1     vnc13    (null)
290_26  pvnc      ab   R  0:02 1     vnc14    (null)
290_27  pvnc      ab   R  0:02 1     vnc15    (null)
290_28  pvnc      ab   R  0:02 1     vnc1     (null)
290_29  pvnc      ab   R  0:02 1     vnc2     (null)
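The over-subscription in the second run is easy to quantify by counting running array tasks per node (-h drops the squeue header, -t R filters for running jobs, %N prints each job's node list):

$ squeue -h -t R -o "%N" | sort | uniq -c

For job 290 above this would show three tasks on vnc1 and vnc2 and two on every other node, whereas Shared=No correctly capped each single-core node at one.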