[akmalm@hud10 ~]$ rjs test.job queue=teamtim gpu=1 mem=150G
sbatch: error: Batch job submission failed: Requested node configuration is not available

*slurmctld restarted*

[akmalm@hud10 ~]$ rjs test.job queue=teamtim gpu=1 mem=150G
830050

After a while (or maybe after scontrol reconfigure?) it went back to normal.

Is it partly caused by FastSchedule=0?
Using srun, same result:

[akmalm@hud10 ~]$ srun -pteamtim --mem=2000000000 --constraint=gpu hostname
srun: error: Unable to allocate resources: Requested node configuration is not available

*slurmctld restarted*

[akmalm@hud10 ~]$ srun -pteamtim --mem=2000000000 --constraint=gpu hostname
srun: job 830063 queued and waiting for resources
I cannot reproduce this. Is it possible that some hosts with the requested configuration became available after the restart?

David
> I cannot reproduce this. Is it possible that some hosts became available
> after the restart with the requested configuration?

I don't think so. I'm able to reproduce this on a different cluster and on my multiple-slurmd test setup.

Have you tried with a large number of nodes and FastSchedule=0?
Created attachment 2452 [details]
scontrol show config output
(In reply to Akmal Madzlan from comment #0)
> [akmalm@hud10 ~]$ rjs test.job queue=teamtim gpu=1 mem=150G
> sbatch: error: Batch job submission failed: Requested node configuration is
> not available
>
> *slurmctld restarted*
>
> [akmalm@hud10 ~]$ rjs test.job queue=teamtim gpu=1 mem=150G
> 830050
>
> after a while, (or maybe after scontrol reconfigure?)
> it went back to normal
>
> is it partly caused by FastSchedule=0?

What is your memory size (e.g. "RealMemory=x") for the nodes defined in slurm.conf? If not specified, I believe it defaults to 1 MB, and until the compute nodes register with their actual size, anything requesting more memory gets the error indicating that no nodes exist with the specified size. The thought behind that is that it is better to reject a job at submit time that will never run than to let it sit in the queue indefinitely.

You should set FastSchedule=1 and define a minimum memory size for the nodes in slurm.conf, such that if a node has less it is configured down, and if it has more the recorded size is reset higher.
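The suggested configuration might look like the following slurm.conf fragment. This is only a sketch: the node names, counts, and memory size here are illustrative, not taken from this cluster.

```
# Use the configured values below for scheduling decisions instead of
# waiting for each slurmd to register its actual hardware.
FastSchedule=1

# RealMemory is the minimum expected memory in MB: a node registering
# less is set down; one registering more has its recorded size raised.
NodeName=tux[001-010] RealMemory=15947 Sockets=2 CoresPerSocket=2 ThreadsPerCore=2
```

With this in place, a submit-time request larger than RealMemory on every node can be rejected immediately instead of depending on slurmd registration timing.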
> What is your memory size (e.g. "RealMemory=x") for the nodes defined in slurm.conf?

We specify RealMemory for each node, with the real (or close to real) value.

> until the compute nodes register with their actual size, anything requesting more memory would get the error indicating that no nodes exist with the specified size.

I don't think this is the behaviour with FastSchedule=0.

> You should set FastSchedule=1 and define a minimum memory size for the nodes in slurm.conf

We prefer FastSchedule=0 because we don't want a node to be drained if we lose some DIMMs, and at the same time we don't want the node to be overallocated.
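A "close to real" value can be read off a node itself. The snippet below is a sketch, assuming Linux with /proc/meminfo; the 5% headroom is an arbitrary margin, not anything Slurm requires.

```shell
# Derive a RealMemory value (in MB) slightly below the node's actual
# memory, leaving headroom so a small memory loss does not immediately
# conflict with the configured value.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_mb=$((mem_kb / 1024))
echo "RealMemory=$((mem_mb * 95 / 100))"
```

The printed line can be pasted into the node's definition in slurm.conf.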
(In reply to Akmal Madzlan from comment #6)
> > until the compute nodes register with their actual size, anything requesting more memory would get the error indicating that no nodes exist with the specified size.
>
> I dont think this is the behaviour with FastSchedule=0

That is exactly how it works, but if you define a reasonable memory size in slurm.conf then it would work fine. If you specified less than 150G it would fail as you report and as shown below:

Slurmctld restarted, slurmd not up yet:

$ scontrol show node
NodeName=jette CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=4 CPULoad=N/A
   Features=(null) Gres=(null)
   NodeAddr=jette-desktop NodeHostName=jette-desktop Version=(null)
   RealMemory=1 AllocMem=0 FreeMem=N/A Sockets=4 Boards=1
   State=UNKNOWN ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A
   BootTime=None SlurmdStartTime=None
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

$ sbatch -N1 --mem=1g tmp
sbatch: error: Batch job submission failed: Requested node configuration is not available

===============================================

Slurmd started/responds:

$ sbatch -N1 --mem=1g tmp
Submitted batch job 96464

$ scontrol show node
NodeName=jette Arch=i686 CoresPerSocket=1
   CPUAlloc=1 CPUErr=0 CPUTot=4 CPULoad=0.77
   Features=(null) Gres=(null)
   NodeAddr=jette-desktop NodeHostName=jette-desktop Version=15.08 OS=Linux
   RealMemory=3774 AllocMem=1024 FreeMem=1312 Sockets=4 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=466925 Weight=1 Owner=N/A
   BootTime=2015-11-27T09:07:44 SlurmdStartTime=2015-11-27T09:49:45
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
I forgot to say that in my example, there was no RealMemory value for the node in slurm.conf, so it defaulted to 1MB. Whatever is in slurm.conf will be taken as the size until the slurmd on a compute node tells the slurmctld on the head node a different value.
---------------------------------------
[root@kque0001 ~]# service slurm restart && scontrol show node kud13 && scontrol show partition kud13 && sbatch -N1 --mem=99999g --partition=kud13 --wrap="hostname"
stopping slurmctld:                                        [  OK  ]
slurmctld is stopped
slurmctld is stopped
starting slurmctld:                                        [  OK  ]
NodeName=kud13 CoresPerSocket=2
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=N/A
   Features=localdisk,nogpu,intel Gres=(null)
   NodeAddr=kud13 NodeHostName=kud13 Version=(null)
   RealMemory=1 AllocMem=0 Sockets=2 Boards=1
   State=UNKNOWN ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=None SlurmdStartTime=None
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

PartitionName=kud13
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO
   DefaultTime=01:00:00 DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=kud13
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=8 TotalNodes=1 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

Submitted batch job 3452375
-----------------------------------------

Job is successfully submitted.

--------------------------------
[root@kque0001 ~]# scontrol show jobs 3452375
JobId=3452375 JobName=wrap
   UserId=root(0) GroupId=root(0)
   Priority=100 Nice=0 Account=root QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2015-11-28T08:33:08 EligibleTime=2015-11-28T08:33:08
   StartTime=Unknown EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=kud13 AllocNode:Sid=kque0001:7143
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1-1 NumCPUs=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=97.66T MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/root
   StdErr=/root/slurm-3452375.out
   StdIn=/dev/null
   StdOut=/root/slurm-3452375.out
------------------------------

A few seconds after scontrol reconfigure, it seems to behave well again.

-------------------------------
[root@kque0001 ~]# scontrol reconfigure
[root@kque0001 ~]# srun -pkud13 --mem=9999999999 hostname
srun: job 3452377 queued and waiting for resources
^Csrun: Job allocation 3452377 has been revoked
srun: Force Terminated job 3452377
[root@kque0001 ~]# srun -pkud13 --mem=9999999999 hostname
srun: job 3452378 queued and waiting for resources
^Csrun: Job allocation 3452378 has been revoked
[root@kque0001 ~]# srun -pkud13 --mem=9999999999 hostname
srun: error: Unable to allocate resources: Requested node configuration is not available
[root@kque0001 ~]# sbatch -N1 --mem=99999g --partition=kud13 --wrap="hostname"
sbatch: error: Batch job submission failed: Requested node configuration is not available
-----------------------------

Can you explain this behaviour? Maybe I missed some configuration somewhere.
I'll look into this further next week, but I assume what you're seeing is that Slurm doesn't know the maximum memory on the nodes until they've checked in after the restart - this is an asynchronous process so that slurmctld startup is not delayed indefinitely. Until kud13 reports in with an accurate memory count, Slurm assumes it may have sufficient memory to satisfy the job's request. Once the slurmd on the node has reported in to the slurmctld, the controller knows such a job will never run on the current hardware, and rejects any new request like it.

As Moe has previously mentioned, setting a RealMemory value in slurm.conf for your nodes should prevent this.

- Tim
This behavior is also reproducible on 15.08. I've also seen that if the slurmd's are initially down, then you restart the controller and submit a job with --mem=9999g, the job is submitted and after a while you can't submit more. squeue shows the reason as (Resources), but once you start the slurmd's, the reason changes to (BadConstraint).
Akmal,

With FastSchedule=0, Slurm doesn't have sufficient information to determine the memory size until the node (slurmd) registers and reports the real memory constraints. So Slurm allows jobs to be submitted until it knows for sure that they'll never run, and from that point onwards it starts rejecting them.

So we suggest configuring some reasonable memory size on the nodes if you use FastSchedule=0.

Please let us know if this makes sense to you and I'll close the ticket. Otherwise, please tell us your concerns and we'll try to resolve/explain them in more detail.

Alex
> So we suggest to configure some reasonable memory size on the nodes if you use FastSchedule=0.

But the thing is, setting RealMemory to a reasonable amount doesn't prevent the jobs from being submitted:

[root@klugy ~]# service slurm restart && scontrol show node kud13 && scontrol show partition kud13 && sbatch -N1 --mem=99999g --partition=kud13 --wrap="hostname"

# Slurm restarted
stopping slurmctld:                                        [  OK  ]
starting slurmctld:                                        [  OK  ]

# RealMemory is set to 15947
NodeName=kud13 CoresPerSocket=2
   CPUAlloc=0 CPUErr=0 CPUTot=8 CPULoad=N/A
   Features=localdisk,nogpu,intel Gres=(null)
   NodeAddr=kud13 NodeHostName=kud13 Version=(null)
   RealMemory=15947 AllocMem=0 Sockets=2 Boards=1
   State=UNKNOWN ThreadsPerCore=2 TmpDisk=0 Weight=1
   BootTime=None SlurmdStartTime=None
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

# This partition only contains this node
PartitionName=kud13
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO
   DefaultTime=01:00:00 DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=kud13
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=8 TotalNodes=1 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

# Job submitted successfully
Submitted batch job 41207

# After a while, jobs are blocked
[root@klugy ~]# sbatch -N1 --mem=99999g --partition=kud13 --wrap="hostname"
sbatch: error: Batch job submission failed: Requested node configuration is not available
[root@klugy ~]# sbatch -N1 --mem=99999g --partition=kud13 --wrap="hostname"
sbatch: error: Batch job submission failed: Requested node configuration is not available
Akmal,

That's because the RealMemory option tells Slurm the _minimum_ memory size on the node, and Slurm is not designed to reject jobs for bad constraints unless it knows for sure (from the slurmd's registration) that the constraints really are unsatisfiable. That's why it allows job submission until the controller has talked with the node.

Does it make sense?
Akmal,

Another option, if you want to reject jobs requesting too much memory before Slurm knows how much memory is on each node, is to use a job_submit plugin. It would look at each job's memory request and reject those over some limit with an appropriate error code.

I don't know if you have implemented a job_submit plugin before; here's some information: http://slurm.schedmd.com/job_submit_plugins.html

If you need any help, please let us know. I think we can close this ticket if there aren't more questions.
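For the Lua variant of that plugin (job_submit/lua), such a check might look like the sketch below. The 128 GB cap is an invented site limit, and the field and constant names should be verified against the job_submit plugin documentation for your Slurm version.

```lua
-- job_submit.lua: sketch of a submit-time memory cap.
-- Assumed site-wide maximum per-node memory, in MB (hypothetical value).
MAX_MEM_MB = 128 * 1024

function slurm_job_submit(job_desc, part_list, submit_uid)
   -- pn_min_memory is the requested memory per node in MB.
   -- Caveat: this sketch ignores the case where the request is
   -- per-CPU memory (--mem-per-cpu), which is flagged differently.
   if job_desc.pn_min_memory ~= nil and
      job_desc.pn_min_memory > MAX_MEM_MB then
      slurm.log_user("requested memory exceeds any node in this cluster")
      return slurm.ERROR
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end
```

Unlike the RealMemory-based check, this rejects oversized requests regardless of whether the nodes have registered yet.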
Closing the ticket. Please reopen if any more issues are found.
Thanks Alejandro :D