Hello, I am trying to set RealMemory on my nodes to a bit lower than what is actually available, but I can't get Slurm to pick up the change. Is there any other option that I need to change, or am I missing something here?

[ghpc1 root@nb001 ~]# grep -i realmem /etc/slurm/slurm.conf
NodeName=nc[001-291] CoresPerSocket=28 RealMemory=254850 Sockets=2 ThreadsPerCore=1
[ghpc1 root@nb001 ~]# ssh nc291 free -m | grep -i Mem
Mem:         257850       11072      245499         144        1277      245013
[ghpc1 root@nb001 ~]# sinfo -lNe | grep -i 291
nc291    1    defq*    idle   56   2:28:1  257850   762723      1   (null) none

I have tried bouncing slurmctld and slurmd, but I can't get the RealMemory setting to stick. The sockets/cores settings update just fine. Here is my slurm.conf:

# cat /etc/slurm/slurm.conf
#
# See the slurm.conf man page for more information.
#
ClusterName=SLURM_CLUSTER
SlurmUser=slurm
#SlurmdUser=root
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
StateSaveLocation=/cm/shared/apps/slurm/var/cm/statesave
SlurmdSpoolDir=/cm/local/apps/slurm/var/spool
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
#ProctrackType=proctrack/pgid
ProctrackType=proctrack/cgroup
#PluginDir=
CacheGroups=0
#FirstJobId=
ReturnToService=2
#MaxJobCount=
MaxJobCount=2000000
MaxArraySize=500000
#PlugStackConfig=
PlugStackConfig=/etc/slurm/plugstack.conf
#PropagatePrioProcess=
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#SrunProlog=
#SrunEpilog=
TaskProlog=/cm/local/apps/slurm/var/prologs/user_prolog.sh
#TaskEpilog=
TaskPlugin=task/cgroup,task/affinity
#TrackWCKey=no
TreeWidth=18
#TmpFs=
#UsePAM=
PrologFlags=contain
#
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
MessageTimeout=30
#
# SCHEDULING
#SchedulerAuth=
#SchedulerPort=
#SchedulerRootFilter=
#PriorityType=priority/multifactor
#PriorityDecayHalfLife=14-0
#PriorityUsageResetPeriod=14-0
#PriorityWeightFairshare=100000
#PriorityWeightAge=1000
#PriorityWeightPartition=10000
#PriorityWeightJobSize=1000
#PriorityMaxAge=1-0
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
SchedulerParameters=max_rpc_cnt=20,sched_interval=10,bf_interval=30,bf_window=20160,kill_invalid_depend
PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
PriorityWeightFairshare=100000
PriorityWeightAge=10000
PriorityWeightPartition=0
PriorityWeightJobSize=1000
PriorityWeightQOS=100000
PriorityMaxAge=7-0
PriorityFlags=FAIR_TREE
#Default Memory
DefMemPerCPU=4096
#
# LOGGING
SlurmctldDebug=4
SlurmctldLogFile=/var/log/slurmctld
SlurmdDebug=4
SlurmdLogFile=/var/log/slurmd
DebugFlags=ElasticSearch
#JobCompType=jobcomp/filetxt
#JobCompLoc=/cm/local/apps/slurm/var/spool/job_comp.log
JobCompType=jobcomp/elasticsearch
JobCompLoc=http://elasticsearch.marathon.mesos.ghpc1.sc1.roche.com:9200
#JobCompLoc=http://elasticsearch.marathon.mesos.hpct1.sc1.roche.com:9200
#
# PROFILING
AcctGatherProfileType=acct_gather_profile/influxdb
#AcctGatherInfinibandType=acct_gather_infiniband/ofed
#
# ACCOUNTING
JobAcctGatherType=jobacct_gather/cgroup
JobAcctGatherFrequency=task=15
AccountingStorageEnforce=qos,limits,associations
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageUser=slurm
AccountingStorageTRES=gres/gpu
# AccountingStorageLoc=slurm_acct_db
# AccountingStoragePass=SLURMDBD_USERPASS
##
## Job Submit plugins
###
#
#JobSubmitPlugins=lua
##
## Reboot nodes
###
#
RebootProgram="/usr/bin/logger -p user.crit 'Slurm rebooting Node!!' && /bin/echo 1 > /proc/sys/kernel/sysrq && /bin/echo b > /proc/sysrq-trigger"
# This section of this file was automatically generated by cmd. Do not edit manually!
# BEGIN AUTOGENERATED SECTION -- DO NOT REMOVE
# Scheduler
SchedulerType=sched/backfill
# Master nodes
ControlMachine=nb001
ControlAddr=nb001
BackupController=nb002
BackupAddr=nb002
AccountingStorageHost=nb001
# Nodes
NodeName=ni003
NodeName=nc[001-291] CoresPerSocket=28 RealMemory=254850 Sockets=2 ThreadsPerCore=1
NodeName=nh[001-006] CoresPerSocket=44 Sockets=2 ThreadsPerCore=1
# Partitions
PartitionName=defq Default=YES MinNodes=1 AllowGroups=ALL DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO PriorityJobFactor=1 PriorityTier=1 OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=nc[001-291]
PartitionName=himem Default=NO MinNodes=1 AllowGroups=ALL DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0 PreemptMode=OFF ReqResv=NO DefMemPerCPU=23000 AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO PriorityJobFactor=1 PriorityTier=1 OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=nh[001-006]
# Generic resources types
GresTypes=gpu,mic
# Epilog/Prolog parameters
PrologSlurmctld=/cm/local/apps/cmd/scripts/prolog-prejob
Prolog=/cm/local/apps/cmd/scripts/prolog
Epilog=/cm/local/apps/cmd/scripts/epilog
# Fast Schedule option
FastSchedule=0
# Power Saving
SuspendTime=-1 # this disables power saving
SuspendTimeout=30
ResumeTimeout=60
SuspendProgram=/cm/local/apps/cluster-tools/wlm/scripts/slurmpoweroff
ResumeProgram=/cm/local/apps/cluster-tools/wlm/scripts/slurmpoweron
# END AUTOGENERATED SECTION -- DO NOT REMOVE
Looks like this was because FastSchedule was set to 0 instead of 1. After changing this setting we are now good. Feel free to close this request.

Regards,
-Simran
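For anyone hitting the same thing: with FastSchedule=0, slurmctld schedules against the resources each slurmd actually detects and reports at registration, so the RealMemory value in slurm.conf is ignored in favor of the node's real memory. With FastSchedule=1 the slurm.conf values are authoritative. A minimal sketch of the change (same node line as in the config above):

```
# /etc/slurm/slurm.conf
FastSchedule=1   # trust slurm.conf values instead of slurmd-reported hardware
NodeName=nc[001-291] CoresPerSocket=28 RealMemory=254850 Sockets=2 ThreadsPerCore=1
```

After editing, `scontrol reconfigure` pushes the change out, and `sinfo -lNe | grep nc291` should then show 254850 in the MEMORY column.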
Yep, that would do it. You might also want to look at the MemSpecLimit option as an alternative approach, if you have a reason to use FastSchedule=0. Marking resolved/infogiven.

- Tim
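To expand on that: MemSpecLimit reserves a fixed amount of memory per node for system and daemon use, so jobs can only allocate RealMemory minus the reservation. That gives the same headroom without overriding the detected memory. A sketch, where the 3000 MB reservation is purely an illustrative figure (enforcement relies on the cgroup task plugin, which this config already loads):

```
# /etc/slurm/slurm.conf -- reserve ~3 GB per node for the OS (example value)
NodeName=nc[001-291] CoresPerSocket=28 Sockets=2 ThreadsPerCore=1 RealMemory=257850 MemSpecLimit=3000
```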