| Summary: | grptres limits for nodes/cores | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Bill Abbott <babbott> |
| Component: | Limits | Assignee: | Unassigned Developer <dev-unassigned> |
| Status: | OPEN | QA Contact: | |
| Severity: | 5 - Enhancement | | |
| Priority: | --- | CC: | bart, felip.moll, patrice.peterson |
| Version: | 16.05.10 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=2580 | | |
| Site: | Rutgers | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave Sites: | --- | Cray Sites: | --- |
| DS9 Clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC Sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | |
| Target Release: | future | DevPrio: | 4 - Medium |
| Emory-Cloud Sites: | --- | | |
Description
Bill Abbott
2017-10-10 13:22:40 MDT
Dominik:

Hi,

Why don't you want to base this on cpus? GrpTRES=cpus=somenumber

---

Bill Abbott:

We guarantee owners immediate access to purchased nodes, and limiting by cpu would allow one owner to spread single-core jobs across all nodes. That would potentially block other owners for up to max walltime.

---

Dominik:

Hi,

I am afraid we are not planning to change this GrpTRES=nodes behavior. If you have homogeneous nodes, GrpTRES=cpus should be a good choice. The scenario from comment 2 is generally not effective and shouldn't be used.

---

Dominik:

Hi,

Could you give me more info? I need the number of nodes and how many nodes you want to give to each group. Are your nodes homogeneous? What kind of benefit do you expect from using "floating partitions"?

---

Bill Abbott:

Sure. The nodes in this case are homogeneous and there are 52 of them. Most owners will have 1-5 nodes in their partition. We want to use floating partitions so we can bring nodes in and out of service without impacting users.

The behavior right now is that an owner with 3 nodes (28 cores each) who runs 10 single-core jobs would end up with 3 single-core jobs running, 7 single-core jobs in the queue, and 81 cores sitting idle.

What we'd like instead is that those 10 single-core jobs get packed into those three nodes, so no jobs end up in the queue and 74 cores are still available.

---

Tim:

(In reply to Bill Abbott from comment #8)
> Sure. The nodes in this case are homogeneous and there are 52 of them.
> Most owners will have 1-5 nodes in their partition. We want to use floating
> partitions so we can bring nodes in and out of service without impacting
> users.
>
> The behavior right now is that an owner with 3 nodes (28 cores each) who
> runs 10 single-core jobs would end up with 3 single-core jobs running, 7
> single-core job in the queue and 81 cores sitting idle.
>
> What we'd like instead is that those 10 single-core jobs get packed into
> those three nodes, no jobs end up in the queue, 74 cores still available.

Are the core counts the same across the nodes?
If they are, limiting the TRES based on cpu count, and leaving the node count off, would be the simplest strategy at present.

One other option, which I'm not sure if you've looked at, would be to use a PartitionQOS with MaxNodes=3. That should limit the partition to using three nodes at a time. You do need to be careful about how this interacts with normal QOSs, though.

Are you able to attach your current slurm.conf, and output from 'scontrol show assoc'?

thanks,
- Tim

---

Bill Abbott:

The core counts are the same across this pool of nodes, but I don't see how that addresses the problem. If user 1 has 5 nodes but uses 6, then user 2 gets blocked from their node. That violates our SLA. How can we limit TRES by cpu but ensure the cpus all go to the same 5 nodes?

The partition option MaxNodes looked perfect, but that apparently means "max nodes per job", not "max nodes per partition". We want an arbitrary number of jobs that can't spread past 5 nodes in any case.

I'll paste the slurm.conf as a different comment, but here are the relevant lines for a single owner:

NodeName=slepner[001-048] Weight=2 Feature=sandybridge,fdr RealMemory=128903 CoresPerSocket=8
PartitionName=DEFAULT PriorityTier=10 DefaultTime=2:00 MaxTime=3-0 State=UP AllocNodes=fen[1-2],amarel[1-2],helix,clcwb01
PartitionName=sas Nodes=slepner[001-002,048] PriorityTier=40 AllowGroups=babbott MaxTime=14-0 QOS=sas

# sacctmgr show qos sas
Name Priority GraceTime Preempt PreemptMode Flags UsageThres UsageFactor GrpTRES GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit GrpWall MaxTRES MaxTRESPerNode MaxTRESMins MaxWall MaxTRESPU MaxJobsPU MaxSubmitPU MaxTRESPA MaxJobsPA MaxSubmitPA MinTRES
sas 0 00:00:00 cluster 1.000000 cpu=32

So what we really want is:

PartitionName=sas Nodes=slepner[001-048] PriorityTier=40 AllowGroups=babbott MaxTime=14-0 QOS=sas

and grptres=nodes=2.

---

Bill Abbott:

# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=saul
ControlAddr=saul
#BackupController=
#BackupAddr=
#
AuthType=auth/munge
CacheGroups=0
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
DisableRootJobs=YES
#EnforcePartLimits=NO
Epilog=/etc/slurm/slurm.epilog.clean
#PrologSlurmctld=
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
JobCheckpointDir=/var/lib/slurm/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
JobRequeue=0
#JobSubmitPlugins=1
#KillOnBadExit=0
#Licenses=foo*4,bar
#MailProg=/usr/bin/mail
#MaxJobCount=5000
#MaxStepCount=40000
#MaxTasksPerNode=128
#MpiDefault=none
MpiDefault=none
#MpiParams=ports=#-#
MpiParams=ports=12000-12999
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
#ProctrackType=proctrack/pgid
#ProctrackType=proctrack/linuxproc
ProctrackType=proctrack/cgroup
#Prolog=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
PropagateResourceLimitsExcept=MEMLOCK
RebootProgram=/sbin/reboot
ReturnToService=0
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm/slurmd
SlurmUser=slurm
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/tmp/slurmctld
SwitchType=switch/none
#TaskEpilog=
#TaskPlugin=task/none
TaskPlugin=task/cgroup
#TaskPluginParam=
#TaskProlog=
TopologyPlugin=topology/tree
#TmpFs=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
UsePAM=1
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
HealthCheckInterval=300
HealthCheckProgram=/usr/sbin/nhc
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
DefMemPerCPU=4096
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SchedulerPort=7321
#SelectType=select/linear
SelectType=select/cons_res
#SelectTypeParameters=
SelectTypeParameters=CR_CPU_Memory
#SelectTypeParameters=CR_CPU
#
#
# JOB PRIORITY
#PriorityType=priority/basic
PriorityType=priority/multifactor
PriorityDecayHalfLife=21-0
#PriorityCalcPeriod=
PriorityFavorSmall=NO
#PriorityMaxAge=
#PriorityUsageResetPeriod=
PriorityWeightAge=1000
PriorityWeightFairshare=8000
PriorityWeightJobSize=4000
PriorityWeightPartition=5000
PriorityWeightTRES=GRES/gpu=7000,GRES/mic=7000
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
AccountingStorageEnforce=associations,limits,qos
AccountingStorageHost=squid
AccountingStorageLoc=/var/log/slurm/jobacctstor
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/slurmdbd
#AccountingStorageUser=
AccountingStorageTRES=gres/gpu,gres/mic
AccountingStoreJobComment=YES
ClusterName=amarel
#DebugFlags=
#DebugFlags=Gres
#JobCompHost=
JobCompLoc=/var/log/slurm/jobcomp.log
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/filetxt
#JobCompUser=
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
###
#JobAcctGatherType=jobacct_gather/cgroup
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmSchedLogFile=/var/log/slurm/slurmsched.log
SlurmSchedLogLevel=1
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
# Preemption
PreemptMode=REQUEUE
#PreemptMode=CANCEL
PreemptType=preempt/partition_prio
#SchedulerParameters=preempt_youngest_first
#
# GPU Nodes
GresTypes=gpu,mic
Nodename=DEFAULT Sockets=2 ThreadsPerCore=1 State=UNKNOWN
# LOGIN NODES
#Nodename=amarel[1-2] RealMemory=128000 CoresPerSocket=14 ThreadsPerCore=2
#Nodename=fen[1-2] RealMemory=128000 CoresPerSocket=8 ThreadsPerCore=2
# COMPUTE NODES
NodeName=slepner[001-048] Weight=2 Feature=sandybridge,fdr RealMemory=128903 CoresPerSocket=8
NodeName=slepner[054-058] Weight=4 Feature=ivybridge,fdr RealMemory=128903 CoresPerSocket=10
NodeName=slepner[059-084] Weight=6 Feature=haswell,fdr RealMemory=128817 CoresPerSocket=12
NodeName=slepner[085-088] Weight=8 Feature=broadwell,fdr RealMemory=128817 CoresPerSocket=14
NodeName=gpu[001-003] Weight=10 Feature=sandybrige,fdr,tesla RealMemory=64391 CoresPerSocket=6 Gres=gpu:8
NodeName=gpu[004] Weight=10 Feature=sandybridge,fdr,xeonphi RealMemory=64391 CoresPerSocket=6 Gres=mic:8
NodeName=gpu[005-006] Weight=10 Feature=broadwell,fdr,maxwell RealMemory=128839 CoresPerSocket=14 Gres=gpu:4
NodeName=hal[0001-0032,0053-0072] Weight=12 Feature=broadwell,edr RealMemory=128190 CoresPerSocket=14
NodeName=hal[0033-0052] Weight=14 Feature=broadwell,edr RealMemory=257214 CoresPerSocket=14
NodeName=pascal[001-004] Weight=16 Feature=broadwell,edr,pascal RealMemory=128190 CoresPerSocket=14 Gres=gpu:2
NodeName=mem[001-002] Weight=18 Feature=broadwell,edr RealMemory=1500000 Sockets=4 CoresPerSocket=12
#NodeName=slepnert001 CPUS=4 RealMemory=2001 CoresPerSocket=2 ThreadsPerCore=1 State=UNKNOWN
#
PartitionName=DEFAULT PriorityTier=10 DefaultTime=2:00 MaxTime=3-0 State=UP AllocNodes=fen[1-2],amarel[1-2],helix,clcwb01
PartitionName=main Nodes=ALL QOS=main Default=YES
PartitionName=bg Nodes=ALL PriorityTier=1 AllowGroups=hpctech QOS=bg
PartitionName=oarc Nodes=ALL PriorityTier=40 AllowGroups=oarc QOS=bg
PartitionName=gpu Nodes=pascal[001-004],gpu[001-006] PriorityTier=20 QOS=gpu
PartitionName=mem Nodes=mem[001-002] PriorityTier=20
#PartitionName=admin Nodes=slepnert001 DefaultTime=30 MaxTime=120 AllowAccounts=oirt
# owner partitions
# hal0001-0032 & hal0053-72 are 128 gig memory nodes
# hal0033-52 are 256 gig memory nodes
PartitionName=sas Nodes=slepner[001-002,048] PriorityTier=40 AllowGroups=babbott MaxTime=14-0 QOS=sas
PartitionName=mp1009_1 Nodes=slepner[003-009,048] PriorityTier=40 AllowGroups=mp1009_1 MaxTime=14-0 QOS=mp1009_1
#PartitionName=sdk94_1 Nodes=slepner[010-012,048] PriorityTier=40 AllowGroups=sdk94_1 MaxTime=14-0 QOS=sdk94_1
#PartitionName=ab1337_2 Nodes=slepner[013-027,048] PriorityTier=40 AllowGroups=ab1337_1 MaxTime=14-0 QOS=ab1337_1
#PartitionName=rk509_1 Nodes=slepner[028-030,048] PriorityTier=40 AllowGroups=rk509_1 MaxTime=14-0 QOS=rk509_1
#PartitionName=ll502_1 Nodes=slepner[031,048] PriorityTier=40 AllowGroups=ll502_1 MaxTime=14-0 QOS=ll502_1
#PartitionName=rs1032_1 Nodes=slepner[032-042,048] PriorityTier=40 AllowGroups=rs1032_1 MaxTime=14-0 QOS=rs1032_1
#PartitionName=cs_1 Nodes=slepner[043,048] PriorityTier=40 AllowGroups=cs_1 MaxTime=14-0 QOS=cs_1
#PartitionName=ccb_2 Nodes=slepner[044,048] PriorityTier=40 AllowGroups=ccb_1 MaxTime=14-0 QOS=ccb_1
#PartitionName=waksman_1 Nodes=slepner[045,048] PriorityTier=40 AllowGroups=waksman_1 MaxTime=14-0 QOS=waksman_1
PartitionName=tongz_1 Nodes=slepner[046,048] PriorityTier=40 AllowGroups=tongz_1 MaxTime=14-0 QOS=tongz_1
#PartitionName=ccib_1 Nodes=slepner[054,058] PriorityTier=40 AllowGroups=ccib_1 MaxTime=14-0 QOS=ccib_1
#PartitionName=ccb_3 Nodes=slepner[055-056,058] PriorityTier=40 AllowGroups=ccb_1 MaxTime=14-0 QOS=ccb_1
#PartitionName=pl314_1 Nodes=slepner[059,084] PriorityTier=40 AllowGroups=pl314_1 MaxTime=14-0 QOS=pl314_1
#PartitionName=ccb_4 Nodes=slepner[060-061,084] PriorityTier=40 AllowGroups=ccb_1 MaxTime=14-0 QOS=ccb_1
#PartitionName=mischaik_1 Nodes=slepner[062-074,084] PriorityTier=40 AllowGroups=mischaik_1 MaxTime=14-0 QOS=mischaik_1
#PartitionName=ab1337_2 Nodes=slepner[075-077,084] PriorityTier=40 AllowGroups=ab1337_1 MaxTime=14-0 QOS=ab1337_1
#PartitionName=cqb_1 Nodes=slepner[078-079,084] PriorityTier=40 AllowGroups=cqb_1 MaxTime=14-0 QOS=cqb_1
#PartitionName=sci_1 Nodes=slepner[080,084] PriorityTier=40 AllowGroups=sci_1 MaxTime=14-0 QOS=sci_1
PartitionName=tongz_2 Nodes=gpu[005-006] PriorityTier=40 AllowGroups=tongz_1 MaxTime=14-0 QOS=tongz_2
#
PartitionName=luwang_1 Nodes=hal[0001-0009,0072] PriorityTier=40 AllowGroups=luwang_1 MaxTime=14-0 QOS=luwang_1
PartitionName=miller_1 Nodes=hal[0010,0072] PriorityTier=40 AllowGroups=miller_1 MaxTime=14-0 QOS=miller_1
PartitionName=bromberg_1 Nodes=hal[0011-0022,0072] PriorityTier=40 AllowGroups=bromberg_1 MaxTime=14-0 QOS=bromberg_1
PartitionName=mitrofanova_1 Nodes=hal[0023-0024,0072] PriorityTier=40 AllowGroups=mitrofanova MaxTime=14-0 QOS=mitrofanova_1
PartitionName=kopp_1 Nodes=hal[0025-0029,0072] PriorityTier=40 AllowGroups=kopp MaxTime=14-0 QOS=kopp_1
PartitionName=ccb_1 Nodes=hal[0030-0032,0072] PriorityTier=40 AllowGroups=ccb MaxTime=14-0 QOS=ccb_1
PartitionName=brzustowicz_1 Nodes=hal[0033-0036,0052] PriorityTier=40 AllowGroups=brzustowicz_1 MaxTime=14-0 QOS=brzustowicz_1
PartitionName=matise_1 Nodes=hal[0037,0052] PriorityTier=40 AllowGroups=matise_1 MaxTime=14-0 QOS=matise_1
PartitionName=xing_1 Nodes=hal[0038-0039,0052] PriorityTier=40 AllowGroups=xing_1 MaxTime=14-0 QOS=xing_1
PartitionName=ellison_1 Nodes=hal[0040,0052] PriorityTier=40 AllowGroups=ellison_1 MaxTime=14-0 QOS=ellison_1
PartitionName=hginj_1 Nodes=hal[0041-0044,0052] PriorityTier=40 AllowGroups=hginj_1 MaxTime=14-0 QOS=hginj_1
PartitionName=cgu_1 Nodes=hal[0045-0050,0052] PriorityTier=40 AllowGroups=cgu_1 MaxTime=14-0 QOS=cgu_1
PartitionName=genetics_1 Nodes=hal[0033-0050,0052] PriorityTier=30 AllowGroups=brzustowicz_1,matise_1,ellison_1,genetics_1,hginj_1,cgu_1 MaxTime=14-0 QOS=genetics_1
PartitionName=rshiroko_1 Nodes=hal[0051,0052] PriorityTier=40 AllowGroups=rshiroko_1 MaxTime=14-0 QOS=rshiroko_1
PartitionName=ab1337_1 Nodes=slepner[003,008] PriorityTier=40 AllowGroups=ab1337_1 MaxTime=14-0 QOS=ab1337_1
PartitionName=jdb252_1 Nodes=hal[0053,0072] PriorityTier=40 AllowGroups=jdb252_1 MaxTime=14-0 QOS=jdb252_1
PartitionName=ecastner_1 Nodes=hal[0054,0072] PriorityTier=40 AllowGroups=ecastner_1 MaxTime=14-0 QOS=ecastner_1
PartitionName=alangold Nodes=hal[0055,0072] PriorityTier=40 AllowGroups=alangold_1 MaxTime=14-0 QOS=alangold_1
PartitionName=njms_genomics_1 Nodes=hal[0056-0058,0072] PriorityTier=40 AllowGroups=njms_genomics_1 MaxTime=14-0 QOS=njms_genomics_1
PartitionName=dmcs_1 Nodes=hal[0059-0060,0072] PriorityTier=30 AllowGroups=dmcs_1 MaxTime=14-0 QOS=dmcs_1 # dmcs_1 and jbrodie_x overlap, dif prio, dif group
PartitionName=jbrodie_1 Nodes=hal[0059-0060,0072] PriorityTier=40 AllowGroups=jbrodie_1 MaxTime=14-0 QOS=jbrodie_1 # jbrodie_x overlap, priority different
PartitionName=jbrodie_2 Nodes=hal[0059-0060,0072] PriorityTier=35 AllowGroups=jbrodie_1 MaxTime=14-0 QOS=jbrodie_1 # jbrodie_x overlap, group same
PartitionName=jeehiun_1 Nodes=hal[0052,0072] PriorityTier=40 AllowGroups=jeehiun_1 MaxTime=14-0 QOS=jeehiun_1 # 256=0052,0052:128=0059-0060,0072
PartitionName=es901_1 Nodes=hal[0061,0072] PriorityTier=40 AllowGroups=es901_1 MaxTime=14-0 QOS=es901_1 # 256=0061,0052:128=0052,0072
PartitionName=jn511_1 Nodes=hal[0053-0056,0072] PriorityTier=40 AllowGroups=jn511_1 MaxTime=14-0 QOS=jn511_1 # 256=0053-0056,0052:128=0061,0072

# sacctmgr show assoc where cluster=amarel|egrep -v "(general|workshop)"
Cluster Account User Partition Share GrpJobs GrpTRES GrpSubmit GrpWall GrpTRESMins MaxJobs MaxTRES MaxTRESPerNode MaxSubmit MaxWall MaxTRESMins QOS Def QOS GrpTRESRunMin
amarel root 1 normal
amarel root root 1 normal
amarel ab1337 1 normal
amarel ab1337 ab1337 1 normal
amarel ab1337 acw103 1 normal
amarel ab1337 rss230 1 normal
amarel alangold 1 normal
amarel alangold alangold 1 normal
amarel alangold fh204 1 normal
amarel alangold kroghjes 1 normal
amarel alangold sm1792 1 normal
amarel amarel 1 normal
amarel amitrofa 1 normal
amarel amitrofa am2051 1 normal
amarel amitrofa amitrofa 1 normal
amarel amitrofa kd566 1 normal
amarel amitrofa mcu17 1 normal
amarel amitrofa nje17 1 normal
amarel amitrofa sh1019 1 normal
amarel amitrofa sp1388 1 normal
amarel amitrofa zsb11 1 normal
amarel brannigan 1 normal
amarel brannigan sm1249 1 normal
amarel ccb 1 normal
amarel ccb an567 1 normal
amarel ccb by122 1 normal
amarel ccb jds375 1 normal
amarel ccb jeehiun 1 normal
amarel ccb jx112 1 normal
amarel ccb kroghjes 1 normal
amarel ccb mse48 1 normal
amarel ccb mz325 1 normal
amarel ccb nw187 1 normal
amarel cee53 1 normal
amarel cee53 cee53 1 normal
amarel cee53 cpr74 1 normal
amarel cqb 1 normal
amarel cqb evgeni 1 normal
amarel dmcs_rtwrf 1 normal
amarel ecastner 1 normal
amarel ecastner bw194 1 normal
amarel ecastner ecastner 1 normal
amarel ecastner jds375 1 normal
amarel ecastner mse48 1 normal
amarel ecastner mz325 1 normal
amarel es901 1 normal
amarel es901 es901 1 normal
amarel genetics 1 normal
amarel genetics ak917 1 normal
amarel genetics azaro 1 normal
amarel genetics cpr74 1 normal
amarel genetics kcchen 1 normal
amarel genetics vm379 1 normal
amarel hberman 1 normal
amarel jaytisch 1 normal
amarel jbrodie 1 normal
amarel jbrodie belmonte 1 normal
amarel jbrodie ccamastr 1 normal
amarel jbrodie jbrodie 1 normal
amarel jbrodie rjdave 1 normal
amarel jbrodie tnmiles 1 normal
amarel jdb252 1 normal
amarel jdb252 jdb252 1 normal
amarel jdb252 sm1249 1 normal
amarel jdb252 tj227 1 normal
amarel jeehiun 1 normal
amarel jeehiun aek119 1 normal
amarel jeehiun jeehiun 1 normal
amarel jeehiun jx112 1 normal
amarel jeehiun ksg80 1 normal
amarel jeehiun linphi 1 normal
amarel jeehiun nw187 1 normal
amarel jeehiun yn81 1 normal
amarel jn511 1 normal
amarel jn511 jn511 1 normal
amarel jx76 1 normal
amarel jx76 jx76 1 normal
amarel lbrz 1 normal
amarel lbrz azaro 1 normal
amarel lbrz vm379 1 normal
amarel lw506 1 normal
amarel lw506 gd342 1 normal
amarel lw506 jd1308 1 normal
amarel lw506 lw506 1 normal
amarel lw506 sz398 1 normal
amarel lw506 yj231 1 normal
amarel lw506 yw594 1 normal
amarel matise 1 normal
amarel matise matise 1 normal
amarel njms_geno+ 1 normal
amarel njms_geno+ clcgs 1 normal
amarel njms_geno+ dupe 1 normal
amarel njms_geno+ ghannysa 1 normal
amarel njms_geno+ husainse 1 normal
amarel njms_geno+ kevina 1 normal
amarel njms_geno+ soteropa 1 normal
amarel njms_geno+ yc759 1 normal
amarel oarc 1 normal
amarel oarc babbott 1 bg,normal,sas
amarel oarc dupe 1 normal
amarel oarc ericmars 1 normal
amarel oarc gc563 1 normal
amarel oarc jbv9 1 normal
amarel oarc jpc303 1 normal
amarel oarc kevina 1 normal
amarel oarc kholodvl 1 normal
amarel oarc michelso 1 normal
amarel oarc novosirj 1 normal
amarel oarc ts840 1 normal
amarel oarc yc759 1 normal
amarel oirt 1 normal
amarel oirt ericmars 1 normal
amarel oirt kevina 1 normal
amarel oirt pl427 1 normal
amarel rk509 1 normal
amarel rk509 ea289 1 normal
amarel rs1032 1 normal
amarel rs1032 ec675 1 normal
amarel rs1032 rs1032 1 normal
amarel rs1032 sss274 1 normal
amarel rshiroko 1 normal
amarel rshiroko ak1511 1 normal
amarel rshiroko rshiroko 1 normal
amarel sas 1 normal
amarel sas babbott 1 normal
amarel smiller 1 normal
amarel smiller sdmiller 1 normal
amarel soe 1 normal
amarel tongz 1 normal
amarel tongz rj254 1 normal
amarel yanab 1 normal
amarel yanab am2260 1 normal
amarel yanab ap1397 1 normal
amarel yanab chengzhu 1 normal
amarel yanab cmm591 1 normal
amarel yanab nal115 1 normal
amarel yanab yanab 1 normal
amarel yanab ym277 1 normal
amarel yanab yw410 1 normal
amarel yanab zz109 1 normal
amarel york 1 normal
amarel york ag1508 1 normal
amarel york giambasu 1 normal

---

Felip Moll:

I think we understand your problem: you want to implement the idea of having a pool of nodes, and then assign a maximum usage of X nodes to each of your groups. Then, when a user of a particular group starts using a node, you want to make that node exclusive for their group. Finally, when a job finishes, exclusivity ends and the node becomes available again in the pool.

This feature is currently not in Slurm, and I see the point in your first comment about GrpTRES=nodes=X. For now you have these options:

- Give each partition separate nodes, as you are doing.
- Create reservations for each group; it's a bit more flexible than modifying slurm.conf each time.
- Change the concept of 'owning nodes' to 'owning cores', so you end up with the solution proposed by Dominik and Tim of using GrpTRES=cpus.

What is the specific/technical reason that keeps you from considering this third option of dealing with a pool of cores instead of a pool of nodes? If these options are not a solution for you, I fear this ticket should be marked as an Enhancement.

---

Bill Abbott:

Felip,

You have the situation correct; that is what we want to do. The reason is that a PI who goes out and buys their own small cluster will get guaranteed immediate access to the nodes at any time, and that will outweigh all of the excellent reasons why they shouldn't do that.

By giving them 10 nodes that they can immediately access (via preemption, without potentially waiting for max walltime), we neutralize that argument.

How would the reservations work exactly? I assume the reservation would specify the exact 5 nodes rather than the whole pool, but how would general access users use those nodes when idle?

Enhancement seems like the right Importance.

Thanks,

Bill

---

Felip Moll:

(In reply to Bill Abbott from comment #15)
> Felip,
>
> You have the situation correct; that is what we want to do. The reason is
> that a PI who goes out and buys their own small cluster will get guaranteed
> immediate access to the nodes at any time, and that will outweigh all of the
> excellent reasons why they shouldn't do that.
>
> By giving them 10 nodes that they can immediately access (via preemption,
> without potentially waiting for max walltime), we neutralize that argument.

I suppose then that not giving a PI 'N exclusive cores' and rather giving '10 exclusive nodes' is more of an aesthetic reason, but at some point understandable.

> How would the reservations work exactly? I assume the reservation would
> specify the exact 5 nodes rather than the whole pool, but how would general
> access users use those nodes when idle?

Well, it does not provide anything different from the first option, and in fact I see now that you are overlapping partitions, so just forget this option. If you weren't overlapping, you would have a reservation for specific accounts, just like you have a partition with AllowGroups.

> Enhancement seems like the right Importance.

Marking it as an enhancement.

> Thanks,
>
> Bill

---

Bill Abbott:

Another option might be to give partitions MaxNodesPerGroup, MaxNodesPerAccount, MaxNodesPerPartition, or something like that. MaxNodes looks like it's actually MaxNodesPerJob, but it works properly with fitting small jobs into the right number of whole nodes.
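For reference, the arithmetic behind the scenario in comment 8 (an owner with 3 nodes of 28 cores each submitting 10 single-core jobs) can be checked with a small model. This is an illustrative sketch of the two allocation policies under discussion, not Slurm code.

```python
# Illustrative model of the two behaviors discussed in this ticket (not Slurm code).
# Scenario from comment 8: an owner has 3 nodes of 28 cores and submits
# 10 single-core jobs.

NODES = 3
CORES_PER_NODE = 28
JOBS = 10  # single-core jobs

# Current behavior with node-based ownership: each running job effectively
# claims a whole node for its group, so at most one job starts per node.
running = min(JOBS, NODES)                       # 3 jobs running
queued = JOBS - running                          # 7 jobs queued
idle_cores = NODES * CORES_PER_NODE - running    # 81 cores idle

# Desired behavior: pack the single-core jobs onto the owned nodes,
# leaving the remaining cores available for more work.
packed_running = min(JOBS, NODES * CORES_PER_NODE)    # all 10 jobs run
free_cores = NODES * CORES_PER_NODE - packed_running  # 74 cores free

print(running, queued, idle_cores)   # 3 7 81
print(packed_running, free_cores)    # 10 74
```

The mismatch between the two results (81 idle cores versus 74 free cores, with no queue) is exactly the waste the reporter wants to avoid.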
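The core-based workaround that Dominik, Tim, and Felip propose (owning cores rather than nodes) could be sketched roughly as follows, reusing the 'sas' partition and QOS from the thread. The core count (56 = 2 nodes x 28 cores, matching the per-node core count used in comment 8) is illustrative, and, as the reporter points out, this does not confine the group's jobs to a fixed set of physical nodes.

```shell
# Sketch of the GrpTRES=cpus workaround discussed above (illustrative values).
# In slurm.conf, the owner partition floats over the whole homogeneous pool:
#   PartitionName=sas Nodes=slepner[001-048] PriorityTier=40 AllowGroups=babbott MaxTime=14-0 QOS=sas

# Cap the group's aggregate usage at the core-equivalent of 2 owned nodes
# (2 nodes x 28 cores = 56 cores), instead of the unsupported GrpTRES=nodes=2:
sacctmgr modify qos sas set GrpTRES=cpu=56

# Caveat: this limits total cores, not placement; single-core jobs may still
# land on up to 56 distinct nodes, which is the SLA concern raised in the thread.
```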