| Summary: | grptres limits for nodes/cores | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Bill Abbott <babbott> |
| Component: | Limits | Assignee: | Unassigned Developer <dev-unassigned> |
| Status: | OPEN --- | QA Contact: | |
| Severity: | 5 - Enhancement | ||
| Priority: | --- | CC: | bart, felip.moll, patrice.peterson |
| Version: | 16.05.10 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=2580 | ||
| Site: | Rutgers | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | |
| Target Release: | future | DevPrio: | 4 - Medium |
| Emory-Cloud Sites: | --- | ||
Description
Bill Abbott
2017-10-10 13:22:40 MDT
Hi,

Why don't you want to base this on cpus? GrpTRES=cpus=somenumber

Dominik

We guarantee owners immediate access to purchased nodes, and limiting by cpu would allow one owner to spread single-core jobs across all nodes. That would potentially block other owners for up to the maximum walltime.

Hi,

I am afraid we are not planning to change this GrpTRES=nodes behavior. If you have homogeneous nodes, using GrpTRES=cpus should be a good choice. The scenario from comment 2 is generally not effective and shouldn't be used.

Dominik

Hi,

Could you give me more info? I need the number of nodes and how many nodes you want to give to each group. Are your nodes homogeneous? What kind of benefit do you expect from using "floating partitions"?

Dominik

Sure. The nodes in this case are homogeneous and there are 52 of them. Most owners will have 1-5 nodes in their partition. We want to use floating partitions so we can bring nodes in and out of service without impacting users.

The behavior right now is that an owner with 3 nodes (28 cores each) who runs 10 single-core jobs would end up with 3 single-core jobs running, 7 single-core jobs in the queue, and 81 cores sitting idle.

What we'd like instead is that those 10 single-core jobs get packed into those three nodes, no jobs end up in the queue, and 74 cores are still available.

(In reply to Bill Abbott from comment #8)
> Sure. The nodes in this case are homogeneous and there are 52 of them.
> Most owners will have 1-5 nodes in their partition. We want to use floating
> partitions so we can bring nodes in and out of service without impacting
> users.
>
> The behavior right now is that an owner with 3 nodes (28 cores each) who
> runs 10 single-core jobs would end up with 3 single-core jobs running, 7
> single-core jobs in the queue and 81 cores sitting idle.
>
> What we'd like instead is that those 10 single-core jobs get packed into
> those three nodes, no jobs end up in the queue, 74 cores still available.

Are the core counts the same across the nodes? If they are, limiting the TRES based on cpu count, and leaving the node count off, would be the simplest strategy at present.

One other option, which I'm not sure whether you've looked at, would be to use a Partition QOS with MaxNodes=3. That should limit the partition to using three nodes at a time. You do need to be careful how this interacts with normal QOSs, though.

Are you able to attach your current slurm.conf and the output from 'scontrol show assoc'?

thanks,
- Tim

The core counts are the same across this pool of nodes, but I don't see how that addresses the problem. If user 1 has 5 nodes but uses 6, then user 2 gets blocked from their node. That violates our SLA. How can we limit TRES by cpu but ensure it all goes to the same 5 nodes?
The partition option MaxNodes looked perfect, but it apparently means "max nodes per job", not "max nodes per partition". We want an arbitrary number of jobs that can never spread across more than 5 nodes.
I'll paste the slurm.conf as a different comment, but here are the relevant lines from a single user:
NodeName=slepner[001-048] Weight=2 Feature=sandybridge,fdr RealMemory=128903 CoresPerSocket=8
PartitionName=DEFAULT PriorityTier=10 DefaultTime=2:00 MaxTime=3-0 State=UP AllocNodes=fen[1-2],amarel[1-2],helix,clcwb01
PartitionName=sas Nodes=slepner[001-002,048] PriorityTier=40 AllowGroups=babbott MaxTime=14-0 QOS=sas
# sacctmgr show qos sas
Name Priority GraceTime Preempt PreemptMode Flags UsageThres UsageFactor GrpTRES GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit GrpWall MaxTRES MaxTRESPerNode MaxTRESMins MaxWall MaxTRESPU MaxJobsPU MaxSubmitPU MaxTRESPA MaxJobsPA MaxSubmitPA MinTRES
---------- ---------- ---------- ---------- ----------- ---------------------------------------- ---------- ----------- ------------- ------------- ------------- ------- --------- ----------- ------------- -------------- ------------- ----------- ------------- --------- ----------- ------------- --------- ----------- -------------
sas 0 00:00:00 cluster 1.000000 cpu=32
So what we really want is
PartitionName=sas Nodes=slepner[001-048] PriorityTier=40 AllowGroups=babbott MaxTime=14-0 QOS=sas
and
GrpTRES=nodes=2
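For comparison, here is a minimal sketch of the cpu-based workaround Dominik and Tim describe: widen the partition to the whole pool and cap the group's QOS by core count instead of node count. The figure of cpu=32 is only an assumption, standing in for two slepner nodes' worth of cores (2 sockets x 8 cores each); it is not a value taken from this ticket.
PartitionName=sas Nodes=slepner[001-048] PriorityTier=40 AllowGroups=babbott MaxTime=14-0 QOS=sas
# sacctmgr modify qos where name=sas set GrpTRES=cpu=32
As noted above, this caps the group at two nodes' worth of cores but does nothing to keep those cores on the same two physical nodes, which is exactly the gap this request is about.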
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=saul
ControlAddr=saul
#BackupController=
#BackupAddr=
#
AuthType=auth/munge
CacheGroups=0
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
DisableRootJobs=YES
#EnforcePartLimits=NO
Epilog=/etc/slurm/slurm.epilog.clean
#PrologSlurmctld=
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
JobCheckpointDir=/var/lib/slurm/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
JobRequeue=0
#JobSubmitPlugins=1
#KillOnBadExit=0
#Licenses=foo*4,bar
#MailProg=/usr/bin/mail
#MaxJobCount=5000
#MaxStepCount=40000
#MaxTasksPerNode=128
#MpiDefault=none
MpiDefault=none
#MpiParams=ports=#-#
MpiParams=ports=12000-12999
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
#ProctrackType=proctrack/pgid
#ProctrackType=proctrack/linuxproc
ProctrackType=proctrack/cgroup
#Prolog=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
PropagateResourceLimitsExcept=MEMLOCK
RebootProgram=/sbin/reboot
ReturnToService=0
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm/slurmd
SlurmUser=slurm
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/tmp/slurmctld
SwitchType=switch/none
#TaskEpilog=
#TaskPlugin=task/none
TaskPlugin=task/cgroup
#TaskPluginParam=
#TaskProlog=
TopologyPlugin=topology/tree
#TmpFs=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
UsePAM=1
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
HealthCheckInterval=300
HealthCheckProgram=/usr/sbin/nhc
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
DefMemPerCPU=4096
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SchedulerPort=7321
#SelectType=select/linear
SelectType=select/cons_res
#SelectTypeParameters=
SelectTypeParameters=CR_CPU_Memory
#SelectTypeParameters=CR_CPU
#
#
# JOB PRIORITY
#PriorityType=priority/basic
PriorityType=priority/multifactor
PriorityDecayHalfLife=21-0
#PriorityCalcPeriod=
PriorityFavorSmall=NO
#PriorityMaxAge=
#PriorityUsageResetPeriod=
PriorityWeightAge=1000
PriorityWeightFairshare=8000
PriorityWeightJobSize=4000
PriorityWeightPartition=5000
PriorityWeightTRES=GRES/gpu=7000,GRES/mic=7000
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
AccountingStorageEnforce=associations,limits,qos
AccountingStorageHost=squid
AccountingStorageLoc=/var/log/slurm/jobacctstor
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/slurmdbd
#AccountingStorageUser=
AccountingStorageTRES=gres/gpu,gres/mic
AccountingStoreJobComment=YES
ClusterName=amarel
#DebugFlags=
#DebugFlags=Gres
#JobCompHost=
JobCompLoc=/var/log/slurm/jobcomp.log
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/filetxt
#JobCompUser=
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
###
#JobAcctGatherType=jobacct_gather/cgroup
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmSchedLogFile=/var/log/slurm/slurmsched.log
SlurmSchedLogLevel=1
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
# Preemption
PreemptMode=REQUEUE
#PreemptMode=CANCEL
PreemptType=preempt/partition_prio
#SchedulerParameters=preempt_youngest_first
#
# GPU Nodes
GresTypes=gpu,mic
Nodename=DEFAULT Sockets=2 ThreadsPerCore=1 State=UNKNOWN
# LOGIN NODES
#Nodename=amarel[1-2] RealMemory=128000 CoresPerSocket=14 ThreadsPerCore=2
#Nodename=fen[1-2] RealMemory=128000 CoresPerSocket=8 ThreadsPerCore=2
# COMPUTE NODES
NodeName=slepner[001-048] Weight=2 Feature=sandybridge,fdr RealMemory=128903 CoresPerSocket=8
NodeName=slepner[054-058] Weight=4 Feature=ivybridge,fdr RealMemory=128903 CoresPerSocket=10
NodeName=slepner[059-084] Weight=6 Feature=haswell,fdr RealMemory=128817 CoresPerSocket=12
NodeName=slepner[085-088] Weight=8 Feature=broadwell,fdr RealMemory=128817 CoresPerSocket=14
NodeName=gpu[001-003] Weight=10 Feature=sandybrige,fdr,tesla RealMemory=64391 CoresPerSocket=6 Gres=gpu:8
NodeName=gpu[004] Weight=10 Feature=sandybridge,fdr,xeonphi RealMemory=64391 CoresPerSocket=6 Gres=mic:8
NodeName=gpu[005-006] Weight=10 Feature=broadwell,fdr,maxwell RealMemory=128839 CoresPerSocket=14 Gres=gpu:4
NodeName=hal[0001-0032,0053-0072] Weight=12 Feature=broadwell,edr RealMemory=128190 CoresPerSocket=14
NodeName=hal[0033-0052] Weight=14 Feature=broadwell,edr RealMemory=257214 CoresPerSocket=14
NodeName=pascal[001-004] Weight=16 Feature=broadwell,edr,pascal RealMemory=128190 CoresPerSocket=14 Gres=gpu:2
NodeName=mem[001-002] Weight=18 Feature=broadwell,edr RealMemory=1500000 Sockets=4 CoresPerSocket=12
#NodeName=slepnert001 CPUS=4 RealMemory=2001 CoresPerSocket=2 ThreadsPerCore=1 State=UNKNOWN
#
PartitionName=DEFAULT PriorityTier=10 DefaultTime=2:00 MaxTime=3-0 State=UP AllocNodes=fen[1-2],amarel[1-2],helix,clcwb01
PartitionName=main Nodes=ALL QOS=main Default=YES
PartitionName=bg Nodes=ALL PriorityTier=1 AllowGroups=hpctech QOS=bg
PartitionName=oarc Nodes=ALL PriorityTier=40 AllowGroups=oarc QOS=bg
PartitionName=gpu Nodes=pascal[001-004],gpu[001-006] PriorityTier=20 QOS=gpu
PartitionName=mem Nodes=mem[001-002] PriorityTier=20
#PartitionName=admin Nodes=slepnert001 DefaultTime=30 MaxTime=120 AllowAccounts=oirt
# owner partitions
# hal0001-0032 & hal0053-72 are 128 gig memory nodes
# hal0033-52 are 256 gig memory nodes
PartitionName=sas Nodes=slepner[001-002,048] PriorityTier=40 AllowGroups=babbott MaxTime=14-0 QOS=sas
PartitionName=mp1009_1 Nodes=slepner[003-009,048] PriorityTier=40 AllowGroups=mp1009_1 MaxTime=14-0 QOS=mp1009_1
#PartitionName=sdk94_1 Nodes=slepner[010-012,048] PriorityTier=40 AllowGroups=sdk94_1 MaxTime=14-0 QOS=sdk94_1
#PartitionName=ab1337_2 Nodes=slepner[013-027,048] PriorityTier=40 AllowGroups=ab1337_1 MaxTime=14-0 QOS=ab1337_1
#PartitionName=rk509_1 Nodes=slepner[028-030,048] PriorityTier=40 AllowGroups=rk509_1 MaxTime=14-0 QOS=rk509_1
#PartitionName=ll502_1 Nodes=slepner[031,048] PriorityTier=40 AllowGroups=ll502_1 MaxTime=14-0 QOS=ll502_1
#PartitionName=rs1032_1 Nodes=slepner[032-042,048] PriorityTier=40 AllowGroups=rs1032_1 MaxTime=14-0 QOS=rs1032_1
#PartitionName=cs_1 Nodes=slepner[043,048] PriorityTier=40 AllowGroups=cs_1 MaxTime=14-0 QOS=cs_1
#PartitionName=ccb_2 Nodes=slepner[044,048] PriorityTier=40 AllowGroups=ccb_1 MaxTime=14-0 QOS=ccb_1
#PartitionName=waksman_1 Nodes=slepner[045,048] PriorityTier=40 AllowGroups=waksman_1 MaxTime=14-0 QOS=waksman_1
PartitionName=tongz_1 Nodes=slepner[046,048] PriorityTier=40 AllowGroups=tongz_1 MaxTime=14-0 QOS=tongz_1
#PartitionName=ccib_1 Nodes=slepner[054,058] PriorityTier=40 AllowGroups=ccib_1 MaxTime=14-0 QOS=ccib_1
#PartitionName=ccb_3 Nodes=slepner[055-056,058] PriorityTier=40 AllowGroups=ccb_1 MaxTime=14-0 QOS=ccb_1
#PartitionName=pl314_1 Nodes=slepner[059,084] PriorityTier=40 AllowGroups=pl314_1 MaxTime=14-0 QOS=pl314_1
#PartitionName=ccb_4 Nodes=slepner[060-061,084] PriorityTier=40 AllowGroups=ccb_1 MaxTime=14-0 QOS=ccb_1
#PartitionName=mischaik_1 Nodes=slepner[062-074,084] PriorityTier=40 AllowGroups=mischaik_1 MaxTime=14-0 QOS=mischaik_1
#PartitionName=ab1337_2 Nodes=slepner[075-077,084] PriorityTier=40 AllowGroups=ab1337_1 MaxTime=14-0 QOS=ab1337_1
#PartitionName=cqb_1 Nodes=slepner[078-079,084] PriorityTier=40 AllowGroups=cqb_1 MaxTime=14-0 QOS=cqb_1
#PartitionName=sci_1 Nodes=slepner[080,084] PriorityTier=40 AllowGroups=sci_1 MaxTime=14-0 QOS=sci_1
PartitionName=tongz_2 Nodes=gpu[005-006] PriorityTier=40 AllowGroups=tongz_1 MaxTime=14-0 QOS=tongz_2
#
PartitionName=luwang_1 Nodes=hal[0001-0009,0072] PriorityTier=40 AllowGroups=luwang_1 MaxTime=14-0 QOS=luwang_1
PartitionName=miller_1 Nodes=hal[0010,0072] PriorityTier=40 AllowGroups=miller_1 MaxTime=14-0 QOS=miller_1
PartitionName=bromberg_1 Nodes=hal[0011-0022,0072] PriorityTier=40 AllowGroups=bromberg_1 MaxTime=14-0 QOS=bromberg_1
PartitionName=mitrofanova_1 Nodes=hal[0023-0024,0072] PriorityTier=40 AllowGroups=mitrofanova MaxTime=14-0 QOS=mitrofanova_1
PartitionName=kopp_1 Nodes=hal[0025-0029,0072] PriorityTier=40 AllowGroups=kopp MaxTime=14-0 QOS=kopp_1
PartitionName=ccb_1 Nodes=hal[0030-0032,0072] PriorityTier=40 AllowGroups=ccb MaxTime=14-0 QOS=ccb_1
PartitionName=brzustowicz_1 Nodes=hal[0033-0036,0052] PriorityTier=40 AllowGroups=brzustowicz_1 MaxTime=14-0 QOS=brzustowicz_1
PartitionName=matise_1 Nodes=hal[0037,0052] PriorityTier=40 AllowGroups=matise_1 MaxTime=14-0 QOS=matise_1
PartitionName=xing_1 Nodes=hal[0038-0039,0052] PriorityTier=40 AllowGroups=xing_1 MaxTime=14-0 QOS=xing_1
PartitionName=ellison_1 Nodes=hal[0040,0052] PriorityTier=40 AllowGroups=ellison_1 MaxTime=14-0 QOS=ellison_1
PartitionName=hginj_1 Nodes=hal[0041-0044,0052] PriorityTier=40 AllowGroups=hginj_1 MaxTime=14-0 QOS=hginj_1
PartitionName=cgu_1 Nodes=hal[0045-0050,0052] PriorityTier=40 AllowGroups=cgu_1 MaxTime=14-0 QOS=cgu_1
PartitionName=genetics_1 Nodes=hal[0033-0050,0052] PriorityTier=30 AllowGroups=brzustowicz_1,matise_1,ellison_1,genetics_1,hginj_1,cgu_1 MaxTime=14-0 QOS=genetics_1
PartitionName=rshiroko_1 Nodes=hal[0051,0052] PriorityTier=40 AllowGroups=rshiroko_1 MaxTime=14-0 QOS=rshiroko_1
PartitionName=ab1337_1 Nodes=slepner[003,008] PriorityTier=40 AllowGroups=ab1337_1 MaxTime=14-0 QOS=ab1337_1
PartitionName=jdb252_1 Nodes=hal[0053,0072] PriorityTier=40 AllowGroups=jdb252_1 MaxTime=14-0 QOS=jdb252_1
PartitionName=ecastner_1 Nodes=hal[0054,0072] PriorityTier=40 AllowGroups=ecastner_1 MaxTime=14-0 QOS=ecastner_1
PartitionName=alangold Nodes=hal[0055,0072] PriorityTier=40 AllowGroups=alangold_1 MaxTime=14-0 QOS=alangold_1
PartitionName=njms_genomics_1 Nodes=hal[0056-0058,0072] PriorityTier=40 AllowGroups=njms_genomics_1 MaxTime=14-0 QOS=njms_genomics_1
PartitionName=dmcs_1 Nodes=hal[0059-0060,0072] PriorityTier=30 AllowGroups=dmcs_1 MaxTime=14-0 QOS=dmcs_1
# dmcs_1 and jbrodie_x overlap, dif prio, dif group
PartitionName=jbrodie_1 Nodes=hal[0059-0060,0072] PriorityTier=40 AllowGroups=jbrodie_1 MaxTime=14-0 QOS=jbrodie_1
# jbrodie_x overlap, priority different
PartitionName=jbrodie_2 Nodes=hal[0059-0060,0072] PriorityTier=35 AllowGroups=jbrodie_1 MaxTime=14-0 QOS=jbrodie_1
# jbrodie_x overlap, group same
PartitionName=jeehiun_1 Nodes=hal[0052,0072] PriorityTier=40 AllowGroups=jeehiun_1 MaxTime=14-0 QOS=jeehiun_1
# 256=0052,0052:128=0059-0060,0072
PartitionName=es901_1 Nodes=hal[0061,0072] PriorityTier=40 AllowGroups=es901_1 MaxTime=14-0 QOS=es901_1
# 256=0061,0052:128=0052,0072
PartitionName=jn511_1 Nodes=hal[0053-0056,0072] PriorityTier=40 AllowGroups=jn511_1 MaxTime=14-0 QOS=jn511_1
# 256=0053-0056,0052:128=0061,0072
# sacctmgr show assoc where cluster=amarel|egrep -v "(general|workshop)"
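As an aside for anyone skimming the full file: the directives below, excerpted verbatim from the configuration above (nothing new added), are what give owner groups the immediate access described later in this ticket. Owner partitions sit at PriorityTier=40 while the general partitions inherit PriorityTier=10 from the DEFAULT line, and preempt/partition_prio with PreemptMode=REQUEUE lets owner jobs requeue lower-tier jobs running on their nodes.
PreemptMode=REQUEUE
PreemptType=preempt/partition_prio
PartitionName=DEFAULT PriorityTier=10 DefaultTime=2:00 MaxTime=3-0 State=UP AllocNodes=fen[1-2],amarel[1-2],helix,clcwb01
PartitionName=main Nodes=ALL QOS=main Default=YES
PartitionName=sas Nodes=slepner[001-002,048] PriorityTier=40 AllowGroups=babbott MaxTime=14-0 QOS=sas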
Cluster Account User Partition Share GrpJobs GrpTRES GrpSubmit GrpWall GrpTRESMins MaxJobs MaxTRES MaxTRESPerNode MaxSubmit MaxWall MaxTRESMins QOS Def QOS GrpTRESRunMin
---------- ---------- ---------- ---------- --------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
amarel root 1 normal
amarel root root 1 normal
amarel ab1337 1 normal
amarel ab1337 ab1337 1 normal
amarel ab1337 acw103 1 normal
amarel ab1337 rss230 1 normal
amarel alangold 1 normal
amarel alangold alangold 1 normal
amarel alangold fh204 1 normal
amarel alangold kroghjes 1 normal
amarel alangold sm1792 1 normal
amarel amarel 1 normal
amarel amitrofa 1 normal
amarel amitrofa am2051 1 normal
amarel amitrofa amitrofa 1 normal
amarel amitrofa kd566 1 normal
amarel amitrofa mcu17 1 normal
amarel amitrofa nje17 1 normal
amarel amitrofa sh1019 1 normal
amarel amitrofa sp1388 1 normal
amarel amitrofa zsb11 1 normal
amarel brannigan 1 normal
amarel brannigan sm1249 1 normal
amarel ccb 1 normal
amarel ccb an567 1 normal
amarel ccb by122 1 normal
amarel ccb jds375 1 normal
amarel ccb jeehiun 1 normal
amarel ccb jx112 1 normal
amarel ccb kroghjes 1 normal
amarel ccb mse48 1 normal
amarel ccb mz325 1 normal
amarel ccb nw187 1 normal
amarel cee53 1 normal
amarel cee53 cee53 1 normal
amarel cee53 cpr74 1 normal
amarel cqb 1 normal
amarel cqb evgeni 1 normal
amarel dmcs_rtwrf 1 normal
amarel ecastner 1 normal
amarel ecastner bw194 1 normal
amarel ecastner ecastner 1 normal
amarel ecastner jds375 1 normal
amarel ecastner mse48 1 normal
amarel ecastner mz325 1 normal
amarel es901 1 normal
amarel es901 es901 1 normal
amarel genetics 1 normal
amarel genetics ak917 1 normal
amarel genetics azaro 1 normal
amarel genetics cpr74 1 normal
amarel genetics kcchen 1 normal
amarel genetics vm379 1 normal
amarel hberman 1 normal
amarel jaytisch 1 normal
amarel jbrodie 1 normal
amarel jbrodie belmonte 1 normal
amarel jbrodie ccamastr 1 normal
amarel jbrodie jbrodie 1 normal
amarel jbrodie rjdave 1 normal
amarel jbrodie tnmiles 1 normal
amarel jdb252 1 normal
amarel jdb252 jdb252 1 normal
amarel jdb252 sm1249 1 normal
amarel jdb252 tj227 1 normal
amarel jeehiun 1 normal
amarel jeehiun aek119 1 normal
amarel jeehiun jeehiun 1 normal
amarel jeehiun jx112 1 normal
amarel jeehiun ksg80 1 normal
amarel jeehiun linphi 1 normal
amarel jeehiun nw187 1 normal
amarel jeehiun yn81 1 normal
amarel jn511 1 normal
amarel jn511 jn511 1 normal
amarel jx76 1 normal
amarel jx76 jx76 1 normal
amarel lbrz 1 normal
amarel lbrz azaro 1 normal
amarel lbrz vm379 1 normal
amarel lw506 1 normal
amarel lw506 gd342 1 normal
amarel lw506 jd1308 1 normal
amarel lw506 lw506 1 normal
amarel lw506 sz398 1 normal
amarel lw506 yj231 1 normal
amarel lw506 yw594 1 normal
amarel matise 1 normal
amarel matise matise 1 normal
amarel njms_geno+ 1 normal
amarel njms_geno+ clcgs 1 normal
amarel njms_geno+ dupe 1 normal
amarel njms_geno+ ghannysa 1 normal
amarel njms_geno+ husainse 1 normal
amarel njms_geno+ kevina 1 normal
amarel njms_geno+ soteropa 1 normal
amarel njms_geno+ yc759 1 normal
amarel oarc 1 normal
amarel oarc babbott 1 bg,normal,sas
amarel oarc dupe 1 normal
amarel oarc ericmars 1 normal
amarel oarc gc563 1 normal
amarel oarc jbv9 1 normal
amarel oarc jpc303 1 normal
amarel oarc kevina 1 normal
amarel oarc kholodvl 1 normal
amarel oarc michelso 1 normal
amarel oarc novosirj 1 normal
amarel oarc ts840 1 normal
amarel oarc yc759 1 normal
amarel oirt 1 normal
amarel oirt ericmars 1 normal
amarel oirt kevina 1 normal
amarel oirt pl427 1 normal
amarel rk509 1 normal
amarel rk509 ea289 1 normal
amarel rs1032 1 normal
amarel rs1032 ec675 1 normal
amarel rs1032 rs1032 1 normal
amarel rs1032 sss274 1 normal
amarel rshiroko 1 normal
amarel rshiroko ak1511 1 normal
amarel rshiroko rshiroko 1 normal
amarel sas 1 normal
amarel sas babbott 1 normal
amarel smiller 1 normal
amarel smiller sdmiller 1 normal
amarel soe 1 normal
amarel tongz 1 normal
amarel tongz rj254 1 normal
amarel yanab 1 normal
amarel yanab am2260 1 normal
amarel yanab ap1397 1 normal
amarel yanab chengzhu 1 normal
amarel yanab cmm591 1 normal
amarel yanab nal115 1 normal
amarel yanab yanab 1 normal
amarel yanab ym277 1 normal
amarel yanab yw410 1 normal
amarel yanab zz109 1 normal
amarel york 1 normal
amarel york ag1508 1 normal
amarel york giambasu 1 normal
I think we understand your problem: you want to have a pool of nodes and assign a maximum usage of X nodes to each of your groups. Then, when a user of a particular group starts using a node, you want that node to become exclusive to their group. Finally, when the job finishes, exclusivity ends and the node becomes available again in the pool.

This feature is currently not in Slurm, and I see the point in your first comment about GrpTRES=nodes=X. For now you have these options:

- Give each partition separate nodes, as you are doing.
- Create reservations for each group; it's a bit more flexible than modifying slurm.conf each time.
- Change the concept of 'owning nodes' to 'owning cores', which leaves you with the solution proposed by Dominik and Tim of using GrpTRES=cpus.

What is the specific/technical reason that keeps you from considering this third option of dealing with a pool of cores instead of a pool of nodes? If these options are not a solution for you, I fear this ticket should be marked as an Enhancement.

Felix,

You have the situation correct; that is what we want to do. The reason is that a PI who goes out and buys their own small cluster gets guaranteed immediate access to the nodes at any time, and that will outweigh all of the excellent reasons why they shouldn't do that.

By giving them 10 nodes that they can immediately access (via preemption, without potentially waiting for max walltime), we neutralize that argument.

How would the reservations work exactly? I assume the reservation would specify the exact 5 nodes rather than the whole pool, but how would general-access users use those nodes when idle?

Enhancement seems like the right Importance.

Thanks,

Bill

(In reply to Bill Abbott from comment #15)
> Felix,
>
> You have the situation correct; that is what we want to do. The reason is
> that a PI who goes out and buys their own small cluster gets guaranteed
> immediate access to the nodes at any time, and that will outweigh all of the
> excellent reasons why they shouldn't do that.
>
> By giving them 10 nodes that they can immediately access (via preemption,
> without potentially waiting for max walltime), we neutralize that argument.

I suppose then that giving a PI '10 exclusive nodes' rather than 'N exclusive cores' is more of an aesthetic reason, but understandable to a point.

> How would the reservations work exactly? I assume the reservation would
> specify the exact 5 nodes rather than the whole pool, but how would
> general-access users use those nodes when idle?

Well, it does not provide anything different from the first option, and in fact I see now that you are overlapping partitions, so just forget this option. If you weren't overlapping, you would create a reservation for a specific account or accounts, just as you now have a partition with AllowGroups.

> Enhancement seems like the right Importance.

Marking it as an enhancement.

> Thanks,
>
> Bill

Another option might be to give partitions a MaxNodesPerGroup, MaxNodesPerAccount, or MaxNodesPerPartition option, or something like that. MaxNodes looks like it's actually MaxNodesPerJob, but it does handle fitting small jobs into the right number of whole nodes properly.
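For the record, a rough sketch of the reservation alternative mentioned above (before it was set aside because the owner partitions overlap). The reservation name, node list, and duration below are placeholders, not values taken from this cluster:
# scontrol create reservation ReservationName=sas_owned Accounts=sas Nodes=slepner[001-002] StartTime=now Duration=30-00:00:00 Flags=IGNORE_JOBS
Like a dedicated partition, this pins the group to specific nodes, but it also keeps general-access jobs off those nodes even when they sit idle, so it does not address the sharing problem this ticket is asking about.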