We have a small cluster with 4 compute nodes, each with 24 cores. The jobs run on this cluster are all single-core jobs, and most users do not adjust the walltime from the default of 3 hours. Jobs typically take 1-2 hours to complete. Lately a single user has been submitting hundreds of jobs, causing everyone else to wait days for their jobs to run. Obviously this has caused some users to be a bit disgruntled. Please provide a configuration that gives jobs from new users higher priority than pending jobs from users who already have jobs running.

Thanks,
Clay Fandre
There are a few options available to you. You can set a maximum amount of work a user can have running at any one time, or push down the priority of a user's pending jobs based on their recent usage/consumption. Would you please attach your current slurm.conf so that we can review what you have configured?

[1] https://slurm.schedmd.com/priority_multifactor.html#fairshare
[2] https://slurm.schedmd.com/sacctmgr.html#OPT_FairShare=
[3] https://slurm.schedmd.com/sacctmgr.html#SECTION_EXAMPLES
[4] https://slurm.schedmd.com/sacctmgr.html#OPT_GrpTRESRunMins
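As a rough sketch of what those two options can look like (the limit value and username below are placeholders, not a recommendation for this cluster):

```shell
# Option 1: limit the work a single user can have running at once.
# GrpTRESRunMins caps the sum of (allocated CPUs x remaining walltime minutes)
# across a user's running jobs; 2880 here is an arbitrary placeholder value.
sacctmgr modify user someuser set GrpTRESRunMins=cpu=2880

# Option 2: let recent usage push a heavy user's pending jobs down the queue
# via the multifactor priority plugin's fairshare factor (set in slurm.conf):
#   PriorityType=priority/multifactor
#   PriorityWeightFairshare=10000
```

Option 1 is a hard cap; option 2 keeps the cluster fully utilized while reordering who runs next.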
[root@asic-az97n204 slurm]# cat slurm.conf
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
# $Revision: 110 $
#
ClusterName=asic-az97-hpc
SlurmctldHost=asic-az97n204
#SlurmctldHost=
#
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=67043328
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=lua
#KillOnBadExit=0
LaunchParameters=use_interactive_step
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=10000
#MaxStepCount=40000
#MaxTasksPerNode=512
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/cgroup
#Prolog=
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
SrunPortRange=60001-63000
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/affinity
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
#
#
# JOB PRIORITY
#PriorityFlags=
#PriorityType=priority/basic
PriorityType=priority/multifactor
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
PriorityWeightFairshare=10000
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
AccountingStorageHost=asic-az97n204
#AccountingStoragePass=
AccountingStoragePort=6819
AccountingStorageType=accounting_storage/slurmdbd
#AccountingStorageUser=
#AccountingStoreFlags=
#JobCompHost=
JobCompLoc=/var/log/slurm/slurm.jobcomp.log
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/filetxt
#JobCompUser=
#JobContainerType=job_container/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm/slurmd.log
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#DebugFlags=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=blazecomp[1-4] CPUs=24 RealMemory=257151 Sockets=1 CoresPerSocket=24 ThreadsPerCore=1 Weight=1 State=UNKNOWN
NodeName=blazeuser[1-4] CPUs=24 RealMemory=257151 Sockets=1 CoresPerSocket=24 ThreadsPerCore=1 Weight=3 State=UNKNOWN
PartitionName=asic Nodes=blazecomp[1-4] Default=YES MaxTime=1440 DefaultTime=180 State=UP
Hi Clay,

It looks like you have multifactor priority enabled, with Fairshare as the only weight:

PriorityType=priority/multifactor
PriorityWeightFairshare=10000

This should be enough to get you started using Fairshare to prevent one user from dominating the queue. Did you recently enable these settings? When you have several jobs queued, can you run sprio to see what it shows for the priority of the queued jobs?

Thanks,
Ben
Yes, I did just add those two options and haven't tested them yet as I wasn't sure that's the best way to do it. Unfortunately the queue seems to be empty now. I will do some testing when I can to simulate the jobs to see if it solves the problem. Clay
So some jobs were submitted, but sprio doesn't seem to be working.

[root@asic-az97n204 ~]# squeue | head
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            573801      asic test_tc_  e503866 PD       0:00      1 (Resources)
            573802      asic test_tt_  e503866 PD       0:00      1 (Priority)
            573803      asic test_tt_  e503866 PD       0:00      1 (Priority)
            573804      asic test_tt_  e503866 PD       0:00      1 (Priority)
            573805      asic test_wdt  e503866 PD       0:00      1 (Priority)
            573806      asic test_wdt  e503866 PD       0:00      1 (Priority)
            573807      asic test_wdt  e503866 PD       0:00      1 (Priority)
            573808      asic test_wdt  e503866 PD       0:00      1 (Priority)
            573809      asic test_wdt  e503866 PD       0:00      1 (Priority)
[root@asic-az97n204 ~]# sprio
          JOBID PARTITION   PRIORITY       SITE  FAIRSHARE
[root@asic-az97n204 ~]#
[root@asic-az97n204 ~]# /app/slurm/bin/showuserjobs
Batch job status for cluster asic-az97-hpc at Fri Jun 30 16:18:26 MST 2023

Node states summary:
allocated   4 nodes (100.00%)   96 CPUs (100.00%)
Total       4 nodes (100.00%)   96 CPUs (100.00%)

Job summary: 1391 jobs total (max=10000) in all partitions.

Username/            Running        Limit  Pending        Totals
Account              Jobs    CPUs   CPUs   Jobs    CPUs   Further info
===========  ======= ====== ====== ====== ====== ======   =============================
ACCT_TOTAL    (null)     96     96    Inf   1295   1295   Running+Pending=1391 CPUs, 3 users
GRAND_TOTAL      ALL     96     96    Inf   1295   1295   Running+Pending=1391 CPUs, 3 users
e503866       (null)     96     96    Inf     18     18
h523709       (null)      0      0    Inf   1252   1252
h359520       (null)      0      0    Inf     25     25
Hi Clay,

That is strange that sprio doesn't show anything for the queued jobs with the multifactor priority plugin enabled. Can I have you verify that it is recognized correctly by running:

scontrol show config | grep -i priority

If you have jobs queued right now I'd also like to see the show job output for one of them. Could you run the following command with the appropriate job id in place of <jobid>:

scontrol show job <jobid>

Thanks,
Ben
[root@asic-az97n204 ~]# scontrol show config | grep -i priority
PriorityParameters           = (null)
PrioritySiteFactorParameters = (null)
PrioritySiteFactorPlugin     = (null)
PriorityDecayHalfLife        = 7-00:00:00
PriorityCalcPeriod           = 00:05:00
PriorityFavorSmall           = No
PriorityFlags                =
PriorityMaxAge               = 7-00:00:00
PriorityUsageResetPeriod     = NONE
PriorityType                 = priority/multifactor
PriorityWeightAge            = 0
PriorityWeightAssoc          = 0
PriorityWeightFairShare      = 10000
PriorityWeightJobSize        = 0
PriorityWeightPartition      = 0
PriorityWeightQOS            = 0
PriorityWeightTRES           = (null)
[root@asic-az97n204 ~]# sprio
          JOBID PARTITION   PRIORITY       SITE  FAIRSHARE
[root@asic-az97n204 ~]# squeue | wc
   5084   40672  401643
Thanks for verifying that the multifactor plugin is recognized correctly. Since that's the case, can I have you send a copy of your slurm.conf along with any other conf files in the same directory? I would also still like to see the job details for a job that's pending:

scontrol show job <jobid>

Thanks,
Ben
Created attachment 31053 [details] Slurm conf files Slurm conf files
Thanks for sending all your config files for me to review. I apologize that I forgot to have you remove the database password in your slurmdbd.conf file. I've marked the attachment as private now so that only SchedMD employees can view the file, but you will probably want to update your password.

It's still not clear what might be preventing sprio from showing priority information about the queued jobs. I would like to have you enable debug logs that show information about priority calculations long enough to submit a test job. You can do this without restarting the cluster like this:

scontrol setdebugflags +priority

Once that is enabled you can submit a test job and then turn the debug flag back off like this:

scontrol setdebugflags -priority

Then if you would send the slurmctld log file, I'll take a look at what's happening with the priority calculation.

Thanks,
Ben
So there were no jobs running this weekend, so I stopped and restarted slurmctld and the slurmd's. sprio is now working.

[root@asic-az97n204 slurm]# sprio
          JOBID PARTITION   PRIORITY       SITE  FAIRSHARE
         655117      asic          1          0          0
         655118      asic          1          0          0
         655119      asic          1          0          0
         655120      asic          1          0          0
         655121      asic          1          0          0
         655122      asic          1          0          0
         655123      asic          1          0          0
         655124      asic          1          0          0
I'm glad that sprio is working after a restart. It doesn't show any fairshare priority for the jobs, though. It's possible that these are all from a high-utilization user/account, but I'd like to make sure. Can you also send the output from:

squeue
sshare -a

Thanks,
Ben
[root@asic-az97n204 slurm]# squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            655401      asic tp_max_p  h523709  R      10:06      1 blazecomp2
            655395      asic tp_max_p  h523709  R      10:15      1 blazecomp2
            652252      asic tp_ssp_c  h508001  R    1:31:22      1 blazecomp4
            652256      asic tp_ssp_c  h508001  R    1:31:22      1 blazecomp4
            652258      asic tp_ssp_c  h508001  R    1:31:22      1 blazecomp2
[root@asic-az97n204 slurm]# sshare -a
Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
root                                          0.000000           0                 1.000000
 root                      root          1    1.000000           0      0.000000   1.000000
[root@asic-az97n204 slurm]#
Thanks for sending that output. It looks like some of the initial configuration that needs to happen for Fairshare to work hasn't been done yet. My apologies that I didn't check for that earlier on.

When you first configure a cluster you can add accounts and users to create a hierarchy that matches your internal organization. This allows you to track usage by different departments as well as for individual users. Once you create accounts you can create user associations, which are the combination of the cluster, account, username, and optionally the partition they're allowed to use with that account. Here's an example of how that might look:

$ sacctmgr show assoc tree format=cluster,account,user,partition
   Cluster              Account       User  Partition
---------- -------------------- ---------- ----------
    knight root
    knight  root                      root
    knight  a1
    knight   sub1
    knight    sub1                     ben
    knight    sub1                   user1
    knight    sub1                   user2
    knight   sub2
    knight    sub2                     ben
    knight    sub2                   user2
    knight   sub3
    knight    sub3                     ben
    knight    sub3                   user3
    knight  a2
    knight   sub4
    knight    sub4                     ben
    knight    sub4                   user1
    knight    sub4                   user4
    knight   sub5
    knight    sub5                   user5
    knight   sub6
    knight    sub6                     ben
    knight    sub6                   user1
    knight    sub6                   user2
    knight    sub6                   user3

You can see that there are 2 primary accounts: a1 and a2. Beneath the a1 account I have sub1, sub2, and sub3 accounts, each with different users. The a2 account similarly has different sub-accounts that are children of that account. I don't have a partition associated with any of these user associations, but that is an option as I mentioned. I'll show a few examples of how creating these accounts and user associations might look.
To create the a1 account I would run this:

sacctmgr add account a1

To create the sub1 account as a child of the a1 account I would run this:

sacctmgr add account sub1 parent=a1

To create my user in the sub1 account I would run this:

sacctmgr add user ben account=sub1

You can find more information on creating accounts and users, as well as setting limits on those entities, in the sacctmgr documentation:
https://slurm.schedmd.com/sacctmgr.html

These accounts and user associations have to exist in order for fairshare to track the usage of the different users and adjust the fairshare priority values accordingly. There is also an option to require that a user has been created with sacctmgr before they are able to submit jobs. Right now any user can submit a job to the system because it is not enforcing any kind of account hierarchy. Once you have created the hierarchy you want, you can enable the AccountingStorageEnforce option in your slurm.conf to turn that on:
https://slurm.schedmd.com/slurm.conf.html#OPT_AccountingStorageEnforce

Let me know if you have any questions about any of this configuration.

Thanks,
Ben
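For this particular cluster, the steps above might be sketched as a short script (run on the controller as a Slurm administrator; the "asic" account name is an illustrative assumption, not something from your configuration, and the usernames are the ones visible earlier in the ticket):

```shell
# Sketch only: account name "asic" is a placeholder assumption.
# -i answers "yes" to sacctmgr's confirmation prompts.

# Create one parent account for the group.
sacctmgr -i add account asic Description="ASIC design group"

# Add each user to that account. fairshare=1 gives everyone equal shares,
# so a user's recent usage lowers their fairshare priority relative to others.
for u in e503866 h523709 h359520 h508001; do
    sacctmgr -i add user "$u" account=asic fairshare=1
done

# Verify the hierarchy and usage tracking.
sacctmgr show assoc tree format=cluster,account,user,fairshare
sshare -a
```

After this, sshare -a should list one association per user under the asic account, and sprio should begin showing nonzero FAIRSHARE values once usage diverges between users.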
Ahhhh, ok. That makes sense. I went ahead and created the accounting data for all of the users. The queue is currently empty but once they submit jobs I will verify things are working.
Hi Clay, Were you able to verify that things work as expected after creating users with sacctmgr? Let me know if you still need help with this ticket or if it's ok to close. Thanks, Ben
I believe things are working as expected. Thanks for checking back, and please feel free to close out this ticket.

Clay
I'm glad to hear things are working. Closing now.