| Summary: | Understanding Priority for Pending Jobs | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Will French <will> |
| Component: | Scheduling | Assignee: | Moe Jette <jette> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 2 - High Impact | ||
| Priority: | --- | CC: | brian, da, simran |
| Version: | 14.11.3 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Vanderbilt | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | squeue output; slurm.conf; output from sprio; output from squeue --start; squeue --start; 2nd output from sprio; sdiag output; associations; qos; Users; sdiag output | | |
I'm escalating this ticket to high impact because we have several users whose jobs are not being scheduled as needed. From doing some digging, it appears that the problem might be arising from one of our users (glow), who has 172 jobs pending due to AssocGrpCPURunMinsLimit. All of these pending jobs have a higher priority than the other jobs in the queue and appear to be blocking other jobs from beginning... at least that's our best guess. We currently have 1092 jobs pending due to "Priority" but 143 completely idle nodes. Help (even a quick temporary fix!) would be greatly appreciated! I've tried bumping the account's fairshare up to arbitrarily high values, but this only appears to help new jobs, not jobs that are already queued.
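For context on the AssocGrpCPURunMinsLimit reason above: a GrpCPURunMins-style limit caps the total of allocated CPUs multiplied by remaining walltime across an association's running jobs, so high-priority jobs can sit pending while the group is at its cap. A rough sketch of that bookkeeping (illustrative numbers and function names, not Slurm's actual code):

```python
# Hypothetical sketch of how a GrpCPURunMins-style limit is evaluated.
# Numbers and the job list are made up for illustration; Slurm's real
# accounting lives in slurmctld/slurmdbd.

def cpu_run_mins(running_jobs):
    """Sum of allocated CPUs x remaining walltime (minutes) over running jobs."""
    return sum(cpus * remaining_min for cpus, remaining_min in running_jobs)

def would_exceed(limit, running_jobs, new_cpus, new_walltime_min):
    """A new job stays pending (AssocGrpCPURunMinsLimit) while starting it
    would push the association over its limit."""
    return cpu_run_mins(running_jobs) + new_cpus * new_walltime_min > limit

running = [(8, 120), (4, 60)]               # 8*120 + 4*60 = 1200 cpu-minutes in flight
print(cpu_run_mins(running))                # 1200
print(would_exceed(2000, running, 16, 60))  # 1200 + 16*60 = 2160 > 2000 -> True
```

Raising fairshare does not help here, because the blocked jobs are held by the group cap, not by their priority.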
Hello,
we analyzed the data you sent us and we have a couple of initial suggestions
that could improve the throughput:
1) Configure SchedulerParameters for backfill. By default, backfill only looks
at 100 pending jobs, so we can increase this to 500:
SchedulerParameters=bf_max_job_test=500
Then set bf_max_job_user so that backfill only checks a limited number of jobs
for each user instead of all of them:
SchedulerParameters=bf_max_job_test=500,bf_max_job_user=10
For other parameters see:
http://slurm.schedmd.com/slurm.conf.html
2) Your PriorityWeightFairShare gives fairshare the same weight as the other
factors, so fairshare is not playing a major role in determining job
priorities. We suggest you increase the value from 1000 to 10000 so fairshare
becomes a dominant factor in the priority calculation.
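To see why this matters: the multifactor plugin computes job priority as a weighted sum of normalized factors, each between 0.0 and 1.0. A small sketch (the factor values are made up for illustration) shows how raising the fairshare weight lets it dominate the sum:

```python
# Sketch of Slurm's multifactor priority calculation (priority/multifactor).
# Each factor is normalized to [0.0, 1.0]; the weights come from slurm.conf.
# Factor values below are illustrative, not taken from a real cluster.

def job_priority(weights, factors):
    return int(sum(weights[name] * factors.get(name, 0.0) for name in weights))

equal = {"age": 1000, "fairshare": 1000, "jobsize": 1000, "partition": 1000, "qos": 1000}
boosted = dict(equal, fairshare=10000)

# A job from an under-served account: high fairshare factor, low age.
job = {"age": 0.1, "fairshare": 0.9, "jobsize": 0.2, "partition": 1.0, "qos": 0.0}

print(job_priority(equal, job))    # 2200  -- fairshare is drowned out by other factors
print(job_priority(boosted, job))  # 10300 -- fairshare dominates
```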
You can also see the output of squeue with the %S format which will give you the expected start time of the pending jobs.
david@prometeo ~/slurm/work $ squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R %S"
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) START_TIME
6778_[5-20] markab sleepme david PD 0:00 1 (Resources) 2016-01-21T10:22:29
6778_1 markab sleepme david R 1:42 1 prometeo 2015-01-21T10:22:29
A recommended tutorial about how to tune scheduling can be found here:
http://slurm.schedmd.com/SUG14/sched_tutorial.pdf
David
Hi David,

Thanks for the response. We weren't aware of all these backfill parameters, so we have started tweaking them here and there. Your suggestions seemed to help, but we are still seeing some oddities that make me think there is an additional parameter that needs to be adjusted. Reading through the SchedulerParameters section, nothing jumps out. We have set bf_max_job_test=1000 and bf_max_job_user=50.

Currently we have 815 jobs in the pending state:

[frenchwr@vmps65 slurm]$ squeue --states=pending | wc -l
815

Many of these are pending due to group limits, but several are also pending due to low priority. For instance, I submitted a short Hello World batch script about half an hour ago that has still not run, even though there are 111 nodes that are completely idle:

[frenchwr@vmps65 slurm]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
production* up 14-00:00:0 185 mix vmp[101,105,108,112-114,117-118,120,301,303,305-308,310-313,318-319,321-322,325-326,328-330,332-346,349,351-377,379-380,502-505,508,510-511,513,515-517,519,523,527,529,533,610-648,652-653,659-662,664-690,1001-1003,1041-1044,1047-1053,1082-1083,1086,1088-1093,1095]
production* up 14-00:00:0 35 alloc vmp[102-103,106-107,109-110,115-116,119,302,304,309,314-317,320,323-324,327,331,347-348,350,378,506-507,509,609,1046,1054,1081,1084,1087,1094]
production* up 14-00:00:0 111 idle vmp[512,514,518,520-522,524-526,528,530-532,534-548,552-574,602-608,1007-1039,1045,1055-1059,1061-1073,1085]
gpu up 14-00:00:0 3 down* vmp[810,814,839]
gpu up 14-00:00:0 8 alloc vmp[801-803,805-809]
gpu up 14-00:00:0 27 idle vmp[811-813,815-819,821-838,840]

In fact, it appears that SLURM has not even attempted to backfill schedule my job.
I gather this from trying to get the projected start time of the job, which (from http://slurm.schedmd.com/slurm.conf.html) is supposed to have a value assigned if SLURM has attempted to schedule it:

[frenchwr@vmps65 slurm]$ squeue --start | head -n 1
JOBID PARTITION NAME USER ST START_TIME NODES SCHEDNODES NODELIST(REASON)
[frenchwr@vmps65 slurm]$ squeue --start | grep frenchwr
19253 productio testjob5 frenchwr PD N/A 1 (null) (Priority)

It's unclear why SLURM has not attempted to schedule this job if there are a total of 814 pending jobs and bf_max_job_test=1000. Additionally, with bf_max_job_user=50, shouldn't this ensure that SLURM attempts to backfill schedule up to 50 of my pending jobs? Are we missing something?

Can you please send us your slurm.conf and the output of sprio and squeue --start for all jobs? Do the jobs use the runtime limit?

David

Created attachment 1568 [details]
slurm.conf
Created attachment 1569 [details]
output from sprio
Created attachment 1570 [details]
output from squeue --start
Not sure what you mean when you say runtime limit. Do you mean the default walltime? If so, we have a 15 minute default set for both our partitions.

Hi, yes, I meant walltime; we do indeed see 15 as the default and 20160 as the max. After the reconfiguration, did you restart slurmctld and the slurmds? The sprio command still shows the fairshare priority as 0, and most jobs don't have a predicted start time. Also, please add bf_continue to the SchedulerParameters:

SchedulerParameters=bf_max_job_test=1000,bf_max_job_user=50,bf_continue

so the backfill scheduler will continue from where it stopped last time instead of restarting from the top of the queue.

David

Yes, we did a:

service slurmctl restart

and

scontrol reconfigure

but I did it again just to be sure. I waited for several minutes and saw no changes to the queue. However, adding the bf_continue parameter appears to be helping. More jobs are being scheduled, but the rate at which previously "Priority" classified jobs are getting a projected start time and then starting up is slow. When I submit a very small test script (15 minutes walltime), the job takes a while (more than 10 minutes) to get scheduled and run. Is this expected? It seems like the job is not being considered for backfill scheduling for a prolonged period of time.

Is the number of jobs with N/A decreasing now? Can we please see again the output of squeue --start, sprio, and this time also sdiag.

David

Created attachment 1571 [details]
squeue --start
Created attachment 1572 [details]
2nd output from sprio
Created attachment 1573 [details]
sdiag output
Yes, the number of lines with N/A decreased gradually. More jobs are getting submitted, so it's a little hard to gauge the effect exactly. Note that I also decreased bf_interval to 5 and bf_max_job_user to 10.

Thanks for the data. We are having our scheduling developer analyze the data. Meanwhile, I would like to reproduce the AssocGrpMemoryLimit pending jobs in your cluster to see if it affects things in any way. Could you please send us the output of:

'sacctmgr show assoc'
'sacctmgr show qos'
'sacctmgr show users'

Thanks,
David

Created attachment 1574 [details]
associations
Created attachment 1575 [details]
qos
Created attachment 1576 [details]
Users
Thanks, David. We're not as concerned about the jobs being blocked due to AssocGrpMemoryLimit. We have that configured in the SLURM db, as you'll see. Our concern is with low priority jobs being scheduled slowly when lots of nodes are idle.

Yes, I understand. I am just trying to see if that may be somehow related. David

All of these jobs hitting association group limits count against the 1000 jobs that the backfill scheduler looks at ("bf_max_job_test=1000"). Since you have over 1000 pending jobs, the scheduler is definitely not looking at them all.
Unless there is a lot of churn in your jobs, running the backfill scheduler really frequently ("bf_interval=5") is just going to waste time.
This is what we have at Harvard U, which works well for their workload:
SchedulerParameters=bf_interval=600,bf_continue,bf_resolution=300,max_job_bf=5000,bf_max_job_part=5000,bf_max_job_user=100
This is what I would recommend for you, a slight variation of the above:
SchedulerParameters=bf_interval=60,bf_continue,bf_resolution=300,max_job_bf=5000,bf_max_job_user=100
Notes:
bf_interval=60 Running the backfill scheduler once a minute is probably sufficient in most cases
bf_continue Needed for any system with large job counts
bf_resolution=300 Decreases backfill scheduler overhead
max_job_bf=5000 You want to test most if not all jobs
bf_max_job_user=100 Don't spend too much time on any single user
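As a toy illustration of the bf_resolution note above (this bucketing is a simplification of the backfill scheduler's actual bookkeeping): the scheduler only has to reason about distinct job start/end boundaries, and rounding times into coarser buckets collapses many of them.

```python
# Sketch of why a coarser bf_resolution reduces backfill overhead.
# Timestamps are illustrative (seconds from some reference point).

def distinct_boundaries(times_sec, resolution_sec):
    # Round each time up to the next resolution boundary, then deduplicate.
    return len({-(-t // resolution_sec) * resolution_sec for t in times_sec})

job_end_times = [70, 95, 130, 160, 290, 305, 540, 560, 590]
print(distinct_boundaries(job_end_times, 60))   # 6 -- finer buckets, more boundaries
print(distinct_boundaries(job_end_times, 300))  # 2 -- bf_resolution=300, far fewer
```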
I have other recommendations for your configuration, but let's get jobs running first.
Thanks for the reply, Joe. Things improved considerably starting last night and through this morning. Of our now ~350 non-GPU nodes, none are currently completely idle. I will make the changes you suggested and continue monitoring queued jobs to see how they respond.

(In reply to Will French from comment #23)
> Thanks for the reply, Joe.
>
> Things improved considerably starting last night and through this morning.
> Of our now ~350 non-GPU nodes, none are currently completely idle. I will
> make the changes you suggested and continue monitoring queued jobs to see
> how they respond.

Actually, it's "Moe". If you are going to be making configuration changes, here are some other suggestions:

DebugFlags=NO_CONF_HASH
This disables testing that your configuration files are consistent across the cluster. This may be fine while you are in the process of tuning scheduling, but in general it is a bad idea. If your configurations get out of sync across the cluster, very difficult to diagnose communication problems could occur, so I would recommend removing it.

JobCompType=jobcomp/mysql
This is storing redundant accounting information already being stored in the slurmdbd (AccountingStorageType=accounting_storage/slurmdbd) and should be removed.

MaxJobCount=10000
You might want to bump this up.

PriorityWeight*
Consider how you want to prioritize the workload. Weighting all of the factors the same is probably not really what you want.

SelectTypeParameters=CR_CORE_MEMORY
Are you setting default memory limits in the partition configuration?

SlurmctldDebug=debug3
SlurmdDebug=debug2
These are really verbose, to the point of likely impacting performance.

Moe, not Joe. Sorry about that! I went ahead and implemented the first set of changes; I'll wait several hours to see what sort of effect these changes have, then implement the next set of recommended changes. One clarification: I'm assuming when you wrote "max_job_bf=5000" you meant "bf_max_job_test=5000"?
(In reply to Will French from comment #25)
> Moe, not Joe. Sorry about that!
>
> I went ahead and implemented the first set of changes; I'll wait several
> hours to see what sort of effect these changes have, then implement the next
> set of recommended changes. One clarification: I'm assuming when you wrote
> "max_job_bf=5000" you meant "bf_max_job_test=5000"?

Sorry, "max_job_bf=5000" is the old form of "bf_max_job_test=5000", but both work. I added a bunch of backfill scheduling parameters and wanted them all to start with "bf_" for better clarity.

A few hours later and scheduling is much better than yesterday. Thank you.

> SelectTypeParameters=CR_CORE_MEMORY
> Are you setting default memory limits in the partition configuration?

Do you mean the DefMemPerCPU option in slurm.conf? If so, no, we have not configured that option yet. Or do you mean listing the RealMemory option on the NodeName lines? If so, yes, we do that (usually by just logging into the node and running free -g... we then let SLURM drain any nodes that have missing memory, and we then reduce the RealMemory value down accordingly).

> SlurmctldDebug=debug3
> SlurmdDebug=debug2
> These are really verbose, to the point of likely impacting performance.

I set both of these to 3.

SlurmctldDebug=3
SlurmdDebug=3

(In reply to Will French from comment #27)
> > SelectTypeParameters=CR_CORE_MEMORY
> > Are you setting default memory limits in the partition configuration?
>
> Do you mean the DefMemPerCPU option in slurm.conf? If so, no, we have not
> configured that option yet. Or do you mean listing the RealMemory option on
> the NodeName lines? If so, yes, we do that (usually by just logging into the
> node and running free -g... we then let SLURM drain any nodes that have
> missing memory, and we then reduce the RealMemory value down accordingly).

The issue is that Slurm is configured to allocate memory to jobs, but without defining DefMemPerCPU/Node and MaxMemPerCPU/Node on a system-wide or per-partition/queue basis, Slurm has no information as to how much memory each job should be allocated. You might also use a job_submit plugin for this purpose, as was discussed earlier today on the slurm-dev mailing list, but that would be a more complex approach that I would not recommend at this point.

Okay, for our CPU-only partition I set:

DefMemPerCPU=1000
DefMemPerNode=2000
MaxMemPerCPU=15375
MaxMemPerNode=123000

The GPU partition is similar, only with different Max values based on the amount of RAM available on these nodes:

MaxMemPerCPU=5625
MaxMemPerNode=45000

(In reply to Will French from comment #29)
> Okay, for our CPU-only partition I set:
>
> DefMemPerCPU=1000
> DefMemPerNode=2000
> MaxMemPerCPU=15375
> MaxMemPerNode=123000
>
> The GPU partition is similar, only with different Max values based on the
> amount of RAM available on these nodes:
>
> MaxMemPerCPU=5625
> MaxMemPerNode=45000

See "man slurm.conf": DefMemPerCPU and DefMemPerNode are mutually exclusive, and MaxMemPerCPU and MaxMemPerNode are mutually exclusive. Pick one or the other.

How are things running now? Can I downgrade the severity of this bug or close it?

Yes, you can close this ticket. We're happy with scheduling at this point and have a much better understanding of which knobs to turn for tuning. Thank you for all the assistance.

Will

Closed per client request.

I'm re-opening this ticket because we're experiencing some issues that appear to be related to our backfill parameters.
Our users have reported seeing messages like the following when attempting to run sbatch, squeue, or other SLURM commands:

sbatch: error: slurm_receive_msg: Socket timed out on send/recv operation
sbatch: error: Batch job submission failed: Socket timed out on send/recv operation

We gather that this is related to the SLURM controller being busy due to computationally demanding tasks within the backfill algorithm. We don't consider this to be a big problem for users submitting from the command line, but many of our users have automated pipelines that are failing because of this error. At the moment we are using:

SchedulerParameters=bf_interval=60,bf_continue,bf_resolution=300,bf_max_job_test=5000,bf_max_job_user=300

Our general understanding is that we can improve responsiveness by increasing bf_interval and decreasing the other parameters. Is responsiveness most sensitive to one of these parameters more so than the others? Would you recommend we try the "defer" option?

Relatedly, could you explain what the process is for a new job that is submitted to the scheduler? I had assumed that SLURM only attempted to schedule jobs every bf_interval (60 in our case) seconds. So if the controller attempted to schedule jobs at 2 AM (2:00:00), and a job is submitted at 2:00:01, SLURM would not attempt to schedule that job until 2:01:00. The "defer" option leads me to believe that my understanding of how the backfill algorithm works is wrong.

One last thing: bf_window should be in minutes, correct? That's what I see in old documentation (https://computing.llnl.gov/linux/slurm/slurm.conf.html), but the docs on the SchedMD site do not specify units (http://slurm.schedmd.com/sched_config.html). Could increasing bf_window improve responsiveness? I understand that it will increase the computational load when initially attempting to schedule a job, but I'm wondering if, once a job is scheduled, SLURM will stop trying to schedule that job each backfill iteration?
(In reply to Will French from comment #34)
> Our general understanding is that we can improve responsiveness by
> increasing bf_interval and decreasing the other parameters. Is
> responsiveness most sensitive to one of these parameters more so than the
> others?

There are a lot of variables involved. Could you send a current output of the "sdiag" command so we can best advise how to proceed?

> Would you recommend we try the "defer" option?

Probably not, unless you are submitting 100+ jobs/second.

> Relatedly, could you explain what the process is for a new job that is
> submitted to the scheduler? I had assumed that SLURM only attempted to
> schedule jobs every bf_interval (60 in our case) seconds. So if the
> controller attempted to schedule jobs at 2 AM (2:00:00), and a job is
> submitted at 2:00:01, SLURM would not attempt to schedule that job until
> 2:01:00. The "defer" option leads me to believe that my understanding of
> how the backfill algorithm works is wrong.

I'll reference the tutorial here: http://slurm.schedmd.com/SUG14/sched_tutorial.pdf

The scheduler will run immediately at job submit time (see pages 8 and 9). If resources are available, the scheduler goes far enough down the list of jobs to reach the job, and it is the highest priority pending job in its queue, it will start immediately. Once each minute, all of the jobs get checked (see page 10), but only the highest priority jobs in each queue can be started. Once each "bf_interval" (the timer starts after the previous backfill run completes), the backfill scheduler will go however deep in the queues to determine when and where each pending job will start (see pages 14-25). Depending upon your configuration and workload, the backfill scheduler might take several minutes to complete a cycle.

It might be helpful for you to contact Jacob Jenson (jacob@schedmd.com) to be notified when we have our next training. We cover this sort of thing in great detail in a 2-day training session.

> One last thing: bf_window should be in minutes, correct? That's what I see
> in old documentation (https://computing.llnl.gov/linux/slurm/slurm.conf.html),
> but the docs on the SchedMD site do not specify units
> (http://slurm.schedmd.com/sched_config.html).

That will be fixed soon.

> Could increasing bf_window improve responsiveness?

That may make the sluggish responses less frequent rather than eliminating them.

> I understand that it will increase the computational load when initially
> attempting to schedule a job, but I'm wondering if, once a job is scheduled,
> SLURM will stop trying to schedule that job each backfill iteration?

Every pending job (per your configuration) gets checked on each backfill cycle.

Hi Moe,

Thanks for the quick reply and detailed explanation. It's very helpful. About the training -- is that on-site or is there an option to remote in?

I'm attaching the output from sdiag.
Note we made a few changes to our backfill parameters since yesterday:

SchedulerParameters=bf_interval=120,bf_continue,bf_resolution=300,bf_max_job_test=1000,bf_max_job_user=50

One other question: how does the backfill algorithm handle job arrays? Really what I want to know is whether job arrays are scheduled more efficiently (improving responsiveness) than the equivalent of submitting n jobs by invoking sbatch n times. We have a lot of users who do the latter.

Best,

Will

Created attachment 1612 [details]
sdiag output
(In reply to Will French from comment #36)
> Hi Moe,
>
> Thanks for the quick reply and detailed explanation. It's very helpful.
> About the training -- is that on-site or is there an option to remote in?

Both are options.

> I'm attaching the output from sdiag.
>
> Note we made a few changes to our backfill parameters since yesterday:
>
> SchedulerParameters=bf_interval=120,bf_continue,bf_resolution=300,
> bf_max_job_test=1000,bf_max_job_user=50
>
> One other question: how does the backfill algorithm handle job arrays?
> Really what I want to know is whether job arrays are scheduled more
> efficiently (improving responsiveness) than the equivalent of submitting n
> jobs by invoking sbatch n times. We have a lot of users who do the latter.

Job arrays are MUCH more efficient with respect to scheduling and general system overhead. For most logic, the entire array is treated as a single job record.

(In reply to Will French from comment #37)
> Created attachment 1612 [details]
> sdiag output

I'm going to suggest an addition to your SchedulerParameters option that should greatly reduce worst-case delays, but probably not make much difference to average delays:

max_sched_time=2

I see your current worst case consuming over 5 seconds (half of MessageTimeout):

> Main schedule statistics (microseconds):
> Max cycle: 5363032
> Mean cycle: 259843

As I understand it, that would give you:

SchedulerParameters=bf_interval=120,bf_continue,bf_resolution=300,bf_max_job_test=1000,bf_max_job_user=50,max_sched_time=2

There is also one other thing in the sdiag output that I'll need to have someone investigate:

ACCOUNTING_UPDATE_MSG (10001) count:9 ave_time:8451672 total_time:76065050

One more thing about your configuration:

SlurmctldDebug = debug3
SlurmdDebug = debug2

These are so detailed that they will definitely adversely impact performance, especially when it comes to accounting and scheduling logic. You probably want to normally run with them both set to "info" or "verbose".

We actually have:

SlurmctldDebug=3
SlurmdDebug=3

which appears to correspond to "info":

root@vmps11:~# scontrol show config | grep -i debug
DebugFlags = (null)
SlurmctldDebug = info
SlurmdDebug = info

You may have been looking at the older version of the slurm.conf attached to this thread. I went ahead and added max_sched_time=2 per your suggestion. Thanks for looking into this.
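Following up on the job-array point above: a single array submission creates one job record that the scheduler handles as a unit, instead of n records from n sbatch invocations. A sketch of what such a submission script might look like (the payload command and file names are hypothetical):

```bash
#!/bin/bash
#SBATCH --array=1-172              # one sbatch call, one job record, 172 tasks
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:15:00
#SBATCH --output=array_%A_%a.out   # %A = array job ID, %a = array task index

# Each task sees its own index via SLURM_ARRAY_TASK_ID
srun ./process_input input.${SLURM_ARRAY_TASK_ID}
```

Submitting this once with `sbatch array_job.sh` replaces a loop of 172 separate `sbatch` calls.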
Created attachment 1563 [details]
squeue output

Hello,

As we are moving more and more of our users over to our SLURM-managed cluster, we are finally pushing SLURM hard enough to get a better sense of how it handles job scheduling. So far, one thing that is unclear to us is the "Priority" reason listed for some pending jobs. Attached is the result of running squeue on our cluster this morning. As you will see, there are a number of jobs (242) that are pending due to "Priority". However, if you look at a summary of the nodes, there are a large number that are completely idle (forgive the alias):

[frenchwr@vmps11 ~]$ sinfofeatures
NODELIST FEATURES AVAIL NODES(A/I)
vmp[101-103,105-110,112-120] amd up 18/0
vmp[301-380] intel,sandy_bridge up 80/0
vmp[502-548,552-574,602-648,652-653,659-662,664-690,1041-1054,1081-109 intel up 92/87
vmp[801-803,805-809,813,815,817-819,821-830,835,838,840] cuda42 up 8/18
vmp[831-834]

As you can see, there are 105 nodes that are completely idle. We understand that it might be possible for jobs to be blocked due to Priority if they are asking for an extremely long wall time and other higher priority jobs are waiting for resources to become available. But even if I submit an extremely small test script requesting 15 minutes of walltime, the job will be pending with "Priority" listed as the reason. The job will eventually run, but only after sitting in the queue for 20-25 minutes (all while those 105 nodes sit idle). Here's the SLURM batch script for this test:

--------------------------------
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=400mb
#SBATCH --time=00:15:00
#SBATCH --output=testjob5.output

echo "Hello world!"
--------------------------------

Note that we do have backfill configured. Can you explain why this happens and what we can do to expedite scheduling of jobs?
Here is our configuration:

[frenchwr@vmps11 ~]$ scontrol show config
Configuration data as of 2015-01-21T08:56:56
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = associations,limits,safe
AccountingStorageHost = slurmdb
AccountingStorageLoc = N/A
AccountingStoragePort = 6819
AccountingStorageType = accounting_storage/slurmdbd
AccountingStorageUser = N/A
AccountingStoreJobComment = YES
AcctGatherEnergyType = acct_gather_energy/none
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInfinibandType = acct_gather_infiniband/none
AcctGatherNodeFreq = 0 sec
AcctGatherProfileType = acct_gather_profile/none
AllowSpecResourcesUsage = 0
AuthInfo = (null)
AuthType = auth/munge
BackupAddr = 10.0.0.50
BackupController = slurmsched2
BatchStartTimeout = 10 sec
BOOT_TIME = 2015-01-20T14:57:32
CacheGroups = 0
CheckpointType = checkpoint/none
ChosLoc = (null)
ClusterName = accre
CompleteWait = 0 sec
ControlAddr = 10.0.0.49
ControlMachine = slurmsched1
CoreSpecPlugin = core_spec/none
CpuFreqDef = OnDemand
CryptoType = crypto/munge
DebugFlags = NO_CONF_HASH
DefMemPerNode = UNLIMITED
DisableRootJobs = NO
DynAllocPort = 0
EnforcePartLimits = NO
Epilog = (null)
EpilogMsgTime = 2000 usec
EpilogSlurmctld = (null)
ExtSensorsType = ext_sensors/none
ExtSensorsFreq = 0 sec
FairShareDampeningFactor = 1
FastSchedule = 1
FirstJobId = 1
GetEnvTimeout = 2 sec
GresTypes = (null)
GroupUpdateForce = 0
GroupUpdateTime = 600 sec
HASH_VAL = Match
HealthCheckInterval = 0 sec
HealthCheckNodeState = ANY
HealthCheckProgram = (null)
InactiveLimit = 0 sec
JobAcctGatherFrequency = 30
JobAcctGatherType = jobacct_gather/linux
JobAcctGatherParams = (null)
JobCheckpointDir = /var/slurm/checkpoint
JobCompHost = slurmdb
JobCompLoc = slurm_jobcomp_db
JobCompPort = 0
JobCompType = jobcomp/mysql
JobCompUser = slurm
JobContainerType = job_container/none
JobCredentialPrivateKey = (null)
JobCredentialPublicCertificate = (null)
JobFileAppend = 0
JobRequeue = 1
JobSubmitPlugins = (null)
KeepAliveTime = SYSTEM_DEFAULT
KillOnBadExit = 0
KillWait = 30 sec
LaunchType = launch/slurm
Layouts =
Licenses = (null)
LicensesUsed = (null)
MailProg = /bin/mail
MaxArraySize = 1001
MaxJobCount = 10000
MaxJobId = 4294901760
MaxMemPerNode = UNLIMITED
MaxStepCount = 40000
MaxTasksPerNode = 128
MemLimitEnforce = yes
MessageTimeout = 10 sec
MinJobAge = 300 sec
MpiDefault = none
MpiParams = (null)
NEXT_JOB_ID = 17482
OverTimeLimit = 0 min
PluginDir = /usr/scheduler/slurm/lib/slurm
PlugStackConfig = /usr/scheduler/slurm-14.11.3/etc/plugstack.conf
PreemptMode = OFF
PreemptType = preempt/none
PriorityParameters = (null)
PriorityDecayHalfLife = 7-00:00:00
PriorityCalcPeriod = 00:05:00
PriorityFavorSmall = 0
PriorityFlags =
PriorityMaxAge = 14-00:00:00
PriorityUsageResetPeriod = NONE
PriorityType = priority/multifactor
PriorityWeightAge = 1000
PriorityWeightFairShare = 1000
PriorityWeightJobSize = 1000
PriorityWeightPartition = 1000
PriorityWeightQOS = 1000
PrivateData = none
ProctrackType = proctrack/cgroup
Prolog = (null)
PrologSlurmctld = (null)
PrologFlags = (null)
PropagatePrioProcess = 0
PropagateResourceLimits = ALL
PropagateResourceLimitsExcept = (null)
RebootProgram = (null)
ReconfigFlags = (null)
RequeueExit = (null)
RequeueExitHold = (null)
ResumeProgram = (null)
ResumeRate = 300 nodes/min
ResumeTimeout = 60 sec
ResvEpilog = (null)
ResvOverRun = 0 min
ResvProlog = (null)
ReturnToService = 1
RoutePlugin = (null)
SallocDefaultCommand = (null)
SchedulerParameters = (null)
SchedulerPort = 7321
SchedulerRootFilter = 1
SchedulerTimeSlice = 30 sec
SchedulerType = sched/backfill
SelectType = select/cons_res
SelectTypeParameters = CR_CORE_MEMORY
SlurmUser = slurm(59229)
SlurmctldDebug = debug3
SlurmctldLogFile = (null)
SlurmctldPort = 6817
SlurmctldTimeout = 120 sec
SlurmdDebug = debug2
SlurmdLogFile = (null)
SlurmdPidFile = /var/run/slurm/slurmd.pid
SlurmdPlugstack = (null)
SlurmdPort = 6818
SlurmdSpoolDir = /usr/spool/slurm
SlurmdTimeout = 300 sec
SlurmdUser = root(0)
SlurmSchedLogFile = (null)
SlurmSchedLogLevel = 0
SlurmctldPidFile = /var/run/slurm/slurmctld.pid
SlurmctldPlugstack = (null)
SLURM_CONF = /usr/scheduler/slurm-14.11.3/etc/slurm.conf
SLURM_VERSION = 14.11.3
SrunEpilog = (null)
SrunProlog = (null)
StateSaveLocation = /usr/scheduler/state
SuspendExcNodes = (null)
SuspendExcParts = (null)
SuspendProgram = (null)
SuspendRate = 60 nodes/min
SuspendTime = NONE
SuspendTimeout = 30 sec
SwitchType = switch/none
TaskEpilog = (null)
TaskPlugin = task/cgroup
TaskPluginParam = (null type)
TaskProlog = (null)
TmpFS = /tmp
TopologyPlugin = topology/none
TrackWCKey = 0
TreeWidth = 50
UsePam = 0
UnkillableStepProgram = (null)
UnkillableStepTimeout = 60 sec
VSizeFactor = 0 percent
WaitTime = 0 sec

Slurmctld(primary/backup) at slurmsched1/slurmsched2 are UP/UP