Created attachment 1653 [details]
scontrol show config

I just updated to Slurm 14.11.4 last Friday. I'm seeing some strangeness with the priority of several users' jobs. sprio shows something like this:

# sprio -l -j 11651503
   JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
11651503 stevetse          1          0          0          0          1          0      0

If I submit a new job, the priority in sprio looks OK, but after a few minutes the sprio output for the pending job looks the same as the output above.

# date; sprio -l -j 11671959
Thu Feb 19 11:55:18 CST 2015
   JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
11671959 wettstei     180401          0      80171        231     100000          0      0

# date; sprio -l -j 11671959
Thu Feb 19 11:58:44 CST 2015
   JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
11671959 wettstei          1          0          0          0          1          0      0

Here is the scontrol show jobid output. This is after sprio shows 1 for the priority. The priority is the same as the initial priority; it never updates. I don't know whether that is correct behavior or not:

# scontrol show jobid=11671959
JobId=11671959 JobName=_interactive
   UserId=wettstein(891783663) GroupId=wettstein(891783663)
   Priority=180401 Nice=0 Account=rcc-staff QOS=sandyb
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=02:00:00 TimeMin=N/A
   SubmitTime=2015-02-19T11:54:26 EligibleTime=2015-02-19T11:54:26
   StartTime=2015-02-19T14:06:50 EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=sandyb AllocNode:Sid=midway-login1:1470
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null) SchedNodeList=midway[036,069-071,083,086-087,091,098,109,195,197-199,220,222,225-226,260,306,321,323,326,329-330,334-336,351,369,380,382]
   NumNodes=32-32 NumCPUs=32 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=2000M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=/software/bin/_interactive
   WorkDir=/home/wettstein/redmine/pubsw/software/modulefiles/comsol
   StdErr=/dev/null
   StdIn=/dev/null
   StdOut=/dev/null

I've attached the output of scontrol show config.
Are you using PriorityFlags=FAIR_TREE? scontrol doesn't display FAIR_TREE if it is set -- fixed in 14.11.5. This looks similar to Bug 1454.
Will you attach your slurm.conf? Does it happen for all jobs?
Created attachment 1655 [details] slurm.conf
(In reply to Brian Christiansen from comment #2)
> Will you attach your slurm.conf?
>
> Does it happen for all jobs?

All jobs that we had pending that were older than a few minutes had priority=1 in the sprio output. I've attached my slurm.conf. I turned off FAIR_TREE for now.
Thanks. Let us know if turning off FAIR_TREE fixes the sprio output.
Yes, I am using FAIR_TREE, and it does look similar to that bug. I turned off FAIR_TREE and the priority of new jobs stayed normal. The existing jobs all looked like this in sprio:

   JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
11651503 stevetse      10000          0          0          0          0          0 -10000

I did an scontrol release on all of those jobs and their priorities became normal; scontrol show jobid now shows the updated priority correctly.
In Bug 1454, the problem went away after restarting. Could you try turning FAIR_TREE back on and see if the problem occurs again? If it does, will you do the following:

1. Get controller logs at level "info" or higher with the "Priority" debug flag set
2. Capture the output of:
   a. sshare -l
   b. sprio -l
   c. scontrol show jobs
   d. sacctmgr show assoc tree

Thanks,
Brian
Yes, it does seem to be intermittent. I restarted with FAIR_TREE on and priorities were normal. I enabled and disabled FAIR_TREE a few times and managed to make the priorities weird with both FAIR_TREE enabled and disabled. All pending jobs looked like this with FAIR_TREE disabled:

   JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
11591807  dingpan      10000          0          0          0          0          0 -10000

I did capture the output from the commands with the Priority debug level set; I'll attach that output.

When I did the FAIR_TREE switch, I also switched users from fairshare=parent to fairshare=1. I switched that back and forth a few times this morning as well, but as far as I could tell it didn't trigger anything.

I currently have FAIR_TREE turned on and it is setting job priority=1 after a few minutes in the queue. I can leave it this way for a little while if you want me to collect additional information while it is happening.
Created attachment 1657 [details]
fair_tree disabled and weird priority

sshare -l
sprio -l
scontrol show jobs
sacctmgr show assoc tree
Created attachment 1658 [details] fair_tree disabled slurm log
Created attachment 1659 [details]
fair_tree enabled and weird priority

sshare -l
sprio -l
scontrol show jobs
sacctmgr show assoc tree
Created attachment 1660 [details]
fair_tree enabled and weird priority after a few minutes

sprio -l
scontrol show job

The priority of jobs 11677655 and 11677656 goes to 1.
Created attachment 1661 [details] fair_tree enabled slurm log
Awesome, thanks! I'll mull over all of it. You can disable fair_tree now if you want.
We just started seeing this today with 14.11.4. A quick check with the Priority debug flag suggests that the following part of the code is not hit for some jobs (this was one of the few jobs with non-zero info in sprio):

[2015-02-23T13:54:35.319] Job 5659119 priority: 0.00 + 1298.48 + 0.00 + 0.00 + 0.00 - 0 = 1298.48
[2015-02-23T13:54:35.319] Fairhare priority of job 5659124 for user klbell9 in acct jsk24 is 0.129848
[2015-02-23T13:54:35.319] Weighted Age priority is 0.000000 * 0 = 0.00
[2015-02-23T13:54:35.319] Weighted Fairshare priority is 0.129848 * 10000 = 1298.48
[2015-02-23T13:54:35.319] Weighted JobSize priority is 0.000000 * 0 = 0.00
[2015-02-23T13:54:35.319] Weighted Partition priority is 0.000000 * 0 = 0.00
[2015-02-23T13:54:35.319] Weighted QOS priority is 0.000000 * 1000000 = 0.00

It looks like this code path is skipped for jobs with zeroes in sprio. I think the jobs are getting an initial priority set and then never hitting _get_priority_internal() again. Why that would be, I have no idea. It also seems that the priority value stays at what it was before the bug was triggered, but I'm not completely certain of that yet. I'm adding some extra info statements to see where the problem is, but a restart of slurmctld seems to temporarily fix the problem, so progress may be slow.

My question is how the jobs 1) got an initial priority, 2) still have a priority at all times, 3) have individual factors reset to 0 at some point, 4) never seem to hit _get_priority_internal() again, and 5) still successfully get scheduled at some point.
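For reference, the first log line above is just each normalized factor multiplied by its configured weight, summed, with the nice value subtracted. A minimal standalone sketch of that arithmetic, using the weights visible in the "Weighted ..." log lines (this is only an illustration, not the real _get_priority_internal()):

#include <stdio.h>

/* Sketch of the weighted-sum arithmetic behind the "Job ... priority:" log
 * line: factor * weight for each term, then subtract nice. */
static double weighted_priority(double age, double fs, double js,
                                double part, double qos, int nice,
                                double w_age, double w_fs, double w_js,
                                double w_part, double w_qos)
{
        return (age * w_age) + (fs * w_fs) + (js * w_js) +
               (part * w_part) + (qos * w_qos) - nice;
}

int main(void)
{
        /* Values from the log excerpt: only the fairshare factor is nonzero. */
        double prio = weighted_priority(0.0, 0.129848, 0.0, 0.0, 0.0, 0,
                                        0.0, 10000.0, 0.0, 0.0, 1000000.0);
        printf("%.2f\n", prio); /* prints 1298.48 */
        return 0;
}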
One other piece of information is that scontrol show job and squeue showed incorrect priority values. They were likely incorrect because the values weren't being updated anymore. sshare, on the other hand, showed proper values for Level FS and Fairshare. The distinction there is associations versus jobs.

It seems like the jobs aren't making it into the job list in the call to decay_apply_weighted_factors at https://github.com/SchedMD/slurm/blob/slurm-14.11/src/plugins/priority/multifactor/fair_tree.c#L70. That's a pretty straight shot from https://github.com/SchedMD/slurm/blob/slurm-14.11/src/plugins/priority/multifactor/priority_multifactor.c#L1322. That job_list is a global variable. It's highly likely that something is corrupting job_list in a way that only affects the multifactor plugin, if I'm reading the code correctly.
Odd... I have a script checking once a minute to see if sprio has zeroes. It triggered once but the zeroes are gone now. It happened to be at the same second that decay_apply_weighted_factors was being called. Maybe there's a locking issue.
Now I have the script checking sprio continuously with only a one-second sleep in between. It's showing zeroes much more often now, and it's always during the same second that the log shows decay_apply_weighted_factors being called on jobs. It should be locked at that point by https://github.com/SchedMD/slurm/blob/slurm-14.11/src/plugins/priority/multifactor/fair_tree.c#L69, unless that isn't the right lock to use. Locking in Slurm is not my forte, but it certainly seems like there's some kind of race condition due to locks not being used correctly.
One more thing to add before I leave for the day: the values do exist again in sprio except while decay_apply_weighted_factors is being called. This is different from the issue of the values disappearing until a slurmctld restart, but it seems related.
Thanks, Ryan, for looking into it. I'm still looking as well.
Based on the logging I added, it looks like the priority_fs value can get set to zero after some job array tasks get started but others are still pending.
Created attachment 1673 [details]
gdb watch points

I can reproduce this one now when running array jobs:

brian@compy:~/slurm/14.11/compy$ sprio -l -j 12063
   JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
   12063    brian      10000          0          0          0          0          0 -10000

I've attached my gdb output from setting watch points on job->priority and job->prio_factors->priority_part. It appears the prio_factors are being recalculated inside the locks in fair_tree.
> I've attached my gdb output of setting watch points on job->priority and
> job->prio_factors->priority_part. It appears the prio_factors are being
> recalculated inside the locks in fair_tree.

That's intentional and similar to how the other algorithms do this, AFAICT. Based on the wording of your comment I can't tell if you are saying "good that it's in locks", "it shouldn't be doing that there", or "just an observation".

set_priority_factors() may be the source of the zeroes (https://github.com/SchedMD/slurm/blob/slurm-14.11/src/plugins/priority/multifactor/priority_multifactor.c#L1901), but it is inside of a lock: it zeroes the factors and then immediately sets the values. Since it is inside of locks, nothing should see the zeroes, right? Could there be something else in the code that should hold a read lock but doesn't?

Of course, this may have nothing to do with locks, but that's my only guess at the moment besides job array entries losing their priority factors when a new array task starts running.
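To make that locking reasoning concrete: as long as the zero-then-fill happens entirely under a write lock and every reader takes the matching read lock, the transient zeroes can never be observed. A generic pthread sketch of that pattern (plain pthreads, not Slurm's slurmctld/assoc-mgr locking code; all names here are illustrative):

#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a job's priority factors. */
struct factors { double fs, age; };

static pthread_rwlock_t factors_lock = PTHREAD_RWLOCK_INITIALIZER;
static struct factors f = { 0.5, 0.1 };

/* Writer: zeroing and refilling happens entirely under the write lock, so a
 * reader holding the read lock sees either the old or the new values, never
 * the transient zeroes. */
static void recompute(void)
{
        pthread_rwlock_wrlock(&factors_lock);
        memset(&f, 0, sizeof(f));       /* transient zero state */
        f.fs = 0.6;
        f.age = 0.2;
        pthread_rwlock_unlock(&factors_lock);
}

/* Reader: the zeroes could only leak out if some path read f without
 * taking this lock. */
static struct factors snapshot(void)
{
        struct factors copy;
        pthread_rwlock_rdlock(&factors_lock);
        copy = f;
        pthread_rwlock_unlock(&factors_lock);
        return copy;
}

int main(void)
{
        recompute();
        struct factors s = snapshot();
        printf("fs=%.1f age=%.1f\n", s.fs, s.age);
        return 0;
}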
Just an observation. Because the factors are being set within the locks, I wouldn't expect to see the zeros. If only I could reproduce it...
This is what I have been using to hit the one race condition fairly often:

while true; do
    out=$(time sprio -l | awk '$5 == 0 {print}')
    if [[ "x$out" != "x" ]]; then
        echo -e "$(date)\n\n$out" | mail -s zeroes "$email"
        echo Found at $(date)
    fi
    sleep 1
done

If you submit a job array where some tasks start and others don't, you'll very likely see some weirdness almost immediately. I still haven't encountered the initial condition I saw with all zeroes for a prolonged period.
OK, I have a pretty good reproducer for the zeros issue. I was hoping to catch the issue where the priorities are all 1's as well. I'll dive into this reproducer; hopefully the two scenarios are related.

brian@compy:/tmp$ sprio -l
   JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
   18789    brian      10000          0          0          0          0          0 -10000
This commit fixes the job array issue. https://github.com/SchedMD/slurm/commit/423029d8364e8856c6c32b019cb177e92ea18665 Ryan, have you seen the priorities all be 1 like Chicago has seen? I'm thinking that there is something lurking still because a lot of the Chicago jobs were not job arrays.
(In reply to Brian Christiansen from comment #27)
> This commit fixes the job array issue.
> https://github.com/SchedMD/slurm/commit/423029d8364e8856c6c32b019cb177e92ea18665

I had actually looked at that function earlier and figured that the source of the job array issues might be in there somewhere :). The patch looks like it will fix that problem and we applied it. So far the issue seems to have gone away with job arrays, but we do still get zeroes occasionally when sprio and calls to decay_apply_weighted_factors occur at the same time (more below).

> Ryan, have you seen the priorities all be 1 like Chicago has seen? I'm
> thinking that there is something lurking still because a lot of the Chicago
> jobs were not job arrays.

I don't recall whether I've ever seen the lines with a one for priority. I just recall seeing this ticket and thinking we were seeing the exact same thing. Unfortunately, I accidentally lost the sprio output after restarting slurmctld (oops). I do remember that it had zero for fairshare, and I think it had a zero for Partition, unlike theirs. Our partition weight is set to zero, so maybe that's why. I really don't remember what the priority was set to. The jobs' priorities according to squeue and scontrol show job were definitely not being updated even though sshare showed correct values.

As far as the all-zeroes issue goes, I see lines like this with sprio -l every once in a while:

5759600 lundgrer          0          0          0          0          0          0      0
5759601 lundgrer          0          0          0          0          0          0      0
5765644 jtfuller          0          0          0          0          0          0      0
5765649 jtfuller          0          0          0          0          0          0      0
5765663 jtfuller          0          0          0          0          0          0      0
5766071 jtfuller          0          0          0          0          0          0      0
5766901 jtfuller          0          0          0          0          0          0      0
5766903 njensen6          0          0          0          0          0          0      0
5766928 njensen6          0          0          0          0          0          0      0

It's easier to trigger if I make the calculations in the multifactor plugin take longer, such as by outputting hundreds of lines of debug statements per second to the logs. I added some info statements that I activate by setting the Priority debug flag (that's how I narrowed the one issue down to job arrays). I bet that if you put a sleep statement in decay_apply_weighted_factors or its calling function you could trigger this issue very easily. You can then run a while loop with sprio like the one I mentioned in comment 25, maybe without the sleep.

I haven't yet done a while loop with squeue and scontrol show job to look for zero priority values there, so that might be something else for you to look at if you can reproduce the issues I'm seeing. Maybe at some point a race condition causes that zero priority to get copied into the job's priority field. After that, it would skip almost all calculations since it's "held" and it wouldn't change from then on, except maybe after a slurmctld restart. However, I don't think that explains why sprio shows a 1 in Priority for Chicago, so maybe that's not a good theory.
One other interesting thing I noticed is that my extra debug statements in the logs show priority_p_get_priority_factors_list() being called in the middle of the Fair Tree debug output that is part of _apply_priority_fs(). That function, however, does not consider jobs at all, so it should be safe for the two to run simultaneously.

_apply_priority_fs() is called here after locking associations:
https://github.com/SchedMD/slurm/blob/slurm-14.11/src/plugins/priority/multifactor/fair_tree.c#L65

priority_p_get_priority_factors_list() obtains a job read lock here:
https://github.com/SchedMD/slurm/blob/slurm-14.11/src/plugins/priority/multifactor/priority_multifactor.c#L1784

priority_p_get_priority_factors_list() also does some stuff with associations without a lock (that I can see), but I don't know if that's a problem. It probably obtains locks in the function it calls, but I'm taking off now and don't have time to check:
https://github.com/SchedMD/slurm/blob/slurm-14.11/src/plugins/priority/multifactor/priority_multifactor.c#L1815

I do not think this is an issue, but I figured I should mention it since I could be wrong. Since I'm calling sprio about once a second while testing, it's inevitable that the calls will sometimes happen simultaneously if they don't depend on locks on the same data structures.
I haven't been able to reproduce the all-zeros case, even with a sleep and multiple while loops running sprio -l every .1 seconds.
Having David put a second pair of eyes on this.
FYI we still get zeroes pretty reliably as described before.
Without fair_tree scheduling turned on, I have noticed that the Age priority seems to stop increasing at some point as well. One user had several jobs with the age priority factor stuck at 766. I did an scontrol release on all of those jobs and the age priority immediately went to 1798. This seems to affect all jobs when it happens.

There also continues to be a smaller number of jobs that end up with a priority that looks like this:

   JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
11872918 simonfre      10000          0          0          0          0          0 -10000

These also get corrected with scontrol release.
Thanks for the information. Did you apply the following patch?
https://github.com/SchedMD/slurm/commit/423029d8364e8856c6c32b019cb177e92ea18665

It should resolve priorities like:

   JOBID     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS   NICE
11872918 simonfre      10000          0          0          0          0          0 -10000

That was an issue with array jobs.
What version OS/distro are you using? In Bug 1550, I was able to reproduce the priorities not being calculated -- which caused the 10000/0 priority outputs. Will you try compiling without optimizations?
I'll try to give that a shot today (and look at some of the other bug reports I've opened too).
(In reply to Brian Christiansen from comment #35)
> What version OS/distro are you using? In Bug 1550, I was able to reproduce
> the priorities not being calculated -- which caused the 10000/0 priority
> outputs.
>
> Will you try compiling without optimizations?

I'm compiling on RHEL 6.6. I'm rebuilding now with -O0.
I forgot to mention that the ctld and dbd run on Debian 6 (to be upgraded in the somewhat near future). gcc --version: gcc (Debian 4.4.5-8) 4.4.5
I just had the zeroes happen again while running with -O0. It seems to happen less often now though.
It turns out I didn't recompile all the code with -O0. Trying again now.
I still get zeroes. Unless I'm missing something, the following should work, right?

make clean
env CFLAGS=-O0 ./configure --sysconfdir=/etc/slurm
make install
Ya, that'll do it. I was running with fair_tree today along with your email command, and I got one email where everything was:

   3469    brian          2          0          0          1          1          0      0

I'll have to see if I can catch it again. I'm curious to see what Chicago sees.
Created attachment 1782 [details]
patch

Ryan, will you apply this patch? I believe this is the root cause of the issues: list_for_each() is checking for a return value from void functions, and sometimes that value happens to be less than 0. When the return value is < 0, list_for_each() stops processing the rest of the jobs in the list. I was able to see this happen on my side. Let me know how it works for you.
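For context on the mechanism described here: list_for_each() expects an int-returning callback (the ListForF type in src/common/list.h) and stops iterating as soon as a callback returns a negative value. A simplified, standalone stand-in that shows the effect (the names below are illustrative, not the real list API):

#include <stdio.h>

/* Callback type modeled on Slurm's ListForF: a negative return value means
 * "stop iterating". */
typedef int (*for_f)(void *item, void *arg);

/* Simplified stand-in for list_for_each(): applies f to each item until it
 * returns a value < 0, then skips the rest of the list. */
static int for_each(void **items, int count, for_f f, void *arg)
{
        int i, applied = 0;
        for (i = 0; i < count; i++) {
                if (f(items[i], arg) < 0)
                        break;          /* remaining items are never touched */
                applied++;
        }
        return applied;
}

/* If the callback is actually a void function cast to this type, its
 * "return value" is whatever garbage happens to be left behind, which can be
 * negative -- so iteration can stop after an arbitrary job, leaving the rest
 * with stale priority factors. */
static int recalc_job(void *item, void *arg)
{
        (void) arg;
        printf("recalculated job %d\n", *(int *) item);
        return 0;       /* well-behaved callback: never stops the loop early */
}

int main(void)
{
        int ids[3] = { 101, 102, 103 };
        void *jobs[3] = { &ids[0], &ids[1], &ids[2] };
        printf("processed %d of 3 jobs\n", for_each(jobs, 3, recalc_job, NULL));
        return 0;
}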
Wow. That's a subtle bug. I applied the patch and am still on -O0. I'll let you know how it goes.
Looking through that some more, I can see where Fair Tree makes mistakes with list_for_each(). Those functions should have returned an int, even if the value is worthless, rather than being void functions; we missed that ListForF expects a function returning int rather than void. Essentially, the void return type results in list_for_each() reading back whatever value was left behind. I wonder if there's anywhere else that uses list_for_each() and makes that assumption.

I can submit a patch next week myself, but I imagine you'll want to solve it sooner by redefining those few functions used by Fair Tree to be int rather than void and just returning SLURM_SUCCESS or 0 or whatever.

All that said, removing the break statement did not solve our zeroes problem, though I can see how it would solve other problems. Our issue might be different from Andy's after all.
You totally ruined my weekend. :) I made the changes in:
https://github.com/SchedMD/slurm/commit/b2be6159517b197188b474a9477698f6f5edf480

I'll see if I can see zeros again with the commit. Andy, will you let me know if this patch fixes it for you as well?
It was a great find nonetheless :)

One other idea I had for tracking it down is to have the code core dump if it ever returns a zero to the sprio RPC for a value that should never be zero, such as fairshare. That should give a pretty good view into a potential race condition and show which threads were where at the time.

I've been meaning to try replicating this in a test instance, and then I had the crazy thought to try it in production and rely on a failover to keep things alive. I haven't had the time to try the first idea and I'm not feeling too crazy about the second idea at the moment, but maybe I'll try it next week if I still haven't updated our test environment and I am feeling crazy.
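A hypothetical sketch of that core-dump idea, with plain assert() standing in for Slurm's xassert(); nothing here is real Slurm code, and where it would hook into the sprio RPC path is an assumption:

#include <assert.h>

/* Hypothetical debugging guard: abort (and dump core) the moment a pending
 * job's fairshare factor is about to be reported as zero, so the core shows
 * what every thread was doing at that instant.  Field names mirror the
 * prio_factors struct seen in the gdb output later in this ticket; the
 * struct and function here are stand-ins. */
struct prio_factors_sketch {
        double priority_age;
        double priority_fs;
        double priority_js;
        double priority_part;
        double priority_qos;
};

static void check_factors_before_pack(const struct prio_factors_sketch *p)
{
        /* Fairshare should never legitimately be exactly zero here. */
        assert(p->priority_fs != 0.0);
}

int main(void)
{
        struct prio_factors_sketch p = { 0.0, 0.5, 0.0, 0.0, 0.0 };
        check_factors_before_pack(&p); /* passes; would abort if fs were 0 */
        return 0;
}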
Created attachment 1784 [details]
fair_tree fix

Ryan, I was able to reproduce it and find the problem -- hopefully for the last time :). The problem happens in fair_tree_decay(). First, _ft_decay_apply_new_usage() is called for each job. Inside that function it calls set_priority_factors(), which just sets the priority factors without applying the weights -- so you get small numbers. For example:

(gdb) p *job->prio_factors
$7 = {job_id = 0, user_id = 0, priority_age = 0.00019345238095238096,
  priority_fs = 0.5, priority_js = 1, priority_part = 1, priority_qos = 0,
  nice = 10000}

The job locks are then released, and any pending sprio can read the job's priority factors -- showing the zeros, because the weights haven't been applied. Then decay_apply_weighted_factors() is called for each job, which calls _get_priority_internal(), which calls set_priority_factors() and applies the weights.

It doesn't appear that fair_tree uses the priority factors, so the attached patch removes the call to set_priority_factors() from _ft_decay_apply_new_usage(). Will you apply this patch and let me know how it works for you?
Unfortunately, I'm having a hard time going back and remembering all the calls to preexisting parts of the multifactor plugin. It really would have been nice to have the decay part separated out into another plugin, as well as individual plugins for each of the factors in the multifactor plugin... :)

I have to admit that I'm a little confused about how this patch would work. set_priority_factors() works per job and calls xmalloc or memset at the start to wipe out the priority factors. There are no return statements in the function, so it runs to completion, all inside of a write lock. There would have to be an epic race condition for sprio to read the job data in the same order, at the same rate, in between the memset and the priorities being set (assuming that locking is broken in the first place). I just can't figure out how sprio could see all zeroes due to this. If the locking isn't right, sure, an entry or two may be zeroed out during sprio (or other calls). But all of them (hundreds or thousands of jobs) being zeroed out at the exact same moment? The call to set_priority_factors() in _ft_decay_apply_new_usage() may indeed be in error, but I can't see how it would result in zeroes for all jobs. The locking around the call to _ft_decay_apply_new_usage() (via list_for_each in fair_tree_decay) looks correct.

By the way, if this does end up being the correct solution, you'll probably want to add a comment saying that you can't directly call decay_apply_new_usage in list_for_each because it has the wrong return type.

I'm still looking through the implications of removing the call to set_priority_factors() there. I know that we based that call on what other algorithms did. It's quite possible that we got it wrong, so I'm trying to check on it. Since there are several entry points into set_priority_factors(), I'm validating that this one is redundant in some way. I seem to recall that it was there due to applying decay somehow.
Well, I just discovered something right after submitting: my while loop has a problem! It only outputs the rows with zeroes; it should output ALL rows. This is what I should have done:

while true; do
    time sprio_out=$(sprio -l)
    zeroes=$(echo "$sprio_out" | awk '$5 == 0 {print}')
    if [[ "x$zeroes" != "x" ]]; then
        echo -e "$(date)\n\n$sprio_out" | mail -s zeroes "$email"
        echo Found at $(date)
    fi
    sleep 1
done

With that fix in place, I now see that I get all zeroes for everyone *except* one user. He has a "1" in the fairshare column and zeroes in all others (except jobid), but he should have a fairshare value of 6250 or so. It has happened several times now and it's always user masaki. I can see nothing odd about his jobs so far, but I'm still checking.

I'm trying to figure out where the 1 in the fairshare column could come from. This seems to be user-based, which would imply something to do with assocs rather than jobs. Maybe the part of the code assigning values to jobs should hold an assoc mgr lock but doesn't? I'm taking a look.
Let me try to explain it better; I'll show it to you with gdb.

If you look at the job before it enters the first set of locks in fair_tree_decay(), you'll see that the job's priority factors are correct from the last calculation:

Breakpoint 1, fair_tree_decay (jobs=0x1871d90, start=1427835011)
    at ../../../../../../slurm/src/plugins/priority/multifactor/fair_tree.c:54
54              { NO_LOCK, WRITE_LOCK, READ_LOCK, READ_LOCK };
(gdb) set $job = (struct job_record *)jobs->head->next->data
(gdb) p $job->job_id
$1 = 11018
(gdb) p $job->priority
$2 = 1225015
(gdb) p *$job->prio_factors
$4 = {job_id = 0, user_id = 0, priority_age = 15.294312169312169,
  priority_fs = 25000, priority_js = 200000, priority_part = 1000000,
  priority_qos = 0, nice = 10000}
(gdb) c
Continuing.

The jobs then enter _ft_decay_apply_new_usage(), which calls set_priority_factors() on all of the pending jobs. set_priority_factors() only sets the factors and doesn't apply the weights to them, so the numbers are really small. For example:

Breakpoint 2, fair_tree_decay (jobs=0x1871d90, start=1427835011)
    at ../../../../../../slurm/src/plugins/priority/multifactor/fair_tree.c:64
64              assoc_mgr_lock(&locks);
(gdb) p *$job->prio_factors
$5 = {job_id = 0, user_id = 0, priority_age = 0.00040509259259259258,
  priority_fs = 0.5, priority_js = 1, priority_part = 1, priority_qos = 0,
  nice = 10000}
(gdb) c
Continuing.

The job write lock is released, and at this point any sprio waiting on a read lock can read the current priority factors. For this job, age and fairshare will show zero because they are cast to ints, and sprio will show a priority of 2 for the job (job size + partition).

The job then enters decay_apply_weighted_factors(), which calls _get_priority_internal(), which sets the factors and then applies the weights to them. So the factors now look like you would expect:

Breakpoint 3, fair_tree_decay (jobs=0x1871d90, start=1427835011)
    at ../../../../../../slurm/src/plugins/priority/multifactor/fair_tree.c:71
71              unlock_slurmctld(job_write_lock);
(gdb) p *$job->prio_factors
$6 = {job_id = 0, user_id = 0, priority_age = 20.25462962962963,
  priority_fs = 25000, priority_js = 200000, priority_part = 1000000,
  priority_qos = 0, nice = 10000}

You could also fix it by not releasing the write lock until the end of fair_tree_decay(), but set_priority_factors() doesn't need to be called in _ft_decay_apply_new_usage() at all, so it can just be removed. If you put a sleep after the first release of the job write lock (line 62), you'll see it happen more often.

Do you see how it happens now?
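The window described here boils down to a tiny standalone simulation (plain pthreads, not Slurm's locking API; the 50000 weight is simply inferred from the 0.5 -> 25000 values in the gdb output): pass 1 stores the raw fairshare factor and drops the job lock, pass 2 later applies the weight, and a reader that sneaks in between sees a value that truncates to zero.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t job_lock = PTHREAD_MUTEX_INITIALIZER;
static double priority_fs = 25000;      /* last weighted value */
static const double fs_weight = 50000;

/* Stand-in for an sprio RPC arriving between the two passes. */
static void *sprio_reader(void *arg)
{
        (void) arg;
        usleep(1000);                   /* aim for the gap between the passes */
        pthread_mutex_lock(&job_lock);
        printf("sprio sees FAIRSHARE = %d\n", (int) priority_fs);
        pthread_mutex_unlock(&job_lock);
        return NULL;
}

int main(void)
{
        pthread_t reader;
        pthread_create(&reader, NULL, sprio_reader, NULL);

        pthread_mutex_lock(&job_lock);  /* pass 1: store the unweighted factor */
        priority_fs = 0.5;
        pthread_mutex_unlock(&job_lock);        /* window opens here */

        usleep(5000);                   /* exaggerate the window */

        pthread_mutex_lock(&job_lock);  /* pass 2: apply the weight */
        priority_fs *= fs_weight;
        pthread_mutex_unlock(&job_lock);

        pthread_join(reader, NULL);
        printf("final FAIRSHARE = %d\n", (int) priority_fs);    /* 25000 */
        return 0;
}

With the sleeps as written, the reader typically prints FAIRSHARE = 0 even though the final, weighted value is 25000.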
(In reply to Brian Christiansen from comment #51)
> Do you see how it happens now?

Ah, that makes sense now. I know you said in a previous comment that the weights weren't applied yet, but somehow I forgot that when I read through the code. We're trying out the patch now.
The testing is very promising, and conceptually this seems like the right solution. No zeroes yet. It looks like you fixed several very subtle issues with this one ticket: a job array issue, void functions mistakenly cast to int-returning callbacks, and now this one (my bad on the last two). Thanks! I'm not expecting to see any more problems at this point, but I'll let you know tomorrow how the overnight testing goes.
It worked! I've had a loop running sprio every second since yesterday at 4pm. There have been no instances of zeroes at all since then, whereas I normally would have seen dozens of instances by now.
Awesome! I've committed the patch:
https://github.com/SchedMD/slurm/commit/f61475e8005bb9d834f8e227b0dc4282cc1f0aee

I'll close the ticket. Please reopen it if you see anything else. Thanks for everyone's help!