Ticket 10745 - slurmctld.log _remove_accrue_time_internal
Summary: slurmctld.log _remove_accrue_time_internal
Status: RESOLVED DUPLICATE of ticket 7375
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 20.02.3
Hardware: Linux Linux
Importance: 4 - Minor Issue
Assignee: Director of Support
Reported: 2021-01-29 14:38 MST by Andy Evans
Modified: 2021-01-29 17:21 MST
Site: U of Vermont

Description Andy Evans 2021-01-29 14:38:35 MST
Hi,
These messages began repeating at very regular intervals, creating some hefty log files:
[2021-01-29T16:17:05.288] error: _remove_accrue_time_internal: QOS normal accrue_cnt underflow
[2021-01-29T16:17:05.289] error: _remove_accrue_time_internal: QOS normal acct pi-sfrietze accrue_cnt underflow
[2021-01-29T16:17:05.289] error: _remove_accrue_time_internal: QOS normal user 340507 accrue_cnt underflow
[2021-01-29T16:17:05.289] error: _remove_accrue_time_internal: assoc_id 688(pi-sfrietze/arrichma/(null)) accrue_cnt underflow
[2021-01-29T16:17:05.289] error: _remove_accrue_time_internal: assoc_id 137(pi-sfrietze/(null)/(null)) accrue_cnt underflow
[2021-01-29T16:17:05.289] error: _remove_accrue_time_internal: assoc_id 1(root/(null)/(null)) accrue_cnt underflow
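
To see which QOS, accounts, users, and associations are involved, the errors can be tallied per entity. The following is a minimal sketch that assumes every line follows the format quoted above; the `tally_underflows` helper and the `sample` list are illustrative names, not part of Slurm.

```python
import re
from collections import Counter

# Pattern for the error lines quoted above. The capture is the entity
# (QOS, account, user, or association) whose accrue_cnt underflowed.
UNDERFLOW_RE = re.compile(
    r"error: _remove_accrue_time_internal: (?P<entity>.+?) accrue_cnt underflow"
)


def tally_underflows(lines):
    """Count accrue_cnt underflow errors per entity (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = UNDERFLOW_RE.search(line)
        if match:
            counts[match.group("entity")] += 1
    return counts


# The six lines from the description, used here as sample input.
sample = [
    "[2021-01-29T16:17:05.288] error: _remove_accrue_time_internal: QOS normal accrue_cnt underflow",
    "[2021-01-29T16:17:05.289] error: _remove_accrue_time_internal: QOS normal acct pi-sfrietze accrue_cnt underflow",
    "[2021-01-29T16:17:05.289] error: _remove_accrue_time_internal: QOS normal user 340507 accrue_cnt underflow",
    "[2021-01-29T16:17:05.289] error: _remove_accrue_time_internal: assoc_id 688(pi-sfrietze/arrichma/(null)) accrue_cnt underflow",
    "[2021-01-29T16:17:05.289] error: _remove_accrue_time_internal: assoc_id 137(pi-sfrietze/(null)/(null)) accrue_cnt underflow",
    "[2021-01-29T16:17:05.289] error: _remove_accrue_time_internal: assoc_id 1(root/(null)/(null)) accrue_cnt underflow",
]

counts = tally_underflows(sample)
for entity, n in counts.most_common():
    print(n, entity)
```

To run it against a real log, pass an open file object instead of `sample`; the path to slurmctld.log is site-specific.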

I have tried a few things and searched the bug tracker, the mailing list archives, and Google, without finding anything relevant.

At the same time, I am seeing these messages repeated in slurmdbd.log:
[2021-01-29T16:00:06.527] error: We have more time than is possible (30337200+738000+0)(31075200) > 30337200 for cluster vacc(8427) from 2021-01-29T15:00:00 - 2021-01-29T16:00:00 tres 5
[2021-01-29T16:00:06.648] Warning: Note very large processing time from hourly_rollup for vacc: usec=5982917 began=16:00:00.665
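
The numbers in the "more time than is possible" message can be checked directly. The sketch below only does the arithmetic on the quoted line. It assumes that the value in parentheses after the cluster name (8427) is a TRES count and that the limit is that count multiplied by the 3600 seconds in the rollup hour; the meaning of the three summed terms is not stated in the message, so they are left unnamed. `check_rollup` is an illustrative helper, not a Slurm tool.

```python
import re

# Matches the numeric part of the rollup error quoted above:
# "(a+b+c)(sum) > limit for cluster name(count)"
ROLLUP_RE = re.compile(
    r"\((\d+)\+(\d+)\+(\d+)\)\((\d+)\) > (\d+) for cluster (\w+)\((\d+)\)"
)


def check_rollup(line):
    """Re-derive the totals in a rollup error line (hypothetical helper)."""
    match = ROLLUP_RE.search(line)
    if match is None:
        return None
    first, second, third, reported_sum, limit = (
        int(g) for g in match.group(1, 2, 3, 4, 5)
    )
    count = int(match.group(7))  # assumption: TRES count for the cluster
    excess = reported_sum - limit
    return {
        "sum_matches": first + second + third == reported_sum,
        "limit_is_count_times_hour": limit == count * 3600,
        "excess_seconds": excess,
        "excess_hours": excess / 3600,
    }


line = (
    "[2021-01-29T16:00:06.527] error: We have more time than is possible "
    "(30337200+738000+0)(31075200) > 30337200 for cluster vacc(8427) "
    "from 2021-01-29T15:00:00 - 2021-01-29T16:00:00 tres 5"
)
result = check_rollup(line)
print(result)
```

For the line above, the first term alone already equals the limit (8427 x 3600 = 30337200), so the whole excess of 738000 seconds (205 hours) comes from the second term.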

Suggestions welcome, and thank you,

Andy
Comment 1 Andy Evans 2021-01-29 14:50:57 MST
Although this bug says
Importance: 	--- 6 - No support contract
I have been told that we renewed our support contract in Aug 2020.
Comment 2 Jacob Jenson 2021-01-29 15:07:03 MST
Andy, 

Please set the Site field to "U of Vermont" on future tickets to have the Severity properly logged. I have updated this ticket and routed it to the support team. 

Thank you,
Jacob
Comment 3 Marshall Garey 2021-01-29 15:25:13 MST
This is a duplicate of bug 7375. That bug has been delayed and I don't have an estimate of when that will be done, but I want to get it done in the next couple months.

*** This ticket has been marked as a duplicate of ticket 7375 ***
Comment 4 Andy Evans 2021-01-29 15:39:36 MST
(In reply to Jacob Jenson from comment #2)
> Andy, 
> 
> Please set the Site field to "U of Vermont" on future tickets to have the
> Severity properly logged. I have updated this ticket and routed it to the
> support team. 
> 
> Thank you,
> Jacob

Sorry, I missed the "U of" section and couldn't find us. Thank you for this.

Andy
Comment 5 Andy Evans 2021-01-29 16:11:14 MST
(In reply to Marshall Garey from comment #3)
> This is a duplicate of bug 7375. That bug has been delayed and I don't have
> an estimate of when that will be done, but I want to get it done in the next
> couple months.
> 
> *** This bug has been marked as a duplicate of bug 7375 ***

The only commonality I could find among the users and jobs generating the error is, perhaps, the use of Dependency=afterok.

Some jobs include multiple comma-delimited afterok/jobid combinations. The jobs are submitted to just one partition and the normal QOS.
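
For anyone trying to reproduce this, a submission of that shape could be built as in the sketch below. It assumes the dependencies take the form afterok:<jobid> joined by commas; the job IDs, partition name, and script name are hypothetical placeholders, and both helpers are illustrative only.

```python
def afterok_dependency(job_ids):
    """Build a comma-delimited afterok dependency string (hypothetical helper)."""
    if not job_ids:
        raise ValueError("at least one job ID is required")
    return ",".join(f"afterok:{int(job_id)}" for job_id in job_ids)


def sbatch_command(script, job_ids, partition, qos="normal"):
    """Return an sbatch argument list; nothing is executed here."""
    return [
        "sbatch",
        f"--partition={partition}",
        f"--qos={qos}",
        f"--dependency={afterok_dependency(job_ids)}",
        script,
    ]


# Hypothetical job IDs, partition, and script name for illustration only.
command = sbatch_command("job.sh", [1001, 1002], partition="example")
print(" ".join(command))
```

The function only assembles the argument list, so it can be inspected before anything is submitted.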

Just trying to fill in the blanks, thank you for your effort,
Andy
Comment 6 Marshall Garey 2021-01-29 17:21:50 MST
(In reply to Andy Evans from comment #5)
> (In reply to Marshall Garey from comment #3)
> > This is a duplicate of bug 7375. That bug has been delayed and I don't have
> > an estimate of when that will be done, but I want to get it done in the next
> > couple months.
> > 
> > *** This bug has been marked as a duplicate of bug 7375 ***
> 
> The only commonality I could find among the users and jobs generating the
> error is, perhaps, the use of Dependency=afterok.
> 
> Some jobs include multiple comma-delimited afterok/jobid combinations. The
> jobs are submitted to just one partition and the normal QOS.
> 
> Just trying to fill in the blanks, thank you for your effort,
> Andy

Thanks Andy, this is interesting and useful information. Can you comment with this information (and whatever else you can find to reproduce your situation) on bug 7375?