Ticket 5166 - More time and more allocated time than possible
Summary: More time and more allocated time than possible
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmdbd
Version: 17.11.5
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Dominik Bartkiewicz
QA Contact:
URL:
Duplicates: 5341
Depends on:
Blocks:
 
Reported: 2018-05-11 10:30 MDT by Josko Plazonic
Modified: 2019-10-10 08:50 MDT

See Also:
Site: Princeton (PICSciE)
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 18.08.0rc1
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description Josko Plazonic 2018-05-11 10:30:51 MDT
We have a lot of messages like:

[2018-05-11T12:00:00.677] error: We have more allocated time than is possible (6894536546 > 91555200) for cluster tiger2(25432) from 2018-05-11T11:00:00 - 2018-05-11T12:00:00 tres 5
[2018-05-11T12:00:00.677] error: We have more time than is possible (91555200+1936800+0)(93492000) > 91555200 for cluster tiger2(25432) from 2018-05-11T11:00:00 - 2018-05-11T12:00:00 tres 5

and

[2018-05-11T12:00:13.171] error: We have more allocated time than is possible (1031627336 > 23846400) for cluster della(6624) from 2018-05-11T11:00:00 - 2018-05-11T12:00:00 tres 5
[2018-05-11T12:00:13.171] error: We have more time than is possible (23846400+277200+0)(24123600) > 23846400 for cluster della(6624) from 2018-05-11T11:00:00 - 2018-05-11T12:00:00 tres 5

(for 2 clusters). We checked for runaway jobs but found nothing. We also think this is probably the reason for the following - look at the % used. The root user, of course, never used anything, so 112.50% is really puzzling.

sreport cluster AccountUtilizationByUser start=5/10/18 -t percent | head -10
--------------------------------------------------------------------------------
Cluster/Account/User Utilization 2018-05-10T00:00:00 - 2018-05-10T23:59:59 (86400 secs)
Usage reported in Percentage of Total
--------------------------------------------------------------------------------
  Cluster         Account     Login     Proper Name       Used   Energy 
--------- --------------- --------- --------------- ---------- -------- 
   tiger2            root                              200.81%    0.00% 
   tiger2            root      root            root    112.50%    0.00% 
   tiger2           astro                                3.12%    0.00% 
   tiger2           astro  ckreisch Christina D. K+      0.01%    0.00% 

Anything we can check? Do? 

Thanks!
Comment 1 Dominik Bartkiewicz 2018-05-15 11:15:39 MDT
Hi

Did you have any reservations in those systems on that day?
If yes, could you send me the output of "sacctmgr show reservation start=5/10/18"?

Dominik
Comment 2 Josko Plazonic 2018-05-15 11:39:34 MDT
Aha, interesting - no intentional reservations (other than test75):

sacctmgr show reservation start=5/10/18
   Cluster            Name                           TRES           TimeStart             TimeEnd UnusedWall 
---------- --------------- ------------------------------ ------------------- ------------------- ---------- 
    tiger2         root_24                       cpu=2240 2017-12-12T07:27:19 2018-12-12T06:00:00 3.4609e+07 
    tiger2         root_39                      cpu=15680 2018-04-10T06:00:00 2019-04-10T06:00:00 3.0672e+06 
    tiger2         root_40                       cpu=2240 2018-04-10T06:00:00 2019-04-10T06:00:00 3.0672e+06 
    tiger2          test75                         cpu=80 2018-05-03T21:32:59 2018-05-10T00:00:00 5.2682e+05 

I tried removing or editing those old reservations, but with no luck:

scontrol update reservation=root_40 EndTime=2018-04-10T14:00:00
Error updating the reservation: Requested reservation is invalid
slurm_update error: Requested reservation is invalid

and

scontrol delete reservation=root_40
delete_reservation reservation=root_40: Requested reservation is invalid

I assume these reservations are the reason for the time problems? Is there any way to change them other than editing the DB directly?
Comment 3 Josko Plazonic 2018-05-15 11:49:50 MDT
Let me add that these reservations are not active. We started adding very long maintenance reservations and then deleting them; e.g. for the next few maintenances we have these waiting:

scontrol show reservation
ReservationName=root_43 StartTime=2018-06-12T06:00:00 EndTime=2019-06-12T06:00:00 Duration=365-00:00:00
   Nodes=tiger-h19c[1-2]n[1-24],tiger-h20c[1-2]n[1-24],tiger-h21c[1-2]n[1-24],tiger-h22c1n[1-24],tiger-h23c[1-2]n[1-24],tiger-h24c[1-2]n[1-24],tiger-h25c[1-2]n[1-24],tiger-h26c[1-2]n[1-24],tiger-i26c1n[1-24],tiger-i26c2n[1-8] NodeCnt=392 CoreCnt=15680 Features=(null) PartitionName=all Flags=MAINT,IGNORE_JOBS,SPEC_NODES,PART_NODES
   TRES=cpu=15680
   Users=root Accounts=(null) Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a

ReservationName=root_44 StartTime=2018-06-12T06:00:00 EndTime=2019-06-12T06:00:00 Duration=365-00:00:00
   Nodes=tiger-i[19-23]g[1-16] NodeCnt=80 CoreCnt=2240 Features=(null) PartitionName=gpu Flags=MAINT,IGNORE_JOBS,SPEC_NODES,PART_NODES
   TRES=cpu=2240
   Users=root Accounts=(null) Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a

I guess one could argue that deleted reservations should not be counted any more, but then the question is - until when? Probably until they got deleted? Can we suggest that the code be changed to update the end time of a reservation to the time of deletion?

We also still need a way to update these...

Thanks.
Comment 4 Dominik Bartkiewicz 2018-05-17 06:30:54 MDT
Hi

In the normal case, when a reservation is deleted its end time should be updated. I can think of at least one situation in which this can go wrong (when slurmctld is started without the resv_state file).
How exactly did you delete this reservation?

Dominik
Comment 5 Josko Plazonic 2018-05-18 13:04:56 MDT
Hullo,

We did nothing weird - we just deleted it, and as far as I can remember everything was up and running (slurmdbd, slurmctlds...).

Josko
Comment 6 Dominik Bartkiewicz 2018-05-21 09:40:44 MDT
Hi

Do you have any logs from the time when you created/deleted these reservations?
I am still trying to recreate this.

Dominik
Comment 7 Josko Plazonic 2018-05-21 21:00:02 MDT
I checked and found nothing obvious, but some checking of the MySQL tables may point the way - these are the relevant bits from tiger2_resv_table (current state, but I checked our oldest daily backup from April 21st and it is almost the same - just the accounted time is different for the "active" reservation):

39,0,"2",36929,"tiger-h19c[1-2]n[1-24],tiger-h20c[1-2]n[1-24],tiger-h21c[1-2]n[1-24],tiger-h22c1n[1-24],tiger-h23c[1-2]n[1-24],tiger-h24c[1-2]n[1-24],tiger-h25c[1-2]n[1-24],tiger-h26c[1-2]n[1-24],tiger-i26c1n[1-24],tiger-i26c2n[1-8]","0-359,440-471","root_39",1523354400,1554890400,"1=15680",3618000
39,0,"2",36929,"tiger-h19c[1-2]n[1-24],tiger-h20c[1-2]n[1-24],tiger-h21c[1-2]n[1-24],tiger-h22c1n[1-24],tiger-h23c[1-2]n[1-24],tiger-h24c[1-2]n[1-24],tiger-h25c[1-2]n[1-24],tiger-h26c[1-2]n[1-24],tiger-i26c1n[1-24],tiger-i26c2n[1-8]","0-359,440-471","root_39",1523359983,1523367030,"1=15680",7047
39,1,"2",36929,"tiger-h19c[1-2]n[1-24],tiger-h20c[1-2]n[1-24],tiger-h21c[1-2]n[1-24],tiger-h22c1n[1-24],tiger-h23c[1-2]n[1-24],tiger-h24c[1-2]n[1-24],tiger-h25c[1-2]n[1-24],tiger-h26c[1-2]n[1-24],tiger-i26c1n[1-24],tiger-i26c2n[1-8]","0-359,440-471","root_39",1523367030,1523380986,"1=15680",13956
40,0,"2",36929,"tiger-i[19-23]g[1-16]","360-439","root_40",1523354400,1554890400,"1=2240",3618000
40,0,"2",36929,"tiger-i[19-23]g[1-16]","360-439","root_40",1523359983,1523367030,"1=2240",7047
40,1,"2",36929,"tiger-i[19-23]g[1-16]","360-439","root_40",1523367030,1523380978,"1=2240",13948

Our original reservation started at 6am and we set its end date to +1 year (to avoid having to extend it), but it looks like another set was added starting at 7:33am and ending at 9:30am, then another from 9:30am until the (real) end of maintenance at 13:23.

Now, we upgraded from 17.02 to 17.11 on April 10, and there were no duplicate reservations before the maintenance (from a pre-upgrade backup of the db):

39,0,'2',36929,'tiger-h19c[1-2]n[1-24],tiger-h20c[1-2]n[1-24],tiger-h21c[1-2]n[1-24],tiger-h22c1n[1-24],tiger-h23c[1-2]n[1-24],tiger-h24c[1-2]n[1-24],tiger-h25c[1-2]n[1-24],tiger-h26c[1-2]n[1-24],tiger-i26c1n[1-24],tiger-i26c2n[1-8]','0-359,440-471','root_39',1523354400,1554890400,'1=15680'
40,0,'2',36929,'tiger-i[19-23]g[1-16]','360-439','root_40',1523354400,1554890400,'1=2240'

I guess I see how I could solve the problem at hand - i.e. I could update those tables by hand, probably setting the end time of the tiger reservation to the start of the next one, though I am not sure what the time/last param should be. But how did it get this way? Was it because we restarted slurmctld at some point (or slurmdbd), or ...?
Comment 8 Dominik Bartkiewicz 2018-05-22 09:50:36 MDT
Hi

I tried to understand and recreate this. 
We don't have anything similar to "sacctmgr show runaway" for reservations, and I don't see any easy way to force a recalculation of unused_wall and assoc usage.
unused_wall = (resv_end - resv_start) - sum(job_wall_time * (job_tres / resv_tres))
If we assume there was no job in this reservation, we can set it equal to (resv_end - resv_start).
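As a purely illustrative example (hypothetical numbers, not taken from this ticket): a one-hour reservation of 16 CPUs that ran a single 30-minute, 8-CPU job would be left with unused_wall = 3600 - (1800 * 8/16) = 2700 seconds, which you can check directly in MySQL:

-- hypothetical numbers: (resv_end - resv_start) = 3600 s, one job with
-- job_wall_time = 1800 s using 8 of the 16 reserved CPUs
SELECT 3600 - (1800 * (8 / 16)) AS unused_wall;  -- returns 2700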

Dominik
Comment 9 Josko Plazonic 2018-05-22 13:23:32 MDT
There were no jobs - it was for a maintenance - and we are OK if this temporarily messes up the time stats. I guess you are OK with me fixing it via some SQL calls?

Also, I guess we still don't understand how this happened? And this is not the first time - we have that earlier reservation from 2017 with a similar problem.
Comment 10 Dominik Bartkiewicz 2018-05-25 11:28:05 MDT
Hi

I finally recreated this.
To do it I needed to restart slurmctld multiple times while slurmdbd was down. Have you corrected the corrupted records in MySQL already?

Dominik
Comment 11 Josko Plazonic 2018-05-25 12:01:19 MDT
Hullo,

No, we haven't touched them yet - we were waiting on your opinion. Do you have a good recipe to do this, or should I attempt something on my own?

Also good to hear that you reproduced it - I assume some kind of a fix will be forthcoming?

Thanks!
Comment 12 Dominik Bartkiewicz 2018-05-25 14:58:23 MDT
(In reply to Josko Plazonic from comment #11)
> Hullo,
> 
> No, we haven't touched them yet - we were waiting on your opinion. Do you
> have a good recipe to do this, or should I attempt something on my own?
> 

I have applied this procedure on my devbox:
1) stop slurmdbd
2) update <cluster>_resv_table set unused_wall=0,time_end=<> where id_resv=<resv_id> and time_start=<>; (fix the reservation record)
The next steps are only required to recalculate the usage records:
3) update <cluster>_last_ran_table set hourly_rollup=<time_start>,daily_rollup=<time_start>; (force a rollup on slurmdbd start)
4) delete from <cluster>_assoc_usage_hour_table where time_start > <time_start>;
5) delete from <cluster>_assoc_usage_day_table where time_start > <time_start>;
6) start slurmdbd

It's an experimental procedure; I will ask Tim to check if he sees any weak points in it.
Of course, backing up the MySQL db before any manual modification is strongly recommended.


> Also good to hear that you reproduced it - I assume some kind of a fix will
> be forthcoming?
> 
Yes, this is a bug and we will fix it.
Dominik
> Thanks!
Comment 13 Dominik Bartkiewicz 2018-06-11 06:36:28 MDT
Hi

We think the procedure from comment 12 can be simplified; it should look like this (a filled-in example follows the steps):

1) stop slurmdbd 
2) update <cluster>_resv_table set unused_wall=0,time_end=<> where id_resv=<resv_id> and time_start=<>; (fix the reservation record)
The next steps are only required to recalculate the usage records:
3) update <cluster>_last_ran_table set hourly_rollup=<time_start>,daily_rollup=<time_start>,monthly_rollup=<time_start>; (force a rollup on slurmdbd start)
4) start slurmdbd
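
For example, with the corrupted tiger2 records from comment 7 this could look like the following (illustrative values only - verify id_resv/time_start against your own data, stop slurmdbd first, and back up the database before running anything):

-- close the two duplicated maintenance records at the start of the next record
update tiger2_resv_table set unused_wall=0, time_end=1523359983
        where id_resv=39 and time_start=1523354400;
update tiger2_resv_table set unused_wall=0, time_end=1523359983
        where id_resv=40 and time_start=1523354400;
-- force the rollups to be redone from the reservation start on slurmdbd start
update tiger2_last_ran_table set hourly_rollup=1523354400,
        daily_rollup=1523354400, monthly_rollup=1523354400;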


Dominik
Comment 14 Dominik Bartkiewicz 2018-06-22 09:10:03 MDT
*** Ticket 5341 has been marked as a duplicate of this ticket. ***
Comment 15 Dominik Bartkiewicz 2018-06-28 06:29:32 MDT
Hi
I prepared these queries: the first one lists all reservations that end after the next record started, and the second one fixes the ending time and clears unused_wall for those records (a quick check to run afterwards is shown after the queries).

SELECT  *,(SELECT time_start FROM test_resv_table resv_tmp WHERE resv_tmp.time_start>test_resv_table.time_start and resv_tmp.id_resv=test_resv_table.id_resv ORDER BY resv_tmp.time_start LIMIT 0, 1) AS next_start
        FROM test_resv_table 
        WHERE time_end != (SELECT time_start FROM test_resv_table resv_tmp WHERE resv_tmp.time_start>test_resv_table.time_start and resv_tmp.id_resv=test_resv_table.id_resv ORDER BY resv_tmp.time_start LIMIT 0, 1);

UPDATE test_resv_table AS resv INNER JOIN test_resv_table AS resv_tmp
        ON resv.id_resv = resv_tmp.id_resv AND
        resv.time_end > resv_tmp.time_start AND
        resv_tmp.time_start =
                (SELECT time_start FROM (SELECT * FROM test_resv_table) AS tmp WHERE
                        tmp.time_start>resv.time_start AND
                        tmp.id_resv=resv.id_resv
                        ORDER BY tmp.time_start LIMIT 0, 1)
        SET resv.time_end=resv_tmp.time_start , resv.unused_wall=0; 
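
To eyeball the result afterwards (an illustrative check; test_resv_table is the same example table name used above - substitute your real <cluster>_resv_table), you can list the records per reservation and confirm that each time_end now lines up with the next record's time_start:

SELECT id_resv, time_start, time_end, unused_wall
        FROM test_resv_table ORDER BY id_resv, time_start;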


Dominik
Comment 22 Dominik Bartkiewicz 2018-08-21 03:50:40 MDT
Hi

Finally, we have committed a fix for this issue:
https://github.com/SchedMD/slurm/commit/e42b0927f3b121f3b
It is in the 18.08 branch and will be in all future 18.08 releases.

Sorry that this took so long.
Closing as resolved/fixed, please reopen if needed.

Dominik
Comment 23 Josko Plazonic 2018-08-21 05:59:12 MDT
Great - thanks!