Ticket 9259

Summary: Extremely slow archive dump
Product: Slurm
Reporter: Matt Ezell <ezellma>
Component: Database
Assignee: Scott Hilton <scott>
Status: RESOLVED FIXED
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
CC: albert.gil, broderick, kilian
Version: 19.05.5   
Hardware: Linux   
OS: Linux   
Site: NOAA
NOAA Site: ORNL
Version Fixed: 20.11.0pre1

Description Matt Ezell 2020-06-20 12:52:51 MDT
We went into production without any purge/archive settings. Our database has grown quite large, so we decided to turn on purging of steps (by far our largest usage, but not that useful after a month or so). On one cluster we had ~20M step records.

We are in an outage now for software upgrades, so we decided to do the cleanup in bulk. I initiated the dump, but it was going very slowly. Each file of 50K records was taking over 3 minutes to generate.

Looking at MariaDB, it was spending most of its time sorting the results. There is no index on the `time_start` column, which is the ORDER BY key.

I temporarily added an index to see if I could finish the purge before the end of the maintenance window.

alter table `c3_step_table` add index time_start_index (`time_start`)

As soon as that finished, purges started going really fast.
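
For readers who want to see the effect in isolation, here is a minimal sketch of why an index on the ORDER BY column removes the sort step. It is an illustration under stated assumptions, not the query slurmdbd actually runs: it uses Python's built-in sqlite3 module as a stand-in for MariaDB, and the `step_table` schema and the SELECT statement are hypothetical simplifications. Only the `time_start` column and the `time_start_index` name come from this ticket.

```python
import sqlite3


def order_by_plan(with_index: bool) -> str:
    """Return SQLite's query plan for a query ordered by time_start."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified stand-in for the real step table.
    conn.execute(
        "CREATE TABLE step_table ("
        "job_db_inx INTEGER, id_step INTEGER, time_start INTEGER)"
    )
    conn.executemany(
        "INSERT INTO step_table VALUES (?, ?, ?)",
        [(i, 0, 1_600_000_000 - i) for i in range(1000)],
    )
    if with_index:
        # Same idea as the ALTER TABLE ... ADD INDEX statement above.
        conn.execute(
            "CREATE INDEX time_start_index ON step_table (time_start)"
        )
    # Hypothetical query shape: fetch one batch ordered by start time.
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM step_table ORDER BY time_start LIMIT 50"
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(row[3] for row in rows)


if __name__ == "__main__":
    print("without index:", order_by_plan(False))
    print("with index:   ", order_by_plan(True))
```

Without the index, the plan includes a temporary B-tree sort over the whole table for every batch. With the index, rows are read in `time_start` order and the sort disappears. On MariaDB, the analogous check would be to run EXPLAIN on the slow query and look for "Using filesort" in the Extra column.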

A couple of questions:
1. Any harm in leaving this index here? I assume it might add a little overhead when steps get saved to the database, but that's probably negligible.
2. Should Slurm create this index by default?
Comment 2 Scott Hilton 2020-06-22 16:39:38 MDT
Matt,

You should probably be fine leaving the index in. 

Thanks for letting us know about the slow purge process. We will look into this and decide if it is a change we want to make.

Thanks,

Scott
Comment 4 Scott Hilton 2020-06-22 17:08:24 MDT
Matt,

Actually, it sounds like Slurm should automatically remove the index you added the next time you restart slurmdbd. Also, in general, we do not recommend modifying the database directly. However, I still believe that simply adding a key/index like you did should be OK.

If we did make a change to Slurm to speed up that query, it would probably be part of the 20.11 release.

-Scott
Comment 14 Scott Hilton 2020-07-07 16:58:35 MDT
I am going to close this issue. 

Good luck, 

Scott
Comment 15 Matt Ezell 2020-07-07 18:14:56 MDT
Sorry - it seems that there were 9 private comments since your last public message on 6/22 - but I don't understand what the path forward is here. Is SchedMD planning to add such an index or not? If not, are we just supposed to live with this being slow (probably not much of an issue now that we are "caught up")?
Comment 16 Scott Hilton 2020-07-08 09:07:21 MDT
Matt, 

Sorry for keeping you out of the loop. My bad. 

I have been doing some tests, though on the scale of minutes, not hours, and cannot see any improvement from adding an index.

Could you tell me a bit more about your experience with adding the index? Are you sure it sped up the process, or did the purge just happen to finish soon after you added the index?

Thanks,

Scott
Comment 20 Scott Hilton 2020-08-10 09:59:30 MDT
Matt,

I took another look at it and was able to verify that the index is useful. It has been added in Slurm release 20.11, commit id: dcc2a75d9a21bf2

-Scott