Summary: | Job table size is causing DBD problems | ||
---|---|---|---|
Product: | Slurm | Reporter: | Paul Peltz <peltzpl> |
Component: | Database | Assignee: | Scott Hilton <scott> |
Status: | RESOLVED INFOGIVEN | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | CC: | kauffman, mcoyne, rcwhite, sts |
Version: | 21.08.8 | ||
Hardware: | Linux | ||
OS: | Linux | ||
See Also: | https://bugs.schedmd.com/show_bug.cgi?id=16746 | ||
Site: | ORNL-OLCF | Alineos Sites: | --- |
Linux Distro: | --- | Machine Name: | AFW Miller/Fawbush |
CLE Version: | Version Fixed: | ||
Target Release: | --- | DevPrio: | --- |
Emory-Cloud Sites: | --- |
Description
Paul Peltz
2023-04-27 10:35:42 MDT
Paul,

Good news: job_script and job_env are stored in a much more efficient way in 22.05 and 23.02. Also, this issue has happened before; see bug 14514. The only negative consequence of purging the job_script and job_env information is the loss of that information. See https://bugs.schedmd.com/show_bug.cgi?id=14514#c9 for how to do it. Let me know if you have any more questions or run into any issues.

-Scott

Thanks! I had tried some searching but couldn't find that bug. We'll give that a try.

Paul,

Did that work out for you? Any questions?

-Scott

We just started the purge of the script and env data this afternoon and are waiting for it to complete to see whether we can continue purging. I believe it reduced the db size by about 25%, which is less than the reduction we expected based on our test instance.

Paul,

How did the upgrade go?

-Scott

We just enabled the purge last night after dropping the job_env and job_script rows from the DB, and it was able to finish the purge down to 180 days within 5 hours. Previously, it was taking hours for just a single day to complete, and most of the time it failed due to timeouts. So dropping those rows has cut the purge time drastically. We are going to reduce the purge to 90 days and test the upgrade again to time how long it will take. I think we can resolve this issue, as the db purge and archive actually work now and we can effectively plan for the update to 23.02.2. Thanks!

Paul,

I'm glad we could help.

-Scott
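For anyone landing on this ticket later, the staged purge described above (first down to 180 days, then to 90) could be expressed in slurmdbd.conf roughly as follows. This is only a sketch, not the site's actual configuration: the archive directory is an assumed path, and the thread does not say which other purge or archive options were set. `ArchiveJobs`, `ArchiveDir`, and `PurgeJobAfter` are standard slurmdbd.conf parameters.

```
# slurmdbd.conf (excerpt) - illustrative values only, not taken from this ticket

# Archive job records before they are purged.
ArchiveJobs=yes
# Assumed location; must exist and be writable by the slurmdbd user.
ArchiveDir=/var/spool/slurmdbd/archive

# First pass: keep 180 days of job records.
PurgeJobAfter=180days

# Once the first pass completes in a reasonable time, tighten the window
# (replace the line above rather than setting both):
# PurgeJobAfter=90days
```

slurmdbd needs to be restarted or reconfigured to pick up the new values. Lowering the retention in stages keeps each purge run smaller, which seems relevant here given the timeouts reported before the script and env data were dropped.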