We've been using the ability to store job scripts and job envs in our Slurm database for about a year now. While we've found the job scripts useful for debugging, we don't use the job envs much, and given how much space they take up in the database we'd like to get rid of them. What do you recommend as the best course of action for getting rid of the job envs? If I stop collecting them in slurm.conf, will they naturally purge out, or do we need to drop the table? Can we purge them live, or should we wait for a maintenance window? What commands do you recommend I run? Our job env table is currently 43 GB, so it's pretty big. -Paul Edmon-
Paul, First, you need to stop recording job_env for new jobs. You can do this by removing job_env from AccountingStoreFlags in your slurm.conf. I will have to look into how to properly purge it from your database. -Scott
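For illustration, the slurm.conf change might look like the excerpt below. It assumes the current setting stores both job scripts and job environments; the "before" value is an assumption, so keep any other flags you already use. slurmctld then needs to pick up the new configuration.

```
# slurm.conf (excerpt)
# Before (assumed): both job scripts and job environments are stored.
#AccountingStoreFlags=job_script,job_env
# After: keep job scripts for debugging, stop storing job environments.
AccountingStoreFlags=job_script
```

Only new jobs are affected; rows already in the job env table stay until they are removed by other means.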
(In reply to Scott Hilton from comment #1 on bug 15383 <https://bugs.schedmd.com/show_bug.cgi?id=15383>, 11/9/22 4:46 PM) Yup, I'm going to switch that off when I have a moment. -Paul Edmon-
Paul, Slurm does not currently purge the hash tables (job_script_table and job_env_table). We may fix that in 23.02. If you want to clear out the data, you will have to do that manually in the MySQL database. I don't recommend dropping the table; simply clearing it is better: >delete from <clustername>_job_env_table; If you do this, make sure slurmdbd is not running while you edit the database. -Scott
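If you are unsure of the exact `<clustername>_job_env_table` name, a query like the following lists the per-cluster job env tables. It assumes the default accounting database name `slurm_acct_db` (the default StorageLoc); adjust it if yours differs.

```sql
-- Assumes the default accounting database; change if StorageLoc was set.
USE slurm_acct_db;
-- Each cluster has its own <clustername>_job_env_table;
-- the escaped underscores match literal "_" characters.
SHOW TABLES LIKE '%\_job\_env\_table';
```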
(In reply to Scott Hilton from comment #3, 11/10/2022 4:11 PM) Okay, so this can't be done live. That's good to know. I'd strongly recommend adding a way to purge and clean the hash tables; otherwise they will grow indefinitely. Even with hashing and deduplication, the growth won't stop, so there needs to be a way to clean them out. -Paul Edmon-
(In reply to Scott Hilton from comment #3, 11/10/2022 4:11 PM) Could you send me the precise commands for the clean-out? I'd like to do this before the next major upgrade, so that the backup database dump we take beforehand doesn't take forever writing out all the envs we no longer need. -Paul Edmon-
Paul, Assuming the cluster name is odyssey and you didn't change StorageLoc in your slurmdbd.conf, the commands would be: >use slurm_acct_db; >delete from odyssey_job_env_table; However, I am not sure how well your system would handle deleting 43 GB in one go. You may want to limit the transaction size. -Scott
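One way to limit the transaction size is to delete in fixed-size batches. The sketch below is a hypothetical stored procedure for the mysql command-line client; the procedure name and the 10000-row batch size are illustrative assumptions, and it assumes the cluster name odyssey and the default database slurm_acct_db. Run it only while slurmdbd is stopped.

```sql
-- Hypothetical helper: empty odyssey_job_env_table in small batches so
-- that no single transaction has to cover the whole table.
-- Run only while slurmdbd is stopped.
USE slurm_acct_db;

DELIMITER //
CREATE PROCEDURE purge_job_env_in_batches(IN batch_size INT)
BEGIN
  DECLARE rows_deleted INT DEFAULT 1;
  -- Keep deleting until a batch removes nothing.
  WHILE rows_deleted > 0 DO
    DELETE FROM odyssey_job_env_table LIMIT batch_size;
    -- ROW_COUNT() reports how many rows the preceding DELETE removed.
    SET rows_deleted = ROW_COUNT();
    -- Commit each batch so locks and undo history stay small.
    COMMIT;
  END WHILE;
END //
DELIMITER ;

CALL purge_job_env_in_batches(10000);
DROP PROCEDURE purge_job_env_in_batches;
```

Note that with InnoDB, deleting rows frees space inside the table's tablespace but does not by itself shrink the file on disk.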
(In reply to Scott Hilton from comment #6, 11/11/2022 7:14 PM) We did this before at the last major upgrade, and while it did purge the table, it took quite a while. We would probably need to budget some time during a maintenance window to do it. Thanks for the info. -Paul Edmon-
Paul, I believe you should be able to use the truncate command in this case, since we are clearing the whole table. Truncate should be faster than the delete command. >use slurm_acct_db; >truncate odyssey_job_env_table; -Scott
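A sketch of the whole maintenance step, assuming the cluster name odyssey, the default database slurm_acct_db, and that slurmdbd has already been stopped. The size query reads table statistics from information_schema, so its figures are approximate and can lag until statistics are refreshed (for example with ANALYZE TABLE).

```sql
-- Run with slurmdbd stopped. Database and table names are assumptions
-- (default StorageLoc, cluster "odyssey").
USE slurm_acct_db;

-- Approximate size of the job env table (data + indexes) in GB;
-- run this before and after the TRUNCATE to see how much space was freed.
SELECT table_name,
       ROUND((data_length + index_length) / POWER(1024, 3), 2) AS size_gb
FROM information_schema.tables
WHERE table_schema = 'slurm_acct_db'
  AND table_name = 'odyssey_job_env_table';

-- Empty the table in one step. TRUNCATE drops and re-creates the table
-- rather than deleting rows one by one, which is why it is much faster.
TRUNCATE TABLE odyssey_job_env_table;
```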
(In reply to Scott Hilton from comment #8, 11/16/2022 11:36 AM) Nice. I will have to try that out. I ran a test delete using the previous command, and it took 30 minutes to clear the table. I will have to see how fast truncate is. -Paul Edmon-
(In reply to Scott Hilton from comment #8, 11/16/22 11:36 AM) Just thought you'd like to see the outcome: MariaDB [st]> truncate odyssey_job_env_table; Query OK, 0 rows affected (2 min 36.267 sec) The truncate took under 3 minutes, while the delete took 30, so I definitely recommend truncate. -Paul Edmon-
Paul, Glad to hear you were able to clear the table. Beginning in 23.02, Slurm will archive and purge the job_script_table and job_env_table if and when it archives and purges the job_table. This change has been pushed to master in commits 60aa4c148d..ee944b046b. -Scott
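With that change, keeping these tables bounded comes down to the regular job archive/purge settings in slurmdbd.conf. A minimal sketch follows; the archive directory and the 12-month retention are illustrative assumptions, not values recommended in this thread, and job_script/job_env rows are only covered on 23.02 or later.

```
# slurmdbd.conf (excerpt)
# Write job records to archive files before they are purged.
ArchiveJobs=yes
# Hypothetical location for the archive files.
ArchiveDir=/var/spool/slurm/archive
# Purge job records older than 12 months; from 23.02, job scripts and
# job envs are archived and purged along with the job records.
PurgeJobAfter=12month
```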