Description
S Senator
2023-05-16 10:55:07 MDT
The tables changed in 22.05. If you are indeed running 22.05.6, that fix won't work. However, I am surprised this issue happened in 22.05. That would mean that each job had a different env_vars string. Though perhaps an env variable was being incremented?
-Scott

The environment variables are stored in a table called <clustername>_job_env_table. This is linked to each job in the <clustername>_job_table with env_hash_inx. I recommend you follow the procedure in bug 15383 comment 8 if you want to remove all the data from the <clustername>_job_env_table.
-Scott

The LD_LIBRARY_PATH variable was appended to with every job invocation, so it was changing and growing for each job. Multiply by 300k+ jobs and it became a problem.
We intend to increase our archiving and purging, within the oversight and accounting policies that our systems are subject to. Are there parameters such as PurgeJobEnvAfter and PurgeJobScriptAfter? We cannot purge the job records themselves completely with PurgeJobAfter, but we could purge the job scripts and their environments reasonably aggressively, approximately on the order of weeks.

Since the tables have changed, please suggest the appropriate "trim" SQL table update command, perhaps resembling (based on the other comment):
MariaDB > truncate <clustername>_job_env_table where user=<user-name>;
so that only the data relevant to this particular user is truncated.

I think you will want something like this:
>DELETE from <clustername>_job_env_table where hash_inx IN (SELECT env_hash_inx from <clustername>_job_table where id_user=<uid_of_user>);
I did test this, but I recommend you double check it first if some of the data in the job_env_table is important.
We currently do not have a feature like PurgeJobEnvAfter and PurgeJobScriptAfter.
-Scott
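
A minimal sketch of the statement above, run against toy tables in an in-memory SQLite database rather than MariaDB, purely to illustrate the subquery pattern with a dry-run count before deleting. The cluster name `mycluster`, the reduced column set, the sample rows, and the helper names are all assumptions made for illustration. The shared-hash check is an added precaution, on the assumption that one environment row can be referenced by jobs from more than one user.

```python
import sqlite3

# Toy schema: only the columns mentioned in this ticket, plus an illustrative key.
def build_toy_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mycluster_job_env_table (
            hash_inx INTEGER PRIMARY KEY, env_vars TEXT);
        CREATE TABLE mycluster_job_table (
            job_db_inx INTEGER PRIMARY KEY, id_user INTEGER, env_hash_inx INTEGER);
        INSERT INTO mycluster_job_env_table VALUES (1, 'A=1'), (2, 'A=2'), (3, 'A=3');
        INSERT INTO mycluster_job_table VALUES
            (10, 1000, 1), (11, 1000, 2), (12, 2000, 2), (13, 2000, 3);
    """)
    return conn

# Environment hashes referenced by one user's jobs.
SUBQUERY = "SELECT env_hash_inx FROM mycluster_job_table WHERE id_user = ?"

def count_env_rows(conn, uid):
    """Dry run: how many environment rows the DELETE would remove."""
    sql = ("SELECT COUNT(*) FROM mycluster_job_env_table "
           "WHERE hash_inx IN (" + SUBQUERY + ")")
    return conn.execute(sql, (uid,)).fetchone()[0]

def shared_env_rows(conn, uid):
    """Hashes used by this user's jobs that other users' jobs also reference."""
    sql = ("SELECT DISTINCT env_hash_inx FROM mycluster_job_table "
           "WHERE id_user != ? AND env_hash_inx IN (" + SUBQUERY + ") "
           "ORDER BY env_hash_inx")
    return [row[0] for row in conn.execute(sql, (uid, uid))]

def delete_env_rows(conn, uid):
    """Same shape as the suggested DELETE; returns the number of rows removed."""
    sql = ("DELETE FROM mycluster_job_env_table "
           "WHERE hash_inx IN (" + SUBQUERY + ")")
    cur = conn.execute(sql, (uid,))
    conn.commit()
    return cur.rowcount

if __name__ == "__main__":
    conn = build_toy_db()
    print("would delete:", count_env_rows(conn, 1000))
    print("shared with other users:", shared_env_rows(conn, 1000))
    print("deleted:", delete_env_rows(conn, 1000))
```

Running the count and the shared-hash check first gives a chance to back out before any rows are removed from a real database.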
I've done that and then used
optimize table <clustername>_job_env_table;
but don't see a reduction in the size of the table. Guidance appreciated.

I was impatient. The table size did get reduced after ~10+ minutes. Thank you.

Do you have any other questions about this?
-Scott

No additional questions, thank you. As always, we appreciate the prompt, thorough & information-rich responses you & your team provide.

Glad we could help.
-Scott

Please reconsider this as a recurring bug which needs to be fixed.

We now purge the script and env tables in lockstep with the job table as specified in PurgeJobAfter. See commit b2bc5ec5670f85. This is in 23.02 and later.
Is this what you are asking about?
-Scott

> We now purge the script and env tables in lockstep with the job table as specified in PurgeJobAfter.
Not exactly. We really do not want to purge jobs. But we do need to purge job scripts, job environments and job steps. Purging at the job level is too coarse-grained, but is our workaround.
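
To make the finer-grained request concrete, here is a hedged sketch of an age-based trim that removes environment rows while leaving every job row in place. It is not an existing Slurm feature and not a procedure recommended in this ticket. It runs on a toy in-memory SQLite database, and the cluster name `mycluster`, the `time_end` column holding epoch seconds with 0 meaning "not finished", the sample rows, and the helper names are all assumptions to verify against the real schema.

```python
import sqlite3

SECONDS_PER_WEEK = 7 * 24 * 3600

def weeks_ago_cutoff(now_epoch, weeks):
    """Epoch timestamp that lies `weeks` weeks before `now_epoch`."""
    return now_epoch - weeks * SECONDS_PER_WEEK

def build_toy_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mycluster_job_env_table (
            hash_inx INTEGER PRIMARY KEY, env_vars TEXT);
        CREATE TABLE mycluster_job_table (
            job_db_inx INTEGER PRIMARY KEY,
            env_hash_inx INTEGER NOT NULL,
            time_end INTEGER NOT NULL);  -- assumed: epoch seconds, 0 = unfinished
        INSERT INTO mycluster_job_env_table VALUES (1, 'A=1'), (2, 'A=2'), (3, 'A=3');
        INSERT INTO mycluster_job_table VALUES
            (10, 1, 100), (11, 2, 100), (12, 2, 900), (13, 3, 0);
    """)
    return conn

def purge_old_envs(conn, cutoff_epoch):
    """Delete env rows that no unfinished or recent job still references.

    Job rows are never touched. Env rows that no job references at all
    are removed as well.
    """
    sql = """
        DELETE FROM mycluster_job_env_table
        WHERE hash_inx NOT IN (
            SELECT env_hash_inx FROM mycluster_job_table
            WHERE env_hash_inx IS NOT NULL
              AND (time_end = 0 OR time_end >= ?))
    """
    cur = conn.execute(sql, (cutoff_epoch,))
    conn.commit()
    return cur.rowcount

if __name__ == "__main__":
    conn = build_toy_db()
    print("env rows removed:", purge_old_envs(conn, 500))
```

The `NOT IN` form keeps any environment row that a newer or still-running job shares with an older one, which matters when identical environments are stored once and referenced by many jobs.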

We could probably add PurgeScriptAfter / PurgeEnvAfter options as an NRE project. Is this something you are interested in?
-Scott

I see that ticket 16954 is already opened on this issue.
Is there a specific reason you chose to reopen this ticket instead of discussing the development request in 16954?
-Scott

(In reply to Scott Hilton from comment #18)
> I see that ticket 16954 is already opened on this issue.
>
> Is there a specific reason you chose to reopen this ticket instead of
> discussing the development request in 16954?
>
> -Scott
This one was fresher in my history and I found it first. So, no rational or technical reason to use this one vs. 16954.

FYI- We have requested funding for 16954 Enhance SOW.

*** This ticket has been marked as a duplicate of ticket 16954 ***