Ticket 23818 - Plans to introduce purging parameters for the database job_script_table?
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Database
Version: 25.05.3
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Stephen Kendall
Reported: 2025-10-06 05:39 MDT by Ole.H.Nielsen@fysik.dtu.dk
Modified: 2026-04-03 02:32 MDT
Site: DTU Physics
Version Fixed: 26.05.0-0rc1


Description Ole.H.Nielsen@fysik.dtu.dk 2025-10-06 05:39:31 MDT
This is really a question more than an issue: We would like to enable the AccountingStoreFlags=job_script flag [1] in slurm.conf so that we can offer better support to our users when they ask why a particular JobId had a problem. We hear from other sites that this is a very useful feature.

Unfortunately, the <clustername>_job_script_table table in the database would grow without bound, because there doesn't seem to be any way to purge older job scripts from it. There is nothing comparable to the Purge* options [2] in slurmdbd.conf, which we use successfully to keep our database manageable in size. We have heard from other sites that use the job_script flag that a slurmdbd version upgrade can take many, many hours of table upgrading :-(

Question: Does SchedMD have any plans to introduce a new "PurgeJobScript" parameter (and similar for the other AccountingStoreFlags options)?

If such purging parameters are not included in the current roadmaps, is there a safe way to clear the <clustername>_job_script_table in the database at regular intervals?

Thanks very much,
Ole

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_job_script
[2] https://slurm.schedmd.com/slurmdbd.conf.html#OPT_PurgeJobAfter
Comment 1 Stephen Kendall 2025-10-06 09:21:14 MDT
Hi Ole,

The job script table is targeted by the 'PurgeJobAfter' field (same for the job env table if enabled). I will note that when enabled, the script table is expected to become particularly large, so you may want to set that purge interval shorter than you otherwise would to keep the database size manageable.

Let me know if you have any further questions.

Best regards,
Stephen
Comment 2 Ole.H.Nielsen@fysik.dtu.dk 2025-10-06 11:47:18 MDT
Hi Stephen,

(In reply to Stephen Kendall from comment #1)
> The job script table is targeted by the 'PurgeJobAfter' field (same for the
> job env table if enabled). I will note that when enabled, the script table
> is expected to become particularly large, so you may want to set that purge
> interval shorter than you otherwise would to keep the database size
> manageable.

To clarify my request, I believe that I'm asking for a different type of purge functionality. Our policy is to *never* purge any of the job information, so that our database still contains accounting information for all jobs back to the "beginning of time" a number of years ago.

What I'm asking for is, if we were to enable job_script storage, how to purge *only* the <clustername>_job_script_table in the database. For example, we would keep 7 days' worth of scripts for the purpose of user support, while never purging any jobs at all from the database for the purpose of accounting.

Do you think this will become feasible at some point along the roadmap?

Thanks,
Ole
Comment 3 Stephen Kendall 2025-10-06 11:59:55 MDT
I see, thanks for the clarification. I'm not aware of any prior discussion of this idea, and it would need to be done in a way that doesn't cause issues for other sites who are already using the existing behavior. I will check internally on this.

Best regards,
Stephen
Comment 4 Ole.H.Nielsen@fysik.dtu.dk 2025-10-06 12:02:45 MDT
(In reply to Stephen Kendall from comment #3)
> I see, thanks for the clarification. I'm not aware of any prior discussion
> of this idea, and it would need to be done in a way that doesn't cause
> issues for other sites who are already using the existing behavior. I will
> check internally on this.

Thanks for checking. If purging this table is not going to be implemented, would my second question in comment 0 be feasible?

If such purging parameters are not included in the current roadmaps, is there a safe way to clear the <clustername>_job_script_table in the database at regular intervals?

Thanks,
Ole
Comment 5 Ole.H.Nielsen@fysik.dtu.dk 2025-10-06 12:05:34 MDT
(In reply to Stephen Kendall from comment #3)
> I see, thanks for the clarification. I'm not aware of any prior discussion
> of this idea, and it would need to be done in a way that doesn't cause
> issues for other sites who are already using the existing behavior. I will
> check internally on this.

Actually, IMHO, introducing a new "PurgeJobScript" parameter in slurmdbd.conf shouldn't break anyone's current setup because it would be an optional parameter.

Thanks,
Ole
Comment 6 Stephen Kendall 2025-10-24 14:02:55 MDT
So far we are looking further into this for inclusion in a future version, but we don't have it on the roadmap for a specific release. One of the complications is that currently, the job script purges don't directly look at the time, but instead look at which jobs have already been purged and remove any scripts that were associated with those jobs. Thus adding script-only purges would require the addition of a separate purge path.
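
The difference between the two purge paths can be illustrated with a toy model. This is only a sketch under stated assumptions: the two-table schema, the column names, and both function names are hypothetical and do not reflect the real Slurm accounting tables or the slurmdbd code. It only shows why a time-based script purge needs its own step, since the existing-style path acts only on scripts that no surviving job refers to.

```python
import sqlite3


def make_demo_db():
    """Build a toy in-memory database (hypothetical schema, not Slurm's)."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE job_table (
            job_id INTEGER PRIMARY KEY,
            time_end INTEGER,
            script_id INTEGER      -- NULL once the script has been detached
        );
        CREATE TABLE script_table (
            script_id INTEGER PRIMARY KEY,
            body TEXT
        );
        """
    )
    db.executemany("INSERT INTO script_table VALUES (?, ?)",
                   [(1, "#!/bin/sh\necho old"), (2, "#!/bin/sh\necho new")])
    db.executemany("INSERT INTO job_table VALUES (?, ?, ?)",
                   [(100, 1000, 1), (101, 9000, 2)])
    return db


def purge_orphaned_scripts(db):
    """Existing-style path: drop scripts that no remaining job refers to."""
    cur = db.execute(
        "DELETE FROM script_table WHERE script_id NOT IN "
        "(SELECT script_id FROM job_table WHERE script_id IS NOT NULL)"
    )
    return cur.rowcount


def purge_scripts_older_than(db, cutoff):
    """Separate time-based path: detach scripts from jobs that ended before
    cutoff, keep the job rows for accounting, then drop the orphans."""
    db.execute("UPDATE job_table SET script_id = NULL WHERE time_end < ?",
               (cutoff,))
    return purge_orphaned_scripts(db)
```

In this model, calling purge_orphaned_scripts() on a database where no jobs have been purged removes nothing, which mirrors the situation of a site that never purges jobs.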

(In reply to Ole.H.Nielsen@fysik.dtu.dk from comment #4)
> If such purging parameters are not included in the current roadmaps, is
> there a safe way to clear the <clustername>_job_script_table in the database
> at regular intervals?

In general, we do not recommend any manual interventions in the accounting tables as it is easy to mistakenly delete the wrong rows or make other changes to the table that would cause errors or inconsistent results from 'slurmdbd'. If such interventions are taken, they should always be done with 'slurmdbd' stopped and after a full backup of the database.

I was also able to find an alternate method of saving scripts that might be more practical, especially since this enhancement would be at least several months out from release. Using a 'job_submit' script, it would be possible to dump each submitted job script to a file on the controller. At that point you can easily prune old ones as you see fit. However, this does have the disadvantage of occurring before a job ID is assigned to the job, so it makes it more of a challenge to match up with jobs visible on the controller or in the accounting database.
https://slurm.schedmd.com/job_submit_plugins.html
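
Pruning such dumped files could look roughly like the following. This is a minimal sketch, assuming the scripts have been written as plain files into a single directory; the function name, the directory layout, and the use of file modification time as the age criterion are all assumptions, not something the plugin documentation prescribes.

```python
import time
from pathlib import Path


def prune_old_scripts(dump_dir, max_age_days, now=None):
    """Delete regular files in dump_dir last modified more than max_age_days ago.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for path in Path(dump_dir).iterdir():
        # Skip subdirectories and anything newer than the cutoff
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Run from a daily cron job or timer with, say, max_age_days=7, this would keep about a week of scripts on the controller.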

In any case, I would suggest enabling 'AccountingStoreFlags=job_script' temporarily to determine how quickly the table fills up. You can determine the on disk size of the database files in the configured datadir ('/var/lib/mysql/' on most installations). You may find that the scripts generated by your environment are practical to retain for a long time as the storage demands are highly workload-dependent based on the number of jobs and length of the scripts. With close monitoring, you will be able to estimate how quickly the table will grow in size well before it will slow down upgrades or other tasks. And if you find that the size looks manageable, you can leave it enabled long-term.
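
One way to do this kind of monitoring is to record the datadir size at intervals and extrapolate. The sketch below assumes growth is roughly linear, which may not hold for a real workload; both function names and the notion of a fixed size limit are hypothetical. Reading the datadir usually requires root or database-user privileges.

```python
import os


def dir_size_bytes(path):
    """Total size in bytes of all regular files below path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def days_until_limit(size_then, size_now, days_between, limit_bytes):
    """Linear extrapolation of two size samples taken days_between days apart.

    Returns the estimated number of days until limit_bytes is reached,
    or None if the size is not growing.
    """
    growth_per_day = (size_now - size_then) / days_between
    if growth_per_day <= 0:
        return None
    return max(0.0, (limit_bytes - size_now) / growth_per_day)
```

For example, two samples of dir_size_bytes('/var/lib/mysql') taken a week apart can be passed to days_until_limit() together with whatever size the site considers the upper bound for a tolerable upgrade time.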

Best regards,
Stephen
Comment 7 Ole.H.Nielsen@fysik.dtu.dk 2025-10-27 11:59:51 MDT
Hi Stephen,

Thanks a lot for the explanation of the complexity of purging job scripts based on job completion time.  IMHO, this would be the preferred functionality, and I hope the developers will consider adding a new "PurgeJobScript" parameter in slurmdbd.conf in the not too distant future.  

Probably most sites don't want the job scripts in the databases after a few weeks, when no more user support requests about specific jobids could be anticipated.  On the other hand, job accounting requires jobs to be stored for long times (probably years; we store job accounting records back to day zero).

I appreciate the warnings about database manipulation, and I will avoid doing that.

Probably we will refrain from enabling 'AccountingStoreFlags=job_script' at all, because those scripts would remain in the database forever.  I realize that it's a very useful functionality, but retaining a lean and efficient database, also across major updates, will most likely have priority at our site.

Best regards,
Ole
Comment 10 Stephen Kendall 2026-04-02 16:22:01 MDT
Hi Ole,

We have checked in enhancements to add specific purge options for script and env. In 'slurmdbd.conf', these take the form of 'PurgeJobScriptAfter' and 'PurgeJobEnvAfter' and the corresponding Archive boolean fields. This was split into several different commits; here's the one that added the parsing for the config options:
https://github.com/SchedMD/slurm/commit/204d1f112549ebc1a37ff8465788fbffff6ad4e3

We are still fine tuning the related documentation but all the config parsing and handling in the code is checked in ahead of 26.05. Let us know if you have any further questions.

Best regards,
Stephen
Comment 11 Ole.H.Nielsen@fysik.dtu.dk 2026-04-03 02:32:18 MDT
Hi Stephen,

(In reply to Stephen Kendall from comment #10)
> We have checked in enhancements to add specific purge options for script and
> env. In 'slurmdbd.conf', these take the form of 'PurgeJobScriptAfter' and
> 'PurgeJobEnvAfter' and the corresponding Archive boolean fields. This was
> split into several different commits, here's the one containing that added
> the parsing for the config options:
> https://github.com/SchedMD/slurm/commit/204d1f112549ebc1a37ff8465788fbffff6ad4e3
> 
> We are still fine tuning the related documentation but all the config
> parsing and handling in the code is checked in ahead of 26.05. Let us know
> if you have any further questions.

Thanks very much for making these enhancements! We very much appreciate the upcoming 'PurgeJobScriptAfter' and 'PurgeJobEnvAfter' parameters for slurmdbd.conf, since they will enable us to safely use the AccountingStoreFlags=job_script flag without our database growing indefinitely! I trust that many other sites are going to find this feature very useful!

A word of warning for sites enabling the new purge parameters: introduce them *very* gradually, as described in my Wiki page [1], for example:

PurgeJobScriptAfter=2000days
PurgeJobEnvAfter=2000days

and lower the values little by little over time.
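
A step-down plan of this kind can be written out in advance. The following is only a sketch: the 500-day step and the 7-day target are illustrative assumptions (7 days being the user-support window mentioned in comment 2), and the function names are hypothetical. Only the two parameter names and the "<n>days" value format are taken from this ticket.

```python
def purge_schedule(start_days, target_days, step_days):
    """Return the successive retention values, from start_days down to target_days."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    if target_days > start_days:
        raise ValueError("target_days must not exceed start_days")
    values = list(range(start_days, target_days, -step_days))
    values.append(target_days)  # always finish exactly on the target
    return values


def config_lines(days):
    """slurmdbd.conf lines for one step of the schedule."""
    return [f"PurgeJobScriptAfter={days}days", f"PurgeJobEnvAfter={days}days"]


if __name__ == "__main__":
    for days in purge_schedule(2000, 7, 500):
        print("\n".join(config_lines(days)))
        print()
```

Each step would be applied only after the purge triggered by the previous value has completed, so that no single purge has to remove a large share of the table at once.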

Thanks,
Ole

[1] https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_database/#setting-database-purge-parameters