Ticket 7609 - Store Job Scripts in Slurm Database
Summary: Store Job Scripts in Slurm Database
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Database
Version: 21.08.x
Hardware: Linux
Severity: 5 - Enhancement
Assignee: Danny Auble
QA Contact:
URL:
Duplicates: 5754
Depends on:
Blocks:
 
Reported: 2019-08-21 08:51 MDT by Paul Edmon
Modified: 2023-03-06 09:16 MST
CC List: 6 users

See Also:
Site: Yale
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 21.08.0pre1
Target Release: 21.08
DevPrio: 1 - Paid
Emory-Cloud Sites: ---


Description Paul Edmon 2019-08-21 08:51:13 MDT
It would be great if Slurm could automatically ship the job script to slurmdbd as an additional field for the job.  If the size ends up being too large for the database, perhaps a size limit for the script and some way of archiving the scripts would be feasible, since slurmdbd already supports archiving certain data on different time scales.  I know this option would be of great service to us and the community at large, as users are typically not savvy enough to send us the scripts they used when they have problems, or may not even remember what they submitted.

Thanks.

-Paul Edmon-
Comment 1 Jason Booth 2019-08-21 10:35:12 MDT
Hi Paul - We are curious how you would use this data and why you would like to store this information in the database. Based on what you have written, it sounds like you would use it to diagnose failed jobs?

 It is currently possible to write out the job script with "scontrol". From the scontrol documentation:

write batch_script job_id optional_filename

Write the batch script for a given job_id to a file or to stdout. The file will default to slurm-<job_id>.sh if the optional filename argument is not given. The script will be written to stdout if - is given instead of a filename. The batch script can only be retrieved by an admin or operator, or by the owner of the job.

For example:
 >scontrol write batch_script 25596
 >batch script for job 25596 written to slurm-25596.sh


It is also possible to pull this from under "StateSaveLocation",
e.g.
> <StateSaveLocation>/hash.6/job.25596/script
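For completeness, the path above can be assembled in shell. Note the hash.<N> component appears to be job_id % 10 (hash.6 for job 25596), which is an observation from the example, not a documented interface, and the StateSaveLocation value below is hypothetical:

```shell
# Assemble the on-disk script path shown above. The StateSaveLocation
# value and the hash.<job_id % 10> layout are assumptions from the example.
STATE_SAVE=/var/spool/slurmctld
JOB_ID=25596
SCRIPT_PATH="${STATE_SAVE}/hash.$((JOB_ID % 10))/job.${JOB_ID}/script"
echo "$SCRIPT_PATH"
if [ -f "$SCRIPT_PATH" ]; then
  cat "$SCRIPT_PATH"
fi
```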
Comment 2 Paul Edmon 2019-08-21 11:04:38 MDT
So this is the after-the-fact way to get the job script.  We don't keep
completed or failed job information very long in our system because we
turn over a ton of jobs (it ages out in 15 minutes).  Usually users come
back days later and ask about a job after it has rolled out of the live
system, and the only access we have to job information is the database,
so we no longer have the job script.  We do have a script that dumps the
job script into a flat-file database, but that script caused problems
here: https://bugs.schedmd.com/show_bug.cgi?id=7532  Frankly I'm not too
keen to keep that script anyway.  I would rather this info be in the
Slurm database to make it easy to find.

Beyond that, it would be handy to write scripts against the database to
do after-the-fact analysis of what our users are running and what
modules they are loading.  Inspecting the scripts like that is very handy.

Also, with respect to the write option for scontrol: it is handy, but we
liked the old behavior where it would just dump the script to standard
out rather than writing it to a file.  Writing to a file adds one more
step to get at the script, since we then have to open the file rather
than seeing it immediately on screen.  So it would be nice if scontrol
had the option to either write to a file or write to standard out.
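Per the scontrol documentation quoted in comment 1, passing "-" instead of a filename already writes the script to stdout. A small sketch (job id hypothetical; the command is only attempted if scontrol is installed):

```shell
# The "-" filename writes the batch script to stdout instead of a file,
# per the scontrol documentation quoted in comment 1.
CMD="scontrol write batch_script 25596 -"
if command -v scontrol >/dev/null 2>&1; then
  $CMD
else
  echo "scontrol not available; would run: $CMD"
fi
```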

So that's the use case.  It wouldn't be as much of an issue if we could
keep the data around for days after jobs complete, but given our job
turnover we need to purge completed jobs from the active system as soon
as possible.  Thus we rely on the database for historical data.

-Paul Edmon-

Comment 3 Jason Booth 2019-08-21 12:03:33 MDT
Paul - Thanks for the detailed explanation and use case. We have had other requests to preserve these job scripts external to the database (https://lists.schedmd.com/pipermail/slurm-users/2018-July/001635.html), however I did not find any requests to store this inside the database. We will discuss this internally and let you know what we think.

 Also, is this something Harvard is interested in sponsoring development for?
Comment 4 Kilian Cavalotti 2019-08-21 12:28:52 MDT
Hi Jason,

I second Paul's request. We're currently hacking our way around this by copying the job script from StateSaveLocation in PrologSlurmctld, but if job script archival were directly integrated into the accounting mechanisms (either stored in the DB, or archived in the filesystem directly), that would be much easier.
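A hedged sketch of the kind of PrologSlurmctld workaround described above; all paths, the archive location, and the hash.<job_id % 10> directory layout are assumptions, not Slurm guarantees:

```shell
#!/bin/sh
# Hypothetical PrologSlurmctld: copy the job script out of
# StateSaveLocation before the job record ages out of the live system.
SLURM_JOB_ID=${SLURM_JOB_ID:-25596}   # set by slurmctld in real use
STATE_SAVE=/var/spool/slurmctld       # assumed StateSaveLocation
ARCHIVE=${ARCHIVE:-/tmp/job_scripts}  # assumed archive directory
SRC="${STATE_SAVE}/hash.$((SLURM_JOB_ID % 10))/job.${SLURM_JOB_ID}/script"
mkdir -p "$ARCHIVE"
if [ -f "$SRC" ]; then
  cp "$SRC" "${ARCHIVE}/job.${SLURM_JOB_ID}.sh"
fi
```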

I'd add that we also have to record the full output of "scontrol show job" in PrologSlurmctld, because sacct records miss some important information, like CPU_ID and GRES_ID allocations or output/error file locations, which are really critical for diagnosing job failures.

So I'd even go further: could we please have all the information that "scontrol show job" shows, as well as the job scripts, preserved in the accounting records for when jobs exceed MinJobAge?

Cheers,
-- 
Kilian
Comment 10 Andy Georges 2019-09-04 00:13:14 MDT
Hi,

If you are OK with an external solution, I can point out the following:

- https://github.com/nauhpc/job_archive
- https://github.com/itkovian/sarchive

The latter is something I wrote, similar to the former, but with some changes to how scripts are stored (we do not usually have USER information in the env, so we cannot store under a .../USER/... dir). Current work is to allow shipping the job scripts and environment to Elasticsearch, but that is very much a work in progress, as I am doing it in my spare time.

If this could be integrated into Slurm, that would certainly be nice to have, although I am not immediately in favour of putting this into the Slurm database. A separate location in the FS would suffice for us.

Kind regards,
-- Andy
Comment 14 Danny Auble 2021-03-15 09:41:37 MDT
Ben, I am making some changes to this SOW and want to make sure they are ok with you.

1. I would rather use sacct instead of sacctmgr for retrieving the new data.  sacctmgr does not typically deal with reading job data, and I would rather keep all job-related querying in sacct.

2. I will also be storing the new information in the job table, not in a separate table as the SOW called for.  I couldn't see a good reason to split it off, as that would only complicate (and slow down) querying and managing the data.  Because of this there will be no need for PurgeScriptsAfter, as PurgeJobsAfter will purge the script at the same time the job record is purged.

3. The current format for displaying the data is shown below.  I added a new option to sacct, '--batch-script', which queries the database for only the batch script and displays it, i.e.

sacct --batch-script -j 76118,76119
Batch Script for 76118
--------------------------------------------------------------------------------
NONE

Batch Script for 76119
--------------------------------------------------------------------------------
#!/bin/bash
#SBATCH -oout

echo "Yes this is it\n"
echo 'I have it!\n'
srun -l hostname
wait
exit 0

I am limiting this to explicitly requested jobs, to avoid a user requesting all jobs in a time period and potentially overwhelming things.

The same will go for env vars (--env-vars) as well.

4. Because this runs through sacct, the same limitations for querying jobs will apply: PrivateData=jobs will keep others from seeing these requests unless the user is an operator or administrator.  The SOW didn't mention PrivateData as being needed to limit this request, but I felt it made more sense this way so it behaves like any other part of the job record.  If you feel differently this can be altered, but I think it is more consistent with current behavior.

5. Also, since we will be using sacct instead of sacctmgr, cross-cluster querying will work for free with the '-M' option.

Please let me know if you have any concerns with these changes or not.
Comment 19 Danny Auble 2021-03-23 12:14:38 MDT
Ben, this functionality has been added to 21.08 in commits 4a0f3eeeb61..b307d92a61f5.

This adds a new slurm.conf parameter, AccountingStoreFlags, which replaces AccountingStorageJobComment.

The current AccountingStoreFlags options are:

job_comment - (Previously AccountingStorageJobComment) stores the job's comment.

As before you can get this from

sacct -o comment

job_script - Stores the whole job script.

This will only be accessible to the user that ran the job, a Slurm operator, or a super user.

You can get this from

sacct --batch -j $JOBID

This requires you to specify the job id, to avoid returning a host of other jobs and potentially overwhelming the system.

job_env - Stores the batch job's environment.

sacct --env-vars -j $JOBID

As with job_script, this also requires you to specify the job id, to avoid returning a host of other jobs and potentially overwhelming the system.

Please note that while PrivateData=jobs will limit both of these options, it is not required to keep the data private, as I questioned in comment 14.

Also, as an added bonus, the job's/step's submit line is now stored in the database as well and can be retrieved with

sacct -o submit_line

This is always on and does not require an AccountingStoreFlags option.
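Putting this comment together, a slurm.conf sketch enabling everything described above (flag names as given in this comment; enabling all three is a policy choice, not a requirement):

```shell
# slurm.conf fragment (sketch): store the comment, the full batch
# script, and the batch environment in the database. job_comment
# replaces the old AccountingStorageJobComment option.
AccountingStoreFlags=job_comment,job_script,job_env
```

The data is then retrieved per job as described above, e.g. `sacct --batch -j $JOBID` for the script and `sacct --env-vars -j $JOBID` for the environment.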

Please test and see what you think.
Comment 20 Ben Evans 2021-03-23 12:29:26 MDT
Danny,

This looks great. I'm especially excited for sacct -o submit_line. Unfortunately this week is pretty much booked for me, but I'll do my best to carve out time to take a look early next week.
Comment 21 Danny Auble 2021-03-23 13:32:03 MDT
No problem Ben,

I misspoke on the submit line option.  It does not contain the '_'.

So

sacct -o submitline

is the option.

Let me know when you are satisfied and I will close this bug out.
Comment 26 Ben Evans 2021-03-31 14:00:40 MDT
Danny,

We have this installed and working on a test cluster. Things look good. The last question I have is whether there is any way a user could accidentally submit a data file (or a batch script concatenated with some other large blob) that would make its way into the database. That is to say, is there a maximum file size check before it gets stored in the database?
Comment 27 Danny Auble 2021-04-01 10:22:35 MDT
The limit slurmctld puts on the batch script is 4G by default, and that is also the max size you can store per job.  At the moment nothing decouples those two limits; you can only restrict it from slurmctld with

SchedulerParameters=max_script_size=

which gates the maximum size of the script file.
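For instance, a hypothetical slurm.conf line capping scripts well below the 4G default (the value here is purely illustrative):

```shell
# slurm.conf sketch: lower the batch script cap from the 4G default to
# 16 MiB, bounding what a job can push into the database (illustrative).
SchedulerParameters=max_script_size=16777216
```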

Clearly, a user could make things fairly bad by continually submitting 4G files. I don't think that has ever happened; at least I have never heard of it being a problem.

We do check with sbatch that the file is at least a script:

sbatch whereami.c
sbatch: error: This does not look like a batch script.  The first
sbatch: error: line must start with #! followed by the path to an interpreter.
sbatch: error: For instance: #!/bin/sh

So that should protect against a random data file, but if a user concatenates a large blob onto the script it will be sent to the database as long as it is under the 4G limit.
Comment 28 Ben Evans 2021-04-01 11:03:25 MDT
max_script_size seems sufficient to me. When we have 21.08 in production who knows what enterprising users might find, but for now I think we can mark this as complete. Thanks again!
Comment 29 Danny Auble 2021-04-01 11:05:23 MDT
Sounds good Ben, yeah users are fairly resourceful :).

If you do find other things on this please open a new bug and we will handle it.

Thanks!
Comment 30 Brian Christiansen 2021-05-25 10:00:52 MDT
*** Ticket 5754 has been marked as a duplicate of this ticket. ***