Ticket 10056

Summary: Add scrontab
Product: Slurm
Reporter: Tim Wickberg <tim>
Component: User Commands
Assignee: Tim Wickberg <tim>
Status: RESOLVED FIXED
Severity: 4 - Minor Issue
CC: agaur, ben, csamuel, dmjacobsen, gmdexter, sts
Version: 20.11.x
Hardware: Linux
OS: Linux
Site: NERSC
Version Fixed: 20.11.0pre1
Attachments: Proposed patch

Description Tim Wickberg 2020-10-25 22:53:37 MDT
scrontab support has now been merged into master as of the following commit:

commit 882510ae2bfbbe0cd5813631abe063e4178ad537
Merge: 7b74b20b31 7fb52b2304
Author:     Tim Wickberg <tim@schedmd.com>
AuthorDate: Sun Oct 25 22:19:03 2020 -0600

    Merge branch 'cron'


There is initial documentation available in the man page. It does require setting a new option - ScronParameters=enable - to enable support.

It is expected that you will want to set up some specific cli_filter and/or job_submit routing to a dedicated queue for processing these. If you have some general feedback on this we'd be happy to add additional documentation.

As with all new Slurm features, I'm sure there will be some initial teething problems. We may be able to address some of that in the 20.11 release cycle through additional ScronParameters options depending on complexity.

Let me know if you run into any problems, otherwise I'm tagging this development project as complete.

- Tim
Comment 2 Chris Samuel (NERSC) 2020-10-30 00:14:31 MDT
Hi Tim,

Thanks so much for this! I've been testing it a little on Gerty and so far it works.  The only thing I've noticed is that I have to repeat these definitions for every line I add:

# min hour day-of-month month day-of-week command
#SCRON -q xfer
#SCRON -A nstaff
#SCRON --time 1
*/5 * * * * bash -c '(date; echo ${SLURM_JOB_ID}) > /tmp/csamuel.test'
#SCRON -q xfer
#SCRON -A nstaff
#SCRON --time 1
@hourly bash -c '(date; echo ${SLURM_JOB_ID}) > /global/homes/c/csamuel/scrontab.hourly'
#SCRON -q xfer
#SCRON -A nstaff
#SCRON --time 1
@daily bash -c '(date; echo ${SLURM_JOB_ID}) > /global/homes/c/csamuel/scrontab.daily'

I was wondering if it was possible for it to remember the previous ones if none were specified (and forget everything if something is mentioned).

Say:

# min hour day-of-month month day-of-week command
#SCRON -q xfer
#SCRON -A nstaff
#SCRON --time 1
*/5 * * * * bash -c '(date; echo ${SLURM_JOB_ID}) > /tmp/csamuel.test'
@hourly bash -c '(date; echo ${SLURM_JOB_ID}) > /global/homes/c/csamuel/scrontab.hourly'
15 */3 * * * bash -c 'last csamuel | Mail -s "Very bad IDS" csamuel@lbl.gov'
##
# Do some work
##
#SCRON -q gpu
#SCRON -A gofast
#SCRON --time 15:35:0
@daily ./my_gpu_code


Thoughts?

All the best,
Chris
Comment 3 Chris Samuel (NERSC) 2020-10-30 00:21:13 MDT
Hi Tim,

Is there a way via cli_filter or the submit filter to identify these for policy application?

I can see that the submit filter does seem to be applied as without those precursor lines I get the error:


There was an issue with the job submission on lines (null)
The error code return was: Unspecified error
The error message was: Unable to determine account name.  Please resubmit your job specifying account with -A.
The failed lines are commented out with #BAD:
Do you want to retry the edit? (y/n) 


Though the line doesn't get commented out with #BAD in this situation (it does if it fails to be parsed, say if I remove the @ from in front of @daily).

All the best,
Chris
Comment 4 Chris Samuel (NERSC) 2020-10-30 00:24:28 MDT
Hi Tim,

Final thing for the night - I've seen this get reported:

csamuel@gert01:/global/gscratch1/sd/csamuel/slurm/git/src/scrontab> scrontab -e
scrontab: error: cronspec_to_bitstring: at format
scrontab: error: cronspec_to_bitstring: at format


This is for this crontab:

# min hour day-of-month month day-of-week command
#SCRON -q xfer
#SCRON -A nstaff
#SCRON --time 1
*/5 * * * * bash -c '(date; echo ${SLURM_JOB_ID}) > /tmp/csamuel.test'
#SCRON -q xfer
#SCRON -A nstaff
#SCRON --time 1
@hourly bash -c '(date; echo ${SLURM_JOB_ID}) >> /global/homes/c/csamuel/scrontab.hourly'
#SCRON -q xfer
#SCRON -A nstaff
#SCRON --time 1
@daily bash -c '(date; echo ${SLURM_JOB_ID}) >> /global/homes/c/csamuel/scrontab.daily'


Looks like it happens in cronspec_to_bitstring() in src/scrontab/parse.c:

        if (*pos == '@') {
                error("%s: at format", __func__);

I'm guessing this is some leftover debugging?

All the best,
Chris
Comment 5 Tim Wickberg 2020-10-30 01:41:44 MDT
> Final thing for the night - I've seen this get reported:
> scrontab: error: cronspec_to_bitstring: at format

Fixed (7703e8ae07), that was a stray debugging line.

> There was an issue with the job submission on lines (null)
> <snip>
> Though the line doesn't get commented out with #BAD in this situation (it
> does if it fails to be parsed, say if I remove the @ from in front of
> @daily).

The (null) there was the issue. Fixed (3270b537b9).
Comment 6 Tim Wickberg 2020-10-30 02:14:29 MDT
This fixes some edge cases I found on review, please make sure further testing includes it as well:

commit fb82f284ec9995d8775562e0f0202062e8ada450
Author:     Tim Wickberg <tim@schedmd.com>
AuthorDate: Fri Oct 30 01:45:31 2020 -0600

    Only save updated crontab on successful submission.
    
    Also fixes an issue where a new crontab with no jobs or 'crontab -r' would
    not remove prior crontab jobs as the return path bailed out too soon.
Comment 7 Tim Wickberg 2020-10-30 02:21:40 MDT
(In reply to Chris Samuel (NERSC) from comment #2)
> Hi Tim,
> 
> Thanks so much for this! I've been testing it a little on Gerty and so far
> it works.  The only thing I've noticed is that I have to repeat these
> definitions for every line I add:

This is intentional, and is documented in the example and in the man page:

Example:

# Lines starting with #SCRON will be parsed for options to use
# with the next cron line. E.g., "#SCRON --time 1" would request
# a one minute timelimit be applied. See the sbatch man page for
# options, although note that not all options are supported here.

Man page:
"Options are always reset in between each crontab entry."

> I was wondering if it was possible for it to remember the previous ones if
> none were specified (and forget everything if something is mentioned).

I'd considered that, but I think that'd be just as confusing in some respects.

In my envisioned use, these #SCRON lines are best avoided, and I'd expect some magic from cli_filter and/or job_submit to be doing the real heavy lifting. Thus not wanting to complicate the (already complicated) parser further.

I'm willing to reconsider this if you think that's going to be an issue, but IMO any solution is going to involve at least some opportunities for confusion.
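
The repetition can also be handled outside scrontab. Below is a minimal Python sketch, not part of Slurm, of a preprocessing step that writes out the "remember the previous #SCRON block" behaviour from comment #2 explicitly, so that every entry carries its own options as scrontab expects. The function name expand_scron_options is hypothetical, and the sketch assumes that any non-blank line not starting with # is a crontab entry.

```python
def expand_scron_options(text: str) -> str:
    """Repeat the most recent #SCRON block before entries that have none.

    Hypothetical helper for illustration only; it is not part of scrontab.
    """
    out = []
    remembered = []  # last #SCRON block that was attached to an entry
    pending = []     # #SCRON lines collected since the previous entry
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#SCRON"):
            pending.append(line)
        elif not stripped or stripped.startswith("#"):
            # Blank lines and ordinary comments pass through unchanged.
            out.append(line)
        else:
            # Assumed to be a crontab entry. A fresh #SCRON block replaces
            # the remembered one; otherwise the remembered block is reused.
            if pending:
                remembered = pending
            out.extend(remembered)
            out.append(line)
            pending = []
    return "\n".join(out) + "\n"
```

The result could then be pasted into "scrontab -e". Note that #SCRON lines are emitted directly above their entry, and trailing #SCRON lines with no following entry are dropped.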

(In reply to Chris Samuel (NERSC) from comment #3)
> Hi Tim,
> 
> Is there a way via cli_filter or the submit filter to identify these for
> policy application?
> 
> I can see that the submit filter does seem to be applied as without those
> precursor lines I get the error:

Job submit sees them all as individual submissions. And submissions stop being processed on the first error - then all the submitted jobs will be flushed out until a completely acceptable submission makes it in.

cli_filter is only setting up default options at the moment. I'd meant to enable it but have not yet.

As for identification - the cronspec field (or crontab_entry) is the best sign of these being different. Although there is no Lua representation of that field yet - I gather that'd be of interest?
Comment 8 Chris Samuel (NERSC) 2020-10-30 11:12:23 MDT
Hi Tim,

Thanks for the fixes, I've just pulled them and will rebuild. Gerty is getting upgraded to the latest patchset (PS16) today so my ability for testing might be limited.

(In reply to Tim Wickberg from comment #7)
> (In reply to Chris Samuel (NERSC) from comment #2)
> > Hi Tim,
> > 
> > Thanks so much for this! I've been testing it a little on Gerty and so far
> > it works.  The only thing I've noticed is that I have to repeat these
> > definitions for every line I add:
> 
> This is intentional, and is documented in the example and in the man page:

Yeah, I saw that, just wondering from a usability point of view, but if (as you say later) there is a way to pick up the fact that these are scron jobs then we can deal with them there.  No biggy.

> cli_filter is only setting up default options at the moment. I'd meant to
> enable it but have not yet.

OK thanks, that would be handy.

> As for identification - the cronspec field (or crontab_entry) is the best
> sign of these being different. Although there is no Lua representation of
> that field yet - I gather that'd be of interest?

Most definitely!

Thanks so much!

All the best,
Chris
Comment 9 Chris Samuel (NERSC) 2020-10-30 12:13:01 MDT
Hi Tim,

Those fixes look good, thanks for that.

The only feature I've noticed from crontab that's missing is the ability to do "scrontab my_custom_things.cron" to read in a pre-prepared crontab from a file.

I'll do some more testing once the PS16 work is done.

All the best,
Chris
Comment 10 Tim Wickberg 2020-10-30 12:28:59 MDT
> The only feature I've noticed from crontab that's missing is the ability to
> do "scrontab my_custom_things.cron" to read in a pre-prepared crontab from a
> file.

Unless you think there's a huge demand for that, I'd rather not implement it.

In the same way that 'scrontab' defaults to editing when no options are given - rather than crontab's POSIX-required behavior of trying to read from stdin - I view that as a somewhat sharp-edged mode of operation that I'd rather not offer at this time.
Comment 11 Chris Samuel (NERSC) 2020-10-30 12:34:00 MDT
Hi Tim,

I'll check with the consultants, I'm not aware of how folks set theirs up so I'm not sure if there's any automated tooling that expects to be able to install a crontab non-interactively.

All the best,
Chris
Comment 12 Tim Wickberg 2020-10-30 12:36:53 MDT
> I'll check with the consultants, I'm not aware of how folks set theirs up so
> I'm not sure if there's any automated tooling that expects to be able to
> install a crontab non-interactively.

Sounds fine.

And such interactive shenanigans are something I don't mind blocking... unless they're very carefully written, they're likely to just blow away any existing scrontab content the user already had. And wouldn't know to use #SCRON if necessary.
Comment 13 Chris Samuel (NERSC) 2020-11-01 23:21:47 MST
Hi Tim,

(In reply to Tim Wickberg from comment #7)

> Man page:

Is it possible to add the preamble that scrontab puts into a new entry to the man page as an example please?  That might help stimulate interest in it if a new user can see the fact that it has a familiar feel to it.

Also maybe mention the ability to use "@yearly", "@annually", "@monthly", "@weekly", "@daily", "@midnight" and "@hourly"?
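
For readers unfamiliar with these shortcuts, here is a small Python sketch of what they conventionally stand for in cron. The table reflects the usual cron meanings, which is an assumption here; scrontab's own parser is the authority on what it accepts and how it interprets them. The names CRON_MACROS and expand_macro are hypothetical.

```python
# Conventional cron meanings of the "@" shortcuts (assumed, not taken
# from the scrontab source).
CRON_MACROS = {
    "@yearly": "0 0 1 1 *",
    "@annually": "0 0 1 1 *",
    "@monthly": "0 0 1 * *",
    "@weekly": "0 0 * * 0",
    "@daily": "0 0 * * *",
    "@midnight": "0 0 * * *",
    "@hourly": "0 * * * *",
}


def expand_macro(entry: str) -> str:
    """Rewrite a leading @shortcut as five explicit time fields."""
    parts = entry.split(None, 1)
    if not parts or parts[0] not in CRON_MACROS:
        return entry  # not a known shortcut; leave the entry untouched
    rest = parts[1] if len(parts) > 1 else ""
    return f"{CRON_MACROS[parts[0]]} {rest}".rstrip()
```

Under that assumption, expand_macro("@daily ./my_gpu_code") gives the same schedule written out in the "min hour day-of-month month day-of-week command" layout used in the examples above.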

I noticed it looks like something odd is happening with the formatting, on SLES 15 this reads oddly, as if there is missing text:

---------------------------------------------------------------
Lines must be either comments starting with entries.

Lines  starting  with  following  crontab  entry. Options are always reset in between each crontab entry.Options include most of those available to the sbatch command;
       details are available in sbatch(1).
---------------------------------------------------------------

Looking at the nroff source, there is a \# in both cases and the text following it on that line is getting lost. I think that is because \# starts a comment in groff source.

A quick test shows that dropping the \ seems to be enough to fix that.

Also there's no space between "Options" and the preceding full stop; that looks more like a missing newline before "Options" in the nroff source.

All the best,
Chris
Comment 15 Tim Wickberg 2020-11-02 14:04:22 MST
We're working on tidying up the documentation, that should happen ahead of 20.11 proper.

To summarize what else is outstanding from my point of view:

- Expose something in job_submit.lua indicating the job is from scrontab.

- Hook up cli_filter inside the scrontab command.
Comment 16 Chris Samuel (NERSC) 2020-11-02 14:47:03 MST
(In reply to Tim Wickberg from comment #15)

> We're working on tidying up the documentation, that should happen ahead of
> 20.11 proper.

No worries, thanks for that!

> To summarize what else is outstanding from my point of view:
> 
> - Expose something in job_submit.lua indicating the job is from scrontab.
> 
> - Hook up cli_filter inside the scrontab command.

Sounds about right to me.

Is there a way to sbatch in a new cron job?

All the best,
Chris
Comment 17 Tim Wickberg 2020-11-02 15:00:16 MST
> > To summarize what else is outstanding from my point of view:
> > 
> > - Expose something in job_submit.lua indicating the job is from scrontab.
> > 
> > - Hook up cli_filter inside the scrontab command.
> 
> Sounds about right to me.
> 
> Is there a way to sbatch in a new cron job?

Not at present, no.

The main issue I ran into is there'd be no easy way to represent sbatch-cron-submissions in scrontab itself, so it'd be a bit confusing to manage these. 

(I save the raw crontab which is turned into each job record, but there's no way to translate a job record back to a set of #SCRON lines without needing to lay out every single possible option.)

If you see use from having some form of recurrence available through sbatch I'm not opposed to adding it in a future release, but I can't make the RPC changes that would be necessary to enable that in 20.11 at this point.
Comment 18 Chris Samuel (NERSC) 2020-11-02 15:32:17 MST
(In reply to Tim Wickberg from comment #17)

> Not at present, no.

No worries, this was just me being curious and thinking about what we might need to document.

Much obliged!
Chris
Comment 22 Ben Roberts 2020-11-05 15:41:14 MST
*** Ticket 10167 has been marked as a duplicate of this ticket. ***
Comment 24 Gordon Dexter 2020-11-06 12:51:18 MST
Does scrontab check that a user is enabled (that is, their shell isn't /sbin/nologin or /bin/false) before running their jobs?  Or is there some similar 'locked' feature in sacctmgr that would allow us to disable/enable users' cron jobs, to ensure we don't have any phantom jobs doing stuff after users are gone?
Comment 25 Chris Samuel (NERSC) 2020-11-06 13:06:21 MST
(In reply to Gordon Dexter from comment #24)

> Does scrontab check that a user is enabled (that is, their shell isn't
> /sbin/nologin or /bin/false) before running their jobs?  Or is there some
> similar 'locked' feature in sacctmgr that would allow us to disable/enable
> users' cron jobs, to ensure we don't have any phantom jobs doing stuff after
> users are gone?

I think it's the same as any other Slurm job in this respect.

For instance if we need to disable a user (usually because their jobs are disrupting Slurm or causing other system issues) we set them to only have access to a "batchdisable" QOS which has no ability to run jobs.

I'll try and find some time today to test that in our test system.

All the best,
Chris
Comment 26 Gordon Dexter 2020-11-10 08:37:26 MST
(In reply to Chris Samuel (NERSC) from comment #11)
> Hi Tim,
> 
> I'll check with the consultants, I'm not aware of how folks set theirs up so
> I'm not sure if there's any automated tooling that expects to be able to
> install a crontab non-interactively.
> 
> All the best,
> Chris

We do a lot of user setup via automated scripts, and we try to make things so the user can hit the ground running, so it could be useful to have some way to modify an scrontab file programmatically.  I agree that the default should be edit though.
Comment 27 Tim Wickberg 2020-11-11 19:36:09 MST
Okay: job_submit/lua and cli_filter adjustments are in ahead of rc2 tomorrow.

Note this required a breaking RPC change to scrontab - the rc1 scrontab will not communicate with rc2 and the final release versions of slurmctld.

- Tim

commit 843b8dbc078c0faba12d51520d505673061ed1c8
Author:     Tim Wickberg <tim@schedmd.com>
AuthorDate: Wed Nov 11 18:25:01 2020 -0800

    job_submit/lua - expose a "cron_job" boolean field
    
    Bug 10056.

commit 59e4105b18186479aee4079fbb7f424779116a83
Author:     Tim Wickberg <tim@schedmd.com>
AuthorDate: Wed Nov 11 17:06:21 2020 -0800

    scrontab - add cli_filter hooks.
    
    Bug 10056.

commit 97e9abe22568a22ce92e3272c6cb3ab4abc283a6
Author:     Tim Wickberg <tim@schedmd.com>
AuthorDate: Wed Nov 11 17:42:47 2020 -0800

    Populate the jobids array in crontab_update_response_msg_t.

commit 9e3aa68615fd99e751d36a20a4cbadb4f995c6e3
Author:     Tim Wickberg <tim@schedmd.com>
AuthorDate: Wed Nov 11 17:35:43 2020 -0800

    Send array of jobids as part of crontab_update_response_msg_t.
    
    This is a breaking RPC change for scrontab.

commit 59b3ab570f37f95d3caeb9ec217fd5c50ffb07cf
Author:     Tim Wickberg <tim@schedmd.com>
AuthorDate: Wed Nov 11 16:07:06 2020 -0800

    Tweak CRON_JOB flag handling.
    
    Ensure flag is always set on scrontab-submitted jobs in job_submit.

commit 1306233ae9061dd1f22928f61e95efa6d698a9e5
Author:     Tim Wickberg <tim@schedmd.com>
AuthorDate: Wed Nov 11 15:48:40 2020 -0800

    scrontab - change line parsing error handling around.
    
    Which will make it simpler to add in cli_filter support.
Comment 28 Chris Samuel (NERSC) 2020-12-01 16:00:26 MST
Thanks Tim!  I'd completely missed this update. Much appreciated!
Comment 29 Josh Samuelson 2020-12-10 07:56:14 MST
Created attachment 17093 [details]
Proposed patch

While looking over the 20.11 RELEASE_NOTES, I tend to "git grep" and look over the commits for more information.  Looking at the ScronParameters change, it appears the wrong structure member is being assigned in src/api/config_info.c:slurm_ctl_conf_2_key_pairs()?
Comment 30 Tim Wickberg 2020-12-10 08:16:16 MST
Thanks Josh. Applied ahead of 20.11.1.

In the future can you please split bug fixes off into new tickets? We generally like to avoid reopening the original development tracking tickets.

thanks,
- Tim

commit f8e7df5027ca2a540cd8b6d699a6f0b666bd7143
Author:     Josh Samuelson <josh@1up.unl.edu>
AuthorDate: Wed Dec 9 23:13:51 2020 -0600

    Assign correct value variable for ScronParameters key.
    
    Otherwise 'scontrol show config' (and other API consumers) will display
    the wrong value.
    
    Bug 10056.