Hi - We have added a partition to our cori system and would like all jobs submitted to the partition to have immediate set to 300 (5 minutes). In slurm_job_submit in our job_submit.lua we have the line:

    job_request.immediate = 300

and then at the end of the submit function:

    slurm.log_info('immediate value is %f', job_request.immediate)

We have verified through the logs that the job gets an immediate value of 300, so there's no error in our logic. However, the immediate value set in job_submit.lua does not have the desired effect on the job - it is as if immediate had not been set. So a user will run salloc and sit there (past the 300 second mark). If, within salloc, we set --immediate=<whatever>, we get the expected behavior. Thanks, James
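[Editorial note: for reference, a minimal sketch of the kind of job_submit.lua logic described above. The partition name "interactive" is a placeholder, not the actual cori partition, and the full function bodies are an illustration rather than the site's actual plugin.]

```lua
-- Sketch of the job_submit.lua approach described in this report.
-- "interactive" is a hypothetical partition name.
function slurm_job_submit(job_request, part_list, submit_uid)
    if job_request.partition == "interactive" then
        -- Force a 300-second --immediate on this one partition only.
        job_request.immediate = 300
    end
    slurm.log_info("immediate value is %u", job_request.immediate)
    return slurm.SUCCESS
end

function slurm_job_modify(job_request, job_record, part_list, modify_uid)
    return slurm.SUCCESS
end
```

As the rest of the thread explains, slurmctld does accept this value, but there is no RPC path to push the changed value back to the salloc client, so the client-side behavior of --immediate never takes effect.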
This is a bit tricky, and may be better handled by setting SALLOC_IMMEDIATE in users' environments. The immediate value is one of relatively few that directly affects salloc as well as slurmctld. There's no way in the existing RPC formats to back-propagate that change to salloc, and we probably should not have exposed it to job_submit.lua for that reason.
Hi Tim, Understood. We only want immediate set for one partition and not the others, otherwise setting SALLOC_IMMEDIATE would be fine. Or perhaps I don't understand a way to deploy this? Thanks, James
(In reply to James Botts from comment #2)
> Understood. We only want immediate set for one partition and not the
> others, otherwise setting SALLOC_IMMEDIATE would be fine. Or perhaps I
> don't understand a way to deploy this?

Unfortunately I don't think there's a way to accomplish this at present; it would need to be handled as an enhancement request.
(In reply to Tim Wickberg from comment #3)
> Unfortunately I don't think there's a way to accomplish this at present, and
> would need to be handled as an enhancement request.

Just wanted to check if you'd be okay with reclassifying this as an enhancement request - right now there's no way to inform the salloc/srun command that this value has been overridden on that single partition. It's one of the few settings that has an effect on the client, not just the values presented to slurmctld.

- Tim
We've temporarily hacked our salloc wrappers to set SALLOC_IMMEDIATE in the cases we care about. Now that I understand better what immediate is doing, I can understand why it can't easily be set in job_submit plugins.

That said, I think there is still a bug here. We cannot _read_ immediate in job_submit.lua properly (which we want to do to ensure people aren't messing with it to form a queue we do not want to exist). So I think that is a bug.

In terms of an enhancement, I think a really good enhancement would be a new class of plugin, let's call it cli_filter, that is run on the client side by sbatch/salloc/srun, and allows the site to manipulate the specific arguments _before_ they are sent off.

The potential benefits are being able to do more complex processing on the client side instead of on the slurmctld side -- the problem with job_submit.lua is that it _must_ be fast because slurmctld locks are held while it is running. On the client side, we could check filesystem quotas, snoop on the specified options (and log them), do all kinds of great things (also manipulate immediate, the point of this bug).

The only real technical barrier I see is that all of salloc/srun/sbatch use different opt.h files with different opt_t structures, and prevent upstream library code from importing them by all using the same #define guard. If this were restructured in a minor way then it might be possible to allow a cli_filter/lua plugin to do similar things as job_submit. I may take a stab at this in my free time (ha).
(In reply to Doug Jacobsen from comment #5) > We've temporarily hacked our salloc wrappers to set SALLOC_IMMEDIATE in the > cases we care about. Now that I understand better what immediate is doing, > I can understand why it can't easily be set in job_submit plugins. > > That said, I think there is still a bug here. We cannot _read_ immediate in > job_submit.lua properly (which we want to do to ensure people aren't messing > with it to form a queue we do not want to exist). So I think that is a bug. > > > In terms of an enhancement, I think a really good enhancement would be a new > class of plugin, let's call it cli_filter, that is run on the client side by > sbatch/salloc/srun, and allows the site to manipulate the specific arguments > _before_ they are sent off. > > The potential benefits are being able to do more complex processing on the > client side instead of on the slurmctld side -- the problem with > job_submit.lua is that is _must_ be fast because slurmctld locks are held > while it is running. On the client side, we could check filesystem quotas, > snoop on the specified options (and log them), do all kinds of great things > (also manipulate immediate, the point of this bug). I'm looking into ways to mitigate some of the lock issues for 17.11, but yes, that is definitely a problem. One of the main issues with client-side processing of this nature is that it's optional, and I'd be hesitant to add functionality that may be misinterpreted as a client-side security mechanism. I assume that's not a huge concern for you though, and presumably all sensitive work would still be managed through slurmctld? > The only real technical barrier I see is that all of salloc/srun/sbatch use > different opt.h files with differnt opt_t structures, and prevent upstream > library code from importing them by all using the same #define guard. If > this were restructured in a minor way then it might be possible to allow a > cli_filter/lua plugin do similar things as job_submit. 
> I may take a stab at this in my free time (ha).
Reclassifying as Sev5. As a summary: setting the "immediate" value through a job_submit plugin does not update the client side, and that flag is one of the few (possibly the only one?) that alters the behavior of the client command as well. For this to work, the change to "immediate" would need to be pushed back to salloc/srun somehow. As a different option, some way to run a client-side job submission filter may be a suitable alternative, with the understanding that the client could still potentially override any of that logic.
Hello, I'm bumping the severity up, mostly to ensure this update is noticed; feel free to adjust as required.

I've got what I think is a workable model of a client-side job submission filter, though this is still very much prototype code, and not all features that I will need for my planned usage are implemented. But I wanted to get this on your radar so we can discuss the design and implementation before it goes much further, to ensure that the idea and methodology are in line with something that could be mainlined into slurm.

Since I've been tracking the master branch (slurm 17.11) with this, I presently have the code in the cli_filter-17.11 branch of my personal fork of slurm on github. Please see https://github.com/dmjacobsen/slurm/tree/cli_filter-17.11

The cli_filter plugin is intended to provide three hooks that are executed at different stages of salloc/sbatch/srun execution. cli_filter plugins are run exclusively as the user and only from the context of the cli application. The three hooks are:

setup_defaults --- run early, prior to any processing of the environment or options. This allows the site to adjust defaults without manipulating the user environment or writing wrapper scripts. One could also imagine a cli_filter/user_defaults plugin that allowed a user to create a $HOME/.slurm_defaults file to layer in their own defaults.

pre_submit --- run after all option and environment processing and immediately before submission to slurmctld. This allows the site to manipulate options, print text to the user's terminal (since it has full control of user stdout/stderr) even for submissions that are accepted (unlike the user_log options in job_submit), and screen out submissions that are contrary to site policy prior to any interaction with slurmctld. With some further work, one could imagine interactions with the slurmdb to validate accounts without requiring communication with slurmctld.

post_submit --- run after the slurmctld response, with the jobid returned.
I plan to use this to serialize the opt data structure and log it. This will be used to monitor user behavior while piercing through all possible mechanisms for setting options (cli_filter/setup_defaults stacking, environment settings, explicit option setup).

To achieve this, all of the salloc/sbatch/srun opt_t data structures were renamed to salloc_opt_t, sbatch_opt_t, and srun_opt_t, and moved from opt.h to [salloc|sbatch|srun]_opt.h to allow just the data structures to be addressed by the common/cli_filter code as well as by the plugins directly.

A generalized lua plugin has been included which allows reading of all fields (I think) and manipulation of most. This lua plugin breaks somewhat from the design of the job_submit_lua plugin, in that read/write of the opt data structures is done through a generic interface rather than the if/else conditional structure used in job_submit_lua. This has the advantage of being parsimonious for the developer while providing complete flexibility, but does add some gotchas if the types in the data structures change.

A very simple example cli_filter.lua script is included here, just to give you an idea of what I'm thinking:

    function slurm_cli_pre_submit(cli, opts)
        if cli == slurm.CLI_SALLOC and opts['qos'] == "interactive" then
            slurm.log_info("NOTICE: setting immediate value for interactive session")
            opts['immediate'] = 5
        end
        local reqpart = opts['partition']
        if reqpart ~= nil and string.find(reqpart, ",") ~= nil then
            slurm.log_error("Specifying multiple partitions is not allowed on this system")
            return slurm.ERROR
        end
        return slurm.SUCCESS
    end

    function slurm_cli_setup_defaults(cli, opts)
        opts['time_limit_str'] = '57'
        return slurm.SUCCESS
    end

If the basic concept is accepted I plan to marry the cli_filter.lua and job_submit.lua for the NERSC site with a common included lua library so similar logic is enforced from both contexts.
This will ensure that even in the case that a user attempts to use libslurm to submit a job, or uses scontrol update, all policies are enforced. I will need to extend cli_filter/lua further to add partition, environment, and some slurm accounting checks to get all the functionality I require.

I do plan to exclusively run some _very_ heavyweight checks in the cli_filter, which under some bad conditions could end up in the D-state (e.g., parallel filesystem quota checks). By running these client side, I avoid the risk of slurmctld getting jammed up when one of our many lustre filesystems is unavailable.

Thanks, Doug
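[Editorial note: a minimal sketch of what the post_submit logging hook described above might look like. The hook signature, the field list, and the log format are assumptions based on the prototype's description, not the actual NERSC implementation.]

```lua
-- Hypothetical post_submit hook: serialize a few opt fields and log them.
-- The exact serialization used by the prototype (json of the full opt
-- structure) would be richer than this.
function slurm_cli_post_submit(cli, jobid, opts)
    local fields = { "partition", "qos", "time_limit", "immediate" }
    local parts = { string.format("jobid=%s", tostring(jobid)) }
    for _, name in ipairs(fields) do
        if opts[name] ~= nil then
            table.insert(parts, string.format("%s=%s", name, tostring(opts[name])))
        end
    end
    slurm.log_info("submit record: %s", table.concat(parts, " "))
    return slurm.SUCCESS
end
```

Because post_submit runs after slurmctld has returned the jobid, a record like this captures the final, fully-resolved options regardless of whether they came from defaults, the environment, or explicit flags.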
I'm asking Moe to take a look at it; architectural work like this is his domain. Just one comment on the repo - internally we always rebase our functional branches, never merge. That helps isolate a series of changes that are all related to the functionality under development, rather than ending with them scattered throughout the branch.
Doug - Are you able to get this rebased cleanly to master? Your branch is merging in changes, which we never do in our repo. It's leaving your changes scattered throughout the commit history, and hard to separate out. While taking a quick look at it, it looks like you're unfortunately colliding with some changes Moe's been making to support Heterogeneous Jobs. I don't think that should pose an insurmountable obstacle, but it does require some care to get things sorted out, and we're unfortunately running a bit short on time. BTW - code freeze for 17.11 is coming up on September 21st, and we'll need external contributions finalized ahead of that to have time to review. - Tim
OK. I wasn't planning to have the commits directly added to the repo. I'll just consolidate the commits into a single one. I was mostly interested in finding out if this approach was of interest to SchedMD, and if there are suggestions for how it should be changed to make it more acceptable if it isn't (other than the rebasing, I'm happy to fix that up).
(In reply to Doug Jacobsen from comment #14)
> OK. I wasn't planning to have the commits directly added to the repo.
> I'll just consolidate the commits into a single one.

Any luck consolidating this so far? The freeze for 17.11 is coming up pretty soon, and we'll need at least a little bit of time to review.
I'll finish it off this weekend.
Ok, everything is untangled. https://github.com/SchedMD/slurm/compare/master...dmjacobsen:cli_filter-17.11?expand=1 But now I need to make sure it still works =) I'll also drive forward on some of the polish, making the assumption it will merge given how close we are to the feature freeze.
That looks a lot nicer, thanks for clearing that up. Adding Danny on here as he may have some time to review before I can. The one thing I'm noticing is that you're forcing the cli_filter plugins to know how to parse the sbatch_opt_t vs srun_opt_t vs salloc_opt_t which seems like it'd cause a lot of frustration. Is there a reason you couldn't have had it operate on the job_desc_t instead?
> That looks a lot nicer, thanks for clearing that up. Adding Danny on here as he may
> have some time to review before I can.

No trouble - sorry that I had the slurm-git-flow off earlier!

> The one thing I'm noticing is that you're forcing the cli_filter plugins to know how
> to parse the sbatch_opt_t vs srun_opt_t vs salloc_opt_t which seems like it'd cause a
> lot of frustration.

In the lua plugin I've provided a mostly generic interface to each field of the structs, though I certainly haven't tested each one. I'm adding a similar string-based interface to src/common/cli_filter.c that all cli_filter plugins can use. That way there is a strings-only interface for getting data in and out, and it will be able to generically look up supported fields. This will also make it possible to generate a json-formatted serialized version of the data structures -- one of the use cases I have is to log user activities.

> Is there a reason you couldn't have had it operate on the job_desc_t instead?

Yes, the cli programs don't all use the contents of job_desc_msg_t solely to drive behavior -- in particular, the salloc immediate capability. However, I'm thinking of extending this further to sbcast, and perhaps others, in the future. Also, by focusing on the *_opt_t, the setup_defaults capability (which runs before option processing) is able to use the same input formats that users would, as opposed to post-processed formats (e.g., 10:00:00 or 600 for 10 hours, instead of just 600 minutes in an integer).
Just FYI - there are a few build warnings currently showing on our TravisCI setup. I just rebased to master again and fixed the immediate errors, looks like there's some quirkiness in some function signatures though. https://travis-ci.org/SchedMD/slurm/jobs/276640206
Hello, I've been making a number of updates to the cli_filter branch of my slurm fork (up to date relative to schedmd/master as of 3am this morning). I'm now working on fixing up the plugins (not just the plugin infrastructure), especially lua, user_defaults, and logging.

https://github.com/SchedMD/slurm/compare/master...dmjacobsen:cli_filter?expand=1

The major change I've put in is that I've removed all the committed hard-coded data structures for the lua support for entry into the opt data structures. I've also added a generic string-based interface that all cli_filter plugins can use to aid in get/set operations. This is enabled by generating the mappings live at compile time by introspecting the cpp pre-processed header files and then generating the arrays for the mapping dynamically. This should remove the major maintenance hurdle I was worried about with this approach, though it now requires some python code to run at compile time. If there is concern about running python at compile time, we could just include the script to run manually when the *opt_t data structures change and commit the generated code.

Now I'm working on finishing off a simple implementation of the user_defaults plugin, adding a logging example (just using syslog for the moment, but elastic or ldms backends will be my preference for the future), and re-validating that the lua plugin functions after I replaced all the mappings with the auto-generated ones.

Thanks, Doug
Note that my changes require ./autogen.sh again
Hello, Just checking in. Latest changes are here: https://github.com/SchedMD/slurm/compare/master...dmjacobsen:cli_filter?expand=1 -- should be up-to-date relative to schedmd/master as of 18:13 today.

Tonight I'm going to wrap things up. Basically all I have left is to:

1) expose a cli_filter/common function in the lua-space (json rep of the opt structure)
2) deal with cli_err_msg in all the clients
3) exit the cli in the case of non-SLURM_SUCCESS exit status in post_submit
4) actually have the syslog plugin log to syslog instead of printf
5) add functions for the cpu_bind and mem_bind data structures (just string versions for output)

But I think all the core bits are in place, and really at this point I'm at detail work. Testing it all together, it seems quite functional -- easy to set defaults and manipulate them on the client side, provide verbose warnings even in cases where submission is allowed, and log everything. I'll stop work after tonight, but I don't expect any further major changes other than the detail work.

-Doug
Hello, OK, I think I'm done with the basic code for this, including the detail work. It is current with schedmd/master as of 13:18 Pacific today. It is on the cli_filter branch of the dmjacobsen/slurm fork. https://github.com/SchedMD/slurm/compare/master...dmjacobsen:cli_filter?expand=1 I completed the work that I mentioned in my last update. In addition, I realized for job packs the post_submit hook needed to be called for each opt set. This meant that in the included commits I modified sbatch and salloc to store each set of opts in an opt_list similar to srun. In fact I directly stole the code from srun. I'm going to start backporting this to 17.02 (should be trivial with the code generator), and my original pre-job-pack hook positions should similarly work. Another autogen.sh run is required (my sles dev space makes more changes than I am comfortable with). But unless there are specific requests, or if I find problematic bugs as I move this off from VMs and onto gerty (then cori), my expectation is that the base work is more or less complete. I'd be happy to provide an initial stab at some documentation for the feature and some example lua plugins for contrib, but I wanted to get the base code fully done before the feature freeze for 17.11 and so have stayed focused on that. Thanks so much, Doug
It's going to take a little bit of surgery to get this cleaned up again. Your changes involving opt_t are conflicting with a lot of Moe's ongoing hetjobs work, and that takes precedence at the moment. One thing I'm not sure if you'd explored, but I think may help solve some of this would be if we were able to unify the three different opt_t structures. I suspect, but haven't looked, that 95% of the fields are in common across them. Did you look into this as part of the work you've done by any chance? I think doing that would help cleanup the cli_filter API - requiring the plugins themselves to handle the three different struct types seems awkward. Re: autogenerating, I don't think we're willing to introduce a compile-time dependency on python. There's no code generation currently done elsewhere in Slurm, and I think we'll need to have a discussion around how best to handle that. Obviously, I can see that there are some major maintenance advantages to that approach though. Maybe if it was hooked up as part of our autoconf pre-processing we could live with that - then it's not a compile time requirement at least.
Hi Tim, I've been tracking master and continuously performing that opt_t cleanup on my cli_filter branch (and reflecting changes Moe has been making in the other opt_lists I needed to build to support job_packs within cli_filter). I think a force push from dmjacobsen/slurm/cli_filter onto schedmd/slurm/cli_filter may be the bulk of what is required.

Regarding auto-generation, yes, I think if it could be wired up to be done at autoreconf time, that might be better than compile time -- certainly easier to figure out what is going on, and it adds fewer requirements for build sites. I mostly wanted to wire up _something_ to deal with handling changes to those data structures, and not allow each cli_filter plugin to become a maintenance chore.

Regarding combining the opt data structures, I did consider this; however, I was concerned that it would prove to be a bigger change than you would want. At present all I've done is rename the data structures and inject a few hooks into the cli, rather than redesign them. I think that combining the data structures is the right path for the future, and could greatly expand the usefulness of the client-side job editing capabilities of the cli_filter.

My general thought (not for this initial implementation) is that both the data structure and the software interface for handling CLI application options should be generalized and moved into libslurm_full. This could then allow cli_filter_pre_submit(), for example, to edit the data structure, and then cause the same post-processing to occur again to ensure that the opt data structure is returned to a consistent state. At present, I'm using the same philosophy as job_submit, which is that you can directly edit the data structure, just be careful!

Regarding a cli_filter plugin needing to support each structure individually, this is not technically required.
The generic strings interface converts almost every field in each structure (thanks to the autogenerator) into a string. The cli_filter can then indirectly access them that way. As you point out, most of the structures are identical, so in most cases a plugin of sufficiently limited scope would not become tediously difficult to develop. Thanks, Doug
just rebased from master again and pushed to dmjacobsen/slurm/cli_filter. we'll see what travis has to say.
Hmm... there are a number of subtle differences, there appear to be seven options to sbatch that salloc does not have, and 27 the other way. srun's a much bigger difference - 100 extra values beyond salloc. I'm now thinking it'd make sense to create a single slurm_opt_t with the in-common arguments in it (~100 of them), and then include separate salloc_opt_t / sbatch_opt_t / srun_opt_t for the subset of options that only make sense in one context or another. If those were embedded into the slurm_opt_t (and NULL if not applicable) the cli_filter plugin would only need to manage a single struct. If that sounds of use, I can bring this up internally. We're cutting it pretty close on timing before the freeze though, and Danny's already out travelling for SLUG...
Hmmm, well, in any case I'd need to get them all, common or not. One of my goals is to log every sbatch/salloc/srun with every option that was used (implicit or not). Using the common data structures is useful for the job manipulation context, but doesn't capture the rest.
(In reply to Doug Jacobsen from comment #31) > Hmmm, well, in any case I'd need to get them all, common or not. One of my > goals is to log every sbatch/salloc/srun with every option that was used > (implicit or not). > > Using the common data structures is useful for the job manipulation > context, but doesn't capture the rest. The srun command has a lot of options that are job-step specific. Not that I want to get into this right now, but the subject of a step submit plugin along the same lines of a job submit plugin has been raised a couple of times in the past. The options for salloc and sbatch should have about 95% overlap. If we did create a common data structure, it would obviously need to be a super-set of job options.
the dmjacobsen/slurm/cli_filter branch is rebased off from schedmd/slurm/master again (up-to-date as of 09:30 Pacific).
rebased again off from schedmd/slurm/master, merged in changes for --x11 and related changes. also added a couple of modifications as 17.02 testing on gerty revealed some defects
Created attachment 5268 [details] possible option merge

This is what I was thinking of doing. I figured out how to make the change pretty trivial to implement; diff, sed, and some careful rearrangement make this pretty easy.

srun is a bigger chunk of work, as there are a lot of step arguments that don't exist in salloc/sbatch. But I don't think it's impossible to get this done.

Doug - the only downside is it's yet another major conflict on rebase. But, if we go with this, I think long-term it's an easier approach to manage.
Hi Tim, I think that could work. However, I would suggest that it would still be optimal to build a generic interface for getting/setting these options, rather than focusing on the storage per se (though I think one implies the other, for design reasons).

For example, the cli_filter/user_defaults plugin reads $HOME/.slurm_defaults and allows the user to set options after _opt_default() is applied but before the #SBATCH, environment, and other options are applied. It accepts a ((command:)?(cluster:)?)?component=value syntax and sets the corresponding component of the data structure conditionally for the user. E.g., the contents of an example $HOME/.slurm_defaults:

    edison:constraints=ivybridge
    cori:constraints=haswell
    partition=regular
    sbatch:*:qos=premium
    salloc:cori:qos=interactive
    time_limit=45

This is for a user who wants ivybridge on edison and haswell on cori, usually uses the regular partition, and, since this user is apparently always flush with allocation, uses premium for batch jobs and always uses interactive for salloc sessions on cori.

Because of the way options are managed, "time_limit" needs to be set, not "time". Similarly, "tasks_per_node" would be set, not "tasks-per-node", and so on. It would be nice if the cli_filter plugins could essentially call a set routine for each getopt_long name, with that same routine also used by getopt processing. This would ensure consistent naming from the user's and plugin developer's perspective, and the hook would ensure that Slurm's option processing logic was followed even for options managed by enumerated types like cpu_bind_type.

Another potential path might be to expand on the code generation idea (not necessarily at compile time, rather at modification time), but use it to create the getopt_long structures and the environment import structures, associate the get/set functions, and so on.
This might help to simplify and unify the interfaces for the CLI applications without manual maintenance of all the individual components. In this scheme the consolidation (and specialization) of the CLI would be in the metadata from which the code is generated, not necessarily in the structures directly accessed by the CLI applications.

One of my goals is still to expose all of the values, which is simplified to some degree by having fully detailed, but separate, structures that I can introspect. Exposing the values does not imply that all need to be modifiable (e.g., it makes no sense to modify argv -- but it makes a great deal of sense to parse it and potentially perform some additional tests on the apparent executable. Is it dynamically linked? Is it in a specific list where we want to imply a particular wckey?)

Thank you for your consideration of this plugin infrastructure -- and yes, I'm happy to continue working through these issues rather than necessarily rush it in. I am planning to deploy a version of this on cori and edison following the October facility maintenance, though probably without the user-facing component.
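[Editorial note: the ((command:)?(cluster:)?)?component=value line format described above could be parsed with Lua patterns roughly as follows. This is illustrative only; the prototype's actual user_defaults parser may differ, and the disambiguation of a single prefix is an assumption.]

```lua
-- Illustrative parser for ((command:)?(cluster:)?)?component=value lines,
-- as in the $HOME/.slurm_defaults example above.
local function parse_default(line, this_command, this_cluster)
    local lhs, value = line:match("^%s*(.-)%s*=%s*(.-)%s*$")
    if lhs == nil or lhs == "" then return nil end
    local parts = {}
    for tok in lhs:gmatch("[^:]+") do table.insert(parts, tok) end
    local component = table.remove(parts)   -- last token is the option name
    local cluster = table.remove(parts)     -- may be nil, or "*" for any
    local command = table.remove(parts)     -- may be nil
    -- NB: a single prefix is treated as a cluster here; a real plugin would
    -- need to recognize command names (salloc/sbatch/srun) explicitly.
    if command ~= nil and command ~= this_command then return nil end
    if cluster ~= nil and cluster ~= "*" and cluster ~= this_cluster then return nil end
    return component, value
end

-- e.g. parse_default("salloc:cori:qos=interactive", "salloc", "cori")
--      matches and yields "qos", "interactive";
--      parse_default("edison:constraints=ivybridge", "sbatch", "cori")
--      returns nil (wrong cluster).
```

Each matching component would then be fed through the same set routine used by option processing, which is exactly the consistency argument made above for exposing the getopt_long names to plugins.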
I readily admit I haven't had a chance to look into the implementation and plugins in detail yet - my main concern has been minimizing the constant merge conflicts. I'm looking forward to your presentation Monday to get more details on the end goal. :) This also comes from realizing how repetitive it is to add new elements - the x11 plug-in for instance has identical code in all three. Unifying the rest of the processing code does look like a good end goal, but is more than we can get done in the remaining time in this release. Getting the structs sorted out at least I think I can manage. I'll probably get an equivalent patch for srun done tomorrow on the flight.
> I readily admit I haven't had a chance to look into the implementation and
> plugins in detail yet

I think the generated interfaces mostly address your concerns about plugins accessing the different cli opt data structures. It is certainly no worse than a job_submit plugin trying to deal with both the job_request and job_description data structures side by side -- but without the maintenance hassle of a user requesting a field that hasn't been implemented yet.

> - my main concern has been minimizing the constant merge conflicts.

Are you referring to merge conflicts in merging the cli_filter branch? Or to conflicts generated on currently outstanding features following the merge of the cli_filter branch? How frequently do the outstanding branches rebase against master? Is it viable to simply wait to merge cli_filter until after the other outstanding branches have had an opportunity to merge into master?

> I'm looking forward to your presentation Monday to get more details on the
> end goal. :)

I'll try to make it exciting! I'll have a demo of a single lua file providing service for both cli_filter and job_submit -- largely enabled by the naming and type conformance across a variety of data structures.

> This also comes from realizing how repetitive it is to add new elements - the
> x11 plug-in for instance has identical code in all three.

Yes, that does seem painful -- certainly rebasing following these changes has been painful for me (re-copying the data structures from opt.h to <command>_opt.h). However, the code generator automatically picked up the new options, so cli_filter already supports them.

> Unifying the rest of the processing code does look like a good end goal, but
> is more than we can get done in the remaining time in this release.

I agree -- I suspect that this aspect could wait for the next release.
The only impact on cli_filter would be that we would want to hold off on user-facing deployments of, for example, the user_defaults plugin, since the interfaces for users might change.

> Getting the structs sorted out at least I think I can manage. I'll probably
> get an equivalent patch for srun done tomorrow on the flight.

OK. It would make my life very much simpler if there were:

sbatch_opt.h
salloc_opt.h
srun_opt.h (I know you're not working on this one)
slurm_opt.h

that all _only_ define the structure and relevant typedef -- no extern variable or other prototypes.
One thing that came up in the cli_filter presentation yesterday (from several sources, including the slides) was a mechanism for trusting client-side checks.

I think a workable model for this is to _never_ trust the cli_filter pre_submit() plugins directly, but instead to trust specific munge messages encoded as SlurmUser for specific checks. I.e., the site creates a setuid-SlurmUser executable to perform the check, which provides the cli_filter pre_submit() routine with a munge-encoded message (from SlurmUser). That message can then be decoded by job_submit and used to validate the check on the server side.

This opens up the possibility that `scontrol update` also needs to potentially run a form of pre_submit(). Though, having a capability for uid 0 to not have the checks run for the intended user may be desirable (I think most of us allow uid 0 to skip the modify restrictions).

The advantage of all this is that it can obviate the need for sites to run caching services for these heavyweight checks -- as NERSC has done for two years with redis, and CSCS does with memcached.

So, cli_filter/lua pre_submit() performs an io.popen to run the setuid-SlurmUser executable exampleCheck. exampleCheck passes, so it exits with status 0 and prints a munge-encoded message to stdout. pre_submit() reads it and shoves it into the comment field (for now). Comment field format:

exampleCheck=MUNGE:<message>
externalBanking=MUNGE:<message>

and so on. Then, in order to bypass the "exampleCheck" test on the server side, the munge message needs to be decoded and validated as (1) being from SlurmUser, (2) indicating the proper response for a positive completion of the check, and (3) decoding without error (within the munge timeout, and unique).

It might be worth coding up a test of this to see where support from cli_filter and job_submit would be beneficial. I would imagine that exposing functions to ease validating and decoding the munge message within job_submit/lua would be chief among these.
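To make the comment-field flow concrete, here is a rough Python sketch. A toy HMAC stands in for munge here; real munge additionally provides UID/GID attestation, a TTL, and replay protection that this sketch does not. `sign_check`, `verify_comment_field`, and the shared-key handling are all hypothetical illustrations, not Slurm or munge APIs:

```python
# Toy model of the "name=MUNGE:<message>" comment-field scheme.
# HMAC with a shared key stands in for a munge credential (assumption).
import base64
import hashlib
import hmac

SLURMUSER_KEY = b"site-secret"  # stand-in for the munge key

def sign_check(name, verdict):
    """What the hypothetical setuid-SlurmUser helper prints to stdout."""
    payload = f"{name}:{verdict}".encode()
    mac = hmac.new(SLURMUSER_KEY, payload, hashlib.sha256).digest()
    return (base64.b64encode(payload).decode() + "." +
            base64.b64encode(mac).decode())

def verify_comment_field(comment):
    """Server side (job_submit): return {check_name: passed?}."""
    results = {}
    for entry in comment.split():
        name, sep, rest = entry.partition("=")
        if not sep or not rest.startswith("MUNGE:"):
            continue
        try:
            p64, m64 = rest[len("MUNGE:"):].split(".")
            payload = base64.b64decode(p64)
            mac = base64.b64decode(m64)
        except ValueError:
            results[name] = False
            continue
        expected = hmac.new(SLURMUSER_KEY, payload, hashlib.sha256).digest()
        # The payload must both authenticate and name *this* check,
        # so a credential for one check cannot vouch for another.
        results[name] = (hmac.compare_digest(mac, expected)
                         and payload == f"{name}:pass".encode())
    return results
```

The point of binding the check name into the signed payload is that reusing exampleCheck's credential under a different key like externalBanking fails validation.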
Even better would be some primitive in the protocol for passing along such auxiliary credentials.
(In reply to Doug Jacobsen from comment #43)
> One thing that came up in the cli_filter presentation yesterday (from
> several sources, including the slides) was a mechanism for trusting
> client-side checks.
>
> I think a workable model for this is to _never_ trust the cli_filter
> pre_submit() plugins directly, but specific munge messages encoded as
> SlurmUser for specific checks.

That's where I figured you were going eventually... it's a really interesting concept.

Structurally, it gets a bit awkward on the RPC end in that you'd need to effectively pass the job_desc message in twice: once as the normal RPC, and once embedded in a munge credential. It's trivial for us to add an extra RPC element to do this, but... I'm now thinking it may be easier to just add a separate inbound RPC type that bypasses job_submit, unpacks the signed values, and builds the job record off that. You'd have the normal path somewhere else to fall back on - or just set your job_submit plugin to deny anything that's not using your fancy pre-processor. Then you're not sending doubled-up values for everything across. The only flexibility you'd lose is the ability to process some elements through your client-side scripting and some in the job_submit plugin, but it sounds like your end goal is to avoid job_submit entirely anyway?

> I.e., if the site creates a setuid-slurmUser executable to perform the
> check, which then provides the cli_filter pre_submit() routine with a munge
> encoded message (from slurmUser). Then that message can be decoded by
> job_submit and used to validate the check on the server side.
>
> This opens up the possibility that `scontrol update` also needs to
> potentially run a form of pre_submit(). Though, having a capability for uid
> 0 to not have the checks run for the intended user may be desirable (I think
> most of us allow uid 0 to skip the modify restrictions).

uid 0 / SlurmUser should be treated identically, and are usually exempt from any security measures.
Yes - you'd need something akin to the job_modify() in job_submit.

> The advantage of all this is that it can obviate the need for sites to run
> caching services for these heavyweight checks -- like NERSC does (and has
> for 2 years) with redis and CSCS does with memcached.

Yeah, I hadn't appreciated how involved your job_submit plugins are, and that you've dropped caches in front to alleviate performance concerns.
To recap some discussion with Doug last week:

- I plan to merge the salloc/sbatch/srun options fields before 17.11. This minimizes the patch maintenance hassle for NERSC or anyone else interested in testing this out ahead of it appearing in the stable releases, and makes the eventual merge simpler.

- I'll add an extra (char *) field into the _pack_job_desc_msg RPC. This anticipates the need to pass signed data from cli_filter -> job_submit without overloading the comment field.

- We'll defer any discussion on merging cli_filter into master until 17.11 has been branched, and give Doug some more time to get this stabilized and tested further.

- I understand that NERSC intends to run this as a custom patch on 17.11, hence my desire to minimize the noise in the patches. I can live with this on the support side - anything cli_filter does is already technically possible by submitting against the raw RPC, even if the field can't be set readily by salloc/sbatch/srun today.

Once the first two items are accomplished I will move this back to Sev5 for the meantime.
Commit aed1d6b7dc8651bf839d unifies the salloc, sbatch, and srun opt_t structures into slurm_opt_t, plus separate salloc_opt_t, sbatch_opt_t, and srun_opt_t fields for options specific to one command or another.
Commit 22d1ff1c32ec adds the 'extra' field to the job_desc_msg_t / slurm_opt_t / struct job_details so you can send data between cli_filter and job_submit. And I even remembered to add it into the job_submit.lua plugin interface. Re-tagging as Sev5, as I'm done with landing parts before 17.11. We'll review this further before the 18.08 release once you've had a chance to update your code and test it out more extensively. - Tim
Hi, I am sorry to leave comments on someone else's case, but I really like the idea of cli_filter. Are there any plans to merge it into a future version of Slurm?
Also pinging in - we are using a version of cli_filter, but I was curious how compatible the existing plugin is with 18 (as we are using it with 17.11). Thanks, Kevin
Hi Tim, We have the NRE in place to get the refactoring of the getopt() functionality needed to support the _real_ cli_filter. Do you have a timeline for when the updated API will be available so I can start coding against it? Is there an opportunity to do a partial set of updates in master for just a few options so we can start coding against that? Thanks, Doug
> We have the NRE in place to get the refactoring of the getopt() > functionality needed to support the _real_ cli_filter. Do you have a > timeline for when the updated API will be available so I can start coding > against it? Is there an opportunity to do a partial set of updates in > master for just a few options so we can start coding against that? I don't know that a partial set helps you that much? The conversion is pretty much all-or-nothing from what I can tell from picking apart your POC branch... there's not a lot of value to me from only converting a handful of the flags. I do expect to tackle this before I head off to Australia on March 23rd, I'll update this as I start landing some patches. - Tim
Hi Tim,

A partial set would not help with the overall functionality needed to get into the release, but in terms of getting buildable, testable code integrated into a current master branch it would have quite a bit of utility. However, getting non-buildable code integrated could be done with just a description of the API.

I'm guessing the 19.05 feature code freeze will be on or before April 1, so I'd like to get the base features to you some time in March so that they can be dialed in during the code-freeze period and released with Slurm 19.05. If we can at least come up with the technical descriptions so I know what to expect in the code, that will help. The key things I'll need up front are:

1) A data structure for each CLI's arguments with pointers to get/set functions and the common "name" of the option, using a centralized type exposed in a common place (e.g., common/slurm_opt.[ch], https://github.com/dmjacobsen/slurm/blob/3d85432c85b7b1bb7b346000f7331a6bc5b46ab2/src/common/slurm_opt.c#L2688). This will be used to iterate over options within the cli_filter to identify, read, and write option values. Also the expected signature for any given get()/set() function.

2) The expected signature of the finalize() API call that the cli_filter can use to re-cross-validate/adjust values after a cli_filter plugin makes adjustments.

Even if we only partially populate the data structure from (1) and implement a few of the functions, it should allow cli_filter development to proceed in parallel with the full conversion of all the other options.

Thanks, Doug
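As a rough illustration of item (1), the kind of name-keyed get/set table being requested might look like the following, in Python pseudocode. In Slurm this would be C function pointers operating on slurm_opt_t; the names `OptionEntry`, `OPTION_TABLE`, and `cli_filter_set` are hypothetical, and a dict stands in for the options struct:

```python
# Sketch of a name-keyed option registry with get/set hooks (assumption).
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class OptionEntry:
    name: str                            # common option name, e.g. "time_limit"
    getter: Callable[[dict], Any]        # read the value out of the opt struct
    setter: Callable[[dict, Any], None]  # write a value into the opt struct

def _make_entry(name):
    # In Slurm these would be the per-option parse/set functions that
    # getopt processing also uses, giving one shared code path.
    return OptionEntry(
        name=name,
        getter=lambda opt: opt.get(name),
        setter=lambda opt, v: opt.__setitem__(name, v),
    )

OPTION_TABLE = {n: _make_entry(n) for n in
                ("partition", "qos", "time_limit", "tasks_per_node")}

def cli_filter_set(opt, name, value):
    """What a cli_filter plugin would call to modify an option by name."""
    entry = OPTION_TABLE.get(name)
    if entry is None:
        raise KeyError(f"unknown option: {name}")
    entry.setter(opt, value)
```

The value of routing sets through the table is that a plugin can only touch options by their canonical names, and every write goes through the same validation path as the command line.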
Created attachment 9740 [details] cli_filter patches

Hello, Please find attached the current set of cli_filter patches. They apply against the clirefactor branch at present (b4c0a97b2e: Move spank_process_option() inside slurm_process_option(), as of 2019/03/28). I'll continue to update them as the clirefactor branch is updated (and hopefully merged soon), but I wanted to make sure this was in your hands. Thanks, Doug
The clirefactor branch has been merged into master now. This work moves all option parsing for salloc/sbatch/srun into a central set of functions designed to work with Doug's current design for the cli_filter plugin interface. While it is a significant overhaul (diffstat has 4684 insertions(+), 5149 deletions(-) on the merge), functionally it should be equivalent to the existing CLI code. Doug - if you don't mind pushing a rebased set of cli_filter patches on your repo, I will see what I can do to get those merged in the next week or so before the 19.05 feature freeze. - Tim
Created attachment 9908 [details] updated cli_filter patches Hello, Please find attached the cli_filter patches relative to the master branch as of this morning. I presently have a regression test suite running on gerty (Cray XC) up through patch 21 (missed 22...) though without any cli_filter plugins configured. -Doug
Created attachment 9910 [details] regression failure logs

Testsuite ran for 118 minutes 15 seconds
Completions  : 535
Failures     : 2
Failed tests : 1.91, 2.23

Based on the clirefactor regression test, it looks like the master branch has improved, and these failures replicated there (there were four failures on the clirefactor branch on 4/12).
Good Morning,

The basic infrastructure underpinning the cli_filter was merged to master last night (woo-hoo! Thanks, Tim). I'm rebuilding the cli_filter branch with just the plugins (since those won't be merged for 19.05, though now they can be added with a straightforward patch). The only barrier I have presently to building the plugins is that this is needed:

diff --git a/src/common/plugstack.c b/src/common/plugstack.c
index ef8e6ca972..488f026080 100644
--- a/src/common/plugstack.c
+++ b/src/common/plugstack.c
@@ -115,8 +115,6 @@ struct spank_plugin {
  * SPANK Plugin options
  */
 
-#define SPANK_OPTION_ENV_PREFIX "_SLURM_SPANK_OPTION_"
-
 struct spank_plugin_opt {
 	struct spank_option *opt;	/* Copy of plugin option info */
 	struct spank_plugin *plugin;	/* Link back to plugin structure */
diff --git a/src/common/plugstack.h b/src/common/plugstack.h
index a2b35e584c..11876c14dc 100644
--- a/src/common/plugstack.h
+++ b/src/common/plugstack.h
@@ -47,6 +47,8 @@
 #include "src/common/job_options.h"
 #include "src/slurmd/slurmstepd/slurmstepd_job.h"
 
+#define SPANK_OPTION_ENV_PREFIX "_SLURM_SPANK_OPTION_"
+
 struct spank_launcher_job_info {
 	uid_t uid;
 	gid_t gid;

Having SPANK_OPTION_ENV_PREFIX defined in the plugstack header file just gives cli_filter a non-hard-coded way to screen out environment options it can see more directly via the plugstack interface.
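The intended use of the exported prefix is straightforward: given the process environment, partition off the SPANK option variables so cli_filter can ignore them (it sees those options more directly via the plugstack interface). A hedged Python sketch, where `split_spank_env` is a hypothetical helper rather than Slurm code:

```python
# Sketch: screen SPANK option variables out of the environment (assumption).
SPANK_OPTION_ENV_PREFIX = "_SLURM_SPANK_OPTION_"

def split_spank_env(environ):
    """Partition an environment dict into SPANK-plugin option variables
    and everything else, matching on the exported prefix."""
    spank, rest = {}, {}
    for key, val in environ.items():
        target = spank if key.startswith(SPANK_OPTION_ENV_PREFIX) else rest
        target[key] = val
    return spank, rest
```

Exporting the macro from plugstack.h means this prefix string lives in exactly one place instead of being hard-coded again in the cli_filter plugins.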
Thanks Doug. Based on your feedback, I am going to go ahead and mark this as resolved. As we've discussed, the 19.05 release will ship with the interface, but SchedMD is not prepared to ship and support any plugins at this time. That should come in the 20.02 release once we've settled some teething issues, and had more time to work through what those plugins require. If you could please open separate bugs against 20.02 in a couple of months once you've gotten some experience with this, I'd appreciate it. Thanks again for all of your work on this, I'm looking forward to seeing it in action. - Tim commit eb4b88fe5267c9f07d2ce3628f06d579cff17a76 Author: Doug Jacobsen <dmjacobsen@lbl.gov> AuthorDate: Mon Apr 29 16:05:54 2019 -0600 Move definition of SPANK_OPTION_ENV_PREFIX to plugstack.h. Bug 3745.
Thanks, Tim! I appreciate all the work to wade through my code and various designs here. I'm excited to get the plugins integrated on cori! Do you have any recommendations on how to make the plugins available (e.g., as a patch) if others in the community want to help? I'm not sure what has been successful before for Slurm. Thinking of, for example, the mvapich2 patches.
I'd suggest splitting each of the plugins off to a separate commit, and sequencing them on a clean branch on your or whomever's GitHub repo. Alternatively, you could merge them into one single commit for ease of application (although that is not how I'd want them submitted to us when it comes time to review - we do want them broken up a bit to make it easier on our review process). With GitHub, if you add .patch to the end of a commit URL it'll give you the raw patch[1], so you can point people to them relatively easily. The only downside is that if you keep rebasing your branch, the commit identifiers will change. Hope that helps, - Tim [1] https://github.com/SchedMD/slurm/commit/590b0fef3d.patch
Created attachment 11939 [details] cli_filter plugins (19.05 apply-able) Here is a patch file with the plugins. We presently use the lua plugin, but intend to enable the user defaults plugin shortly. I'd like to propose that these get merged to master. Thanks, Doug
OK, good news. I'm starting to find a few bugs that will need to be fixed. How should I present these to you? This is both in the already-merged code as well as some changes I'm starting to make to the plugins.
Created attachment 12739 [details] cli_filter plugins for master branch assuming lua_remix branch is merged Please find attached the latest cli_filter patches for the master branch, which assume that the lua_remix branch from bug 7963 is merged.
Created attachment 12740 [details] optional addon to execute cli_filter plugins from sinfo, scontrol, scancel, sprio, squeue (no argument parsing/manipulation) This is an optional patch we run with to engage the cli_filter plugins (tested with lua and user_defaults) for the non-salloc/sbatch/srun binaries. It is intended to be used for control logic (e.g., limiting maximum RPC throughput via CLI-side enforcement) or other controls. It need not be applied for "normal" cli_filter functionality, but is something NERSC is using.