| Summary: | Add example job_submit/colorize plugin | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | S Senator <sts> |
| Component: | slurmctld | Assignee: | Tim Wickberg <tim> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 5 - Enhancement | | |
| Priority: | --- | | |
| Version: | 17.02.6 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | LANL | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | |
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | | |
| Attachments: | job_submit_colorize plugin | | |
| | updated version with a work-around :-( | | |
| | job_submit_colorize.c | | |
| | this is our pre-production version | | |
I'll look at it in more detail tomorrow, but I think you'd want to just grab the info from the prepopulated node structures directly. Calling the public API within the plugin is likely leading to deadlock, as you've observed.

The only field that needs to be modified is job_desc->admin_comment. The fields that I need to read are node->features_act and node->features for the node named in job_desc->alloc_node.

Created attachment 5068 [details]
updated version with a work-around :-(
Still looking to determine which data structure to traverse to do the equivalent of slurm_load_node_single(), just to read the features specified for a given node.
I'm still playing with this; apologies for not replying before you went down the rabbit hole of parsing the config separately. Keep in mind that the C plugins can access all of the internals directly; there's definitely no need to reparse the config and stash a separate lookup table in memory.

src/common/node_conf.c has most of what I believe you're looking for. Specifically:

extern struct node_record *find_node_record(char *name);

will give you back the struct (assuming that node exists, otherwise NULL). We also have a very comprehensive List implementation, borrowed from LLNL, in src/common/list.c; I see you've built your own there.

I think you could also use the mcs_label in the job_record instead of the JSON format of admin_comment. I think the mcs_label plugin itself could provide the security model you're looking for without (ab)using the constraint/features subsystem, although I'm not as versed in setting that up and will need to look into it further.

None of the other features are structured as "key=label" within Slurm at the moment, and I can't guarantee that we won't come up with some other use of the equals sign. (To Slurm, "color=blue" is just handled as the full text string.) I'd suggest using just "blue" as a feature on the nodes to simplify things.

Created attachment 5071 [details]
job_submit_colorize.c
This is my proof of concept on reusing the MCS plugin to do the heavy lifting.
I changed my mind; Features=color=red seems okay for the time being, and should be relatively simple to change if needed in the future.
Note that you need to set:
MCSParameters=enforced,select
MCSPlugin=mcs/none
(Which will also generate one nuisance warning of
scontrol: WARNING: MCSParameters=enforced,select can't be used with MCSPlugin=mcs/none
which can be trivially disabled.)
Note that this does _not_ give you a way to 'color' the compute nodes; it's assuming that the nodes are allowed to switch back and forth for each separate job.
Thank you very much. That will help me to proceed cleanly. This is really just a short-term stopgap measure meant to address a regression in our environment. This way we can use the existing "node colorization" mechanisms that we currently have in the SlurmCtldProlog & SlurmCtldEpilog.

With the MCS mechanism, could you just configure the setting of the MCS policy based on the submission host? It seems I would still need an analogous job submission plugin to collect that characteristic. Feel free to refer me to the MCS documentation, but I did explore that at first, as per your suggestions.

The reason we went down the path of using a NodeName/Feature hook is that our current mechanism is strongly tied to which submission host originates the job. In our environment, the authority to colorize a node is tied to the access control on the submission host. This is a feature of our environment which may not map to the general policy-focused MCS solution.

I plan to explore the MCS mechanism on our test beds for possible (probable?) use in our environment. A motivating factor in our move to Slurm was to leverage existing best practices. This fits.

(In reply to S Senator from comment #6)
> With the MCS mechanism, could you just configure the setting of the MCS
> policy based on the submission host? It seems I would still need to have an
> analogous job submission plugin to collect that characteristic.

You'd still need something quite similar, and to add an MCS plugin to handle that.
I'm leveraging one piece of that (although it does trigger some warnings that can be readily disabled with a follow-up patch; I left that as a further exercise).

> The reason we went down the path of using a NodeName/Feature hook is because
> our current mechanism is strongly tied to which submission host originates
> the job. The authority to colorize the node is tied to the access control to
> the submission host, in our environment.

Yes - this isn't a perfect match for how MCS works right now, but it was close enough that I felt reusing part of it would be okay. I can't guarantee this will work in future releases; some restructuring of the MCS plugin would cause problems here. One nice side effect of this (which doesn't apply in your environment) is that it would allow multiple jobs of a given color to run simultaneously on a single node.

> I plan to explore the MCS mechanism on our test beds for possible
> (probable?) use in our environment. A motivating factor in our movement to
> Slurm was to leverage existing best practices. This fits.

It's not 100%; you'd need to disable that one pesky warning, but I think the rest is relatively sound, and I may add this as an additional example plugin in a future release. (Albeit cleaned up substantially, and with some documentation highlighting what it's meant to accomplish and how it interacts with MCS.)

Is there anything further I can answer?

I think that we can consider this a deferred issue or, if you need to mark it as such, a closed one. I'll probably attach the interim code to this ticket whether or not it is closed by then, just so that you & your team will have an understanding of what we are running. When we start the explicit use of MCS in production we'll recoordinate with you & your team.
Thank you,
-Steve Senator
________________________________
From: bugs@schedmd.com <bugs@schedmd.com>
Sent: Wednesday, August 16, 2017 11:19 PM
To: Senator, Steven Terry
Subject: [Bug 4060] clarification requested: job_submit_plugin calls slurm_load_node_single() never returns - not reentrant?
I would be interested, perhaps as a side conversation, to understand where you see the need to "clean this [plugin] up substantially."

(In reply to S Senator from comment #9)
> I would be interested, perhaps as a side conversation, to understand where
> you see the need to "clean this [plugin] up substantially."

Documentation mostly - a block at the top describing the rationale for it, and some comments throughout. And possibly a few warnings about how it's interacting with MCS in an odd fashion. My current to-do list (in its git repository) reads as follows:
1- use slurm's data structures for node_conf
Note: required for multiple front-end nodes
[schedmd/Tim Wickberg feedback]
2- use slurm's List functions for needfree, nodes, colors
[schedmd/Tim Wickberg feedback]
3- use simple feature definition (ex. "yellow") rather than key-value ("color=yellow")
[schedmd/Tim Wickberg feedback]
4- pkgconfig collects version & slurm version similar to Makefile, spec file
[lanl/mej review]
5- split *.c into separate files per logical layering & dependencies
[maintainability/software engineering]
6- memory alloc/free/leak review - don't free at (per-job) fini() so faster for next job?
[software engineering]
7- rpm %verifyscript checks that:
a. colorize.conf contains ValidColors || Warning
b. (slurm.conf contains 'NodeName=... Feature=color=<color>' || Failure) and
<color> is in colorize.conf's ValidColors || Warning
8- update Node's ActiveFeatures with color
[maintainability]
9- convert to an MCS plugin, provide to slurm-contrib
[maintainability & community involvement]
Thank you,
-Steve Senator
Updating some metadata around this. I believe you have a workable approach, and I'm moving this down to an enhancement request to look at merging my example code in ahead of the 17.11 release.

cheers,
- Tim

Created attachment 5173 [details]
this is our pre-production version
This is our pre-production version. Due to deadlines this will probably roll out initially as-is, but the next iteration will use the slurm_parse_ routines. Coupled with that will be removal of the name/value-tuple "color=blue" style feature so that this is implemented TheSlurmWay.
At some point in the future, this probably will be reimplemented as an MCS policy.
You may consider this ticket as ready to be closed.

Marking resolved.
Created attachment 5042 [details]
job_submit_colorize plugin

In a job_submit plugin, we are calling job_submit() -> slurm_load_node_single() in order to look up the characteristics of the submission node, which we have defined in slurm.conf as 'Features=color=yellow' and which we obtained from the p_jobdesc->alloc_node field. However, a query to slurm_load_node_single() never returns.

In a job_submit plugin, is there a safe way to query a node's characteristics? The show_flags tried are SHOW_ALL|SHOW_DETAIL, SHOW_ALL, and SHOW_DETAIL.

For all of these queries, the last messages in the slurmctld log are:

debug3: Processing RPC: REQUEST_NODE_INFO_SINGLE from uid=0
debug: _slurm_recv_timeout at 0 of 4, timeout

An strace -f of slurmctld shows a loop on a futex:

[pid 29837] futex(0x8ef744, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 311, {1502248633, 0}, ffffffff <unfinished ...>
[pid 30892] <... poll resumed> ) = 0 (Timeout)
[pid 30892] poll([{fd=7, events=POLLOUT}], 1, 5000) = 1 ([{fd=7, revents=POLLOUT}])
[pid 30892] fstat(7, {st_mode=S_IFREG|0600, st_size=6541675, ...}) = 0
[pid 30892] write(7, "[2017-08-08T21:17:12.522] debug:"..., 73) = 73
[pid 30892] fcntl(10, F_SETFL, O_RDWR) = 0
[pid 30892] close(10)

followed by a SIGSEGV after some number of minutes:

[pid 30892] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x73610bbc} ---
[pid 30893] +++ killed by SIGSEGV (core dumped) +++
[pid 30892] +++ killed by SIGSEGV (core dumped) +++
[pid 29843] +++ killed by SIGSEGV (core dumped) +++
[pid 29841] +++ killed by SIGSEGV (core dumped) +++
[pid 29840] +++ killed by SIGSEGV (core dumped) +++
[pid 29838] +++ killed by SIGSEGV (core dumped) +++
[pid 29837] +++ killed by SIGSEGV (core dumped) +++
[pid 29839] +++ killed by SIGSEGV (core dumped) +++
+++ killed by SIGSEGV (core dumped) +++

We are attempting to modify the admin_comment field based upon the values returned, followed by a slurm_update_job(*job_desc). Is this allowed?
Note that this submission node is listed as 'DownNodes=...' and is not in any partition. It is not a node with Hidden=yes set. At this point in the plugin (job_submit() -> set_jobcolor() -> get_nodeinfo()), no modifications of slurmctld data structures have been performed, only get() operations. The code is attached.

This is marked as medium impact, as this functionality is a gap between our previous scheduler's capabilities and Slurm's. We are very open to feedback regarding the best means to implement this.