| Summary: | ActiveFeatures job submission | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | AB <Alexander.Block> |
| Component: | Scheduling | Assignee: | Brian Christiansen <brian> |
| Status: | RESOLVED WONTFIX | QA Contact: | Brian Christiansen <brian> |
| Severity: | 5 - Enhancement | ||
| Priority: | --- | CC: | brian, jbooth, tim |
| Version: | 20.11.7 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | LRZ | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
AB
2022-02-02 01:52:19 MST
Hi, since I have not heard anything since I opened the ticket one week ago, I just want to ask what the status of this ticket is?

Thanks, Alexander

That is my fault, Alexander. My apologies. We've been internally discussing your case and I'm now testing a couple of things before sending you an answer. Again, sorry for the lack of response,

Carlos.

Alexander,

What is stated in the Description is a natural way to implement hot-swapping of filesystems, but it's not a good way to go. We could accept making the job wait, rather than refusing it, when a requested feature is not available or active. BUT if we allowed the job to be submitted, the NodeFeatures plugin would actively seek out nodes on which to re-activate the disabled feature. This is accomplished by calling the RebootProgram and letting that script handle the reconfiguration of the node, making the feature active after the reboot. This comes from the nature of the plugin, which is tightly coupled to KNL and the handling of its different compute layers. So if we allowed it now, you'd end up with your nodes constantly rebooting.

The good news is that there's another approach that will work: Licenses. You would define a pool of local licenses, say 1,000,000, in your slurm.conf. Then either have the users request one of the licenses, or have a submit script apply the license to every job that is submitted. If you want to keep the users away from that FS, set this [1] in slurm.conf. Then deactivate access by setting a reservation as explained in [2]:

```
scontrol create reservation starttime=[...] \
    duration=[...] user=root flags=license_only \
    licenses=[NAME]:[TOTAL_NUMBER]
```

You can also go for the remote licenses mode explained in [3], but I think local licenses are easier for this purpose. Remote licenses can also be set to zero with sacctmgr, as explained in [3]. Please tell us if this is feasible for you.

Cheers, Carlos.
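As a concrete illustration of the workaround Carlos describes, a minimal sketch might look like the following. The license name `fsA` and the pool size are placeholder assumptions, not a site-tested configuration:

```
# slurm.conf -- a large pool of local licenses gating access to the filesystem
Licenses=fsA:1000000
```

The "submit script" he mentions could be a job_submit.lua along these hypothetical lines, translating a `--constraint=fsA` request into a per-node license request so that such jobs can later be held by a license_only reservation instead of being rejected at submission. This is an untested sketch; field handling in real plugins needs more care:

```lua
-- Hypothetical job_submit.lua sketch (untested): if a job constrains on
-- "fsA", drop the constraint and request one "fsA" license per node
-- instead, so the reservation/zero-license trick can hold such jobs.
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.features ~= nil and job_desc.features:find("fsA") then
        -- simplification: assumes "fsA" was the only requested constraint
        job_desc.features = nil
        local nodes = job_desc.min_nodes or 1
        job_desc.licenses = string.format("fsA:%d", nodes)
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

With this in place, draining access would be the license_only reservation shown above, and affected jobs would pend on licenses rather than be refused.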
[1] https://slurm.schedmd.com/slurm.conf.html#OPT_allow_zero_lic
[2] https://slurm.schedmd.com/reservations.html#creation
[3] https://slurm.schedmd.com/licenses.html

Hi Carlos,

thanks for your answer. I understand what you are saying. However, the configuration of features for nodes then makes little sense to me. The documentation of slurm.conf states: "Features are intended to be used to filter nodes eligible to run jobs via the --constraint argument." I can also imagine the case where only a subset of the nodes lacks a certain feature for a limited time; then submission of jobs is also suppressed (e.g. jobs requesting all nodes). In our recent case we could of course use the license workaround, but that only works properly if ALL nodes are available or not. There is no node-specific license availability, to my knowledge?!

Best regards, Alexander

Hi Alexander,

> In the documentation of slurm.conf is stated "Features are intended to be used to filter nodes eligible to run jobs via the --constraint argument."

If you are only using static features (shown as active==avail) and never modify on the fly what's in active/avail, this statement is still valid. Further information is provided in [1] and [2].

> I can also think of the case that only a subset of the nodes will not have a certain feature for a limited time then submission of jobs is also suppressed (e.g. jobs requesting all nodes).

If you set licenses=0, nobody who wants to use that license will be allowed to run, but their jobs will be allowed to pend in the queue until licenses are available. You can code a job_submit.lua so that if the user sets constraint A, the script removes it from the job's flags and sets X licenses of type A (same as the number of nodes). That way, only the jobs that want this filesystem are prevented from running.

> In our recent case we could of course use the license workaround. But this only works proper if ALL nodes are available or not.
I think the way I've just proposed above should work around this, if I'm understanding your concerns correctly.

> There is no node specific license availability to my knowledge?!

Ahh, I'm afraid not. Much in the same way, there's no option today to tell the NodeFeatures logic "please, do not reboot the node, this feature is plug-and-play, it doesn't require a reboot". I don't think that last bit is a bad idea, though. This is something that might be future work, open for sponsorship. That could be further discussed with us if there's real interest in it.

Cheers, Carlos.

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_NodeFeaturesPlugins
[2] https://slurm.schedmd.com/intel_knl.html

Hi Carlos,

what is not clear to me is the different behavior in Slurm. Let's say a user submits, on a 4-node cluster with certain features on all nodes, a job requesting all 4 nodes with all features. The job is submitted; the cluster is full, so the job goes to pending. Then a node goes down. The user can still submit the same job; it pends in the queue because the requirements are not fulfilled, but it would start as soon as the node comes back. Now the node comes back, only missing something (let's say a file mount is still pending). The admins are still working on the node, disabling the feature in the meantime, but bring it back into operation. Now the user can't even submit his job again, and he has no chance to be informed when the feature/configuration is available again. I think it is fine that configuration parameters like CPU and memory (i.e. hardware) are checked strictly at submission, but for freely configurable features this is a bit too strict in my eyes.

In any case, before you close the ticket, could you maybe point me to the part in the code where the rejection is implemented? I may have a look and check whether I can patch it for our needs myself.
Thanks, Alexander

Hi Alexander,

> The user can still submit the same job, the job is pending in the
> queue because the requirements are not fulfilled.

You've spotted the difference. That job waits until there are resources to run, and the configured resources meet the constrained features: the job just waits until it can run. On the other hand, if at submission Slurm knows the constraints can never be met, it refuses to accept the job.

Yes, it's counterintuitive. It's tightly bound to the nature of NodeFeaturesPlugins, which use Active/Available features to decide whether a node needs to be rebooted on the fly to meet a job's constraints. So if you aren't using the NodeFeaturesPlugins parameter, Slurm assumes those features are static: if a feature is not available now, it never will be.

As I said, we're not refusing your proposal of changing this and allowing the job to wait. But, up to now, it's not implemented, and I have no roadmap for it right now. In any case, you would still get the call to RebootProgram, something you probably don't want. To avoid the reboot, we would need the feature extension of adding a parameter instructing Slurm whether or not to reboot the job's nodes. That is even more complex, thus the sponsorship suggestion.

Nevertheless, this might be another option: set NodeFeaturesPlugins=node_features/helpers [1][2]. This still reboots the nodes, but it allows jobs to stay in the queue rather than be rejected. As I said, this is probably not the way you want to go, unless you don't care about RebootProgram and can fake it with /bin/true or similar.

> In any case what I may ask you before you can close the ticket is if you
> maybe can point me to the part in the code where the rejection is
> implemented. I may have a look and check if I can patch it for our needs
> myself.

That's a compromising request for me. I mean, to answer your question I'd need to do the same preliminary study I'd need to do if we were going to patch this now.
And, at the end of the day, you would still be forced to reboot the node. Additionally, the feature extension of adding a parameter instructing Slurm whether or not to reboot the nodes is even more complex; not a simple patch. All in all, this effort would be made for an unsupported setup: we don't give support for problems derived from custom patching, as you might be aware. So devoting time to this isn't a good idea... it's better to work around this with NodeFeaturesPlugins=node_features/helpers instead. But I still think going for the Licenses approach would be less error-prone for now. I hope you'll understand the situation.

Cheers, Carlos.

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_NodeFeaturesPlugins
[2] https://slurm.schedmd.com/helpers.conf.html

Hi Carlos,

maybe there was a misunderstanding. My idea is the following: I can specify any feature on a node I like, and I can take it away any time I like (without rebooting, but with reconfiguration/restart of course). Jobs that were submitted while the feature was configured stay in the queue; running jobs keep running (I tested this before), but pending jobs just will not start. BUT nothing will try to reboot nodes, right? New jobs requesting the feature cannot be submitted; that's the point where I want to allow jobs requesting features that are not there. There is no reboot; just let them go to the queue. They will not run, of course, but all the others in the queue still requesting the removed feature will also not run, and will stay in the queue forever or until some administrator cancels them. Would this make sense to you? I understand that this would not be supported by SchedMD, and I have no idea if I can make it work. But if I can, we would have a solution for our problem.

Thanks again, Alexander

I see...

> Jobs that have been submitted while the feature was configured stay in the
> queue, running jobs keep running (I tested this before) but pending jobs
> just will not start.

Indeed.
> BUT nothing will try to reboot nodes, right?

Right, because you don't have NodeFeaturesPlugins set.

> New jobs
> requesting the feature will not be submitted - that's the point where I want
> to allow jobs requesting features that are not there.

If the submission knows it can never meet the constraints, it refuses to accept the job when you don't have NodeFeaturesPlugins set. Obviously, it doesn't cancel the jobs already there, as you pointed out. But this interaction/behaviour is a side effect. The purpose of Available/Active features is, as I said, to instruct NodeFeaturesPlugins which config needs to be set before rebooting the nodes. If you only use static features, you shouldn't be playing with ActiveFeatures; it is not meant for what you are trying to accomplish.

> There is no reboot,

Because you don't have NodeFeaturesPlugins set.

> just simply let them go to the queue.

As I said: we're not refusing your proposal of changing this and allowing the job to wait. But, up to now, it's not implemented, and I have no roadmap for it right now. AND: if we allowed the logic to accept the job, it would reboot your nodes in the event you have a NodeFeaturesPlugins in use, or you enable one later.

> They will not run of course but all
> others in the queue still requesting the feature taken away will also not
> run and stay in the queue forever or till some administrator cancels them.

Or until you add back the feature that is Available but not Active.

> Would this make sense to you?

Yes, it makes sense. I've been aware of your concerns since the beginning, but I'm trying to bring you alternatives, since this is not a bug but intended behaviour. Even though I agree that we can modify it, I have no exact rule to tell you when it would be implemented, or even whether we will finally do so (all feature changes need to be peer reviewed and accepted). Anyway, give me a while to discuss this internally again.
We still think the correct way to go is not only to accept the job regardless of whether any NodeFeaturesPlugins is set, but also to add a way to tell Slurm whether or not to reboot the node depending on the feature: something like adding a new parameter to node_features/helpers to determine if the node needs a reboot.

Cheers, Carlos.

Hi Alexander,

This update is to let you know I'm now in the middle of the patch review to add support for not refusing jobs whose requested features are "inactive". They will be allowed to be submitted, but will pend with BadConstraints. An "inactive" feature is one that is available but not active. The idea is that a job requesting an inactive feature is one whose constraint expression currently evaluates to "requirements not met", but which could run if you activated the available features on some nodes. I hope the review process doesn't take too long, but the whole logic is a bit tricky and needs special care to avoid bugs.

Have a good day, Carlos.

Hi Carlos,

thanks for the feedback. That's good news. I also had a look at the code in the meantime and decided not to dig deeper; as you said, things are a bit tricky, and for me it was too much effort. But it's good to know that there will be a solution from your side.

Thanks, Alexander

Alexander,

I'm still working on this improvement. It's not stalled.

Regards, Carlos.

Hi Carlos,

thanks for the update.

Regards, Alexander

Hi Alexander,

We're still working on this issue. I guess it's not going to be fast to get this done and pushed. If you see silence intervals like the last one, please be understanding, as we are facing some complexities and need to address quite a lot of other work as well. This doesn't mean we have set your issue aside. Thanks for your understanding.

Regards, Carlos.

Just some minor bookkeeping here. We are converting this to an enhancement and passing along an update for this issue.
We are still looking into this, since it overlaps with some other work planned for our next release. As such, this will not be part of 22.05.

Hi,

thanks for your updates. If you wish, you may also close this ticket. I will see the enhancement once you have implemented it in a release.

Best, Alexander

Hi Alexander,
> If you wish you may also close this ticket.
We're not going to close it, so we can better track that enhancement.
Our work here overlaps with some other development, but it's still unclear whether that development will fully supersede what we have discussed here.
As a result, it's best to keep it open as an enhancement. I'll let you know of any related news, but in the meantime I'm marking this bug as stalled.
Regards,
Carlos.
I wanted to pass along an update regarding this issue. Although we thought this might overlap with other development, that work diverged in a different direction. Carlos's initial work on this looked promising, but after reviewing it and the effort involved, it is not something we plan to address in the near future. This resolution is unfortunate given our initial comments about patching this; however, we have to consider where we spend our development time. I do apologize for the inconvenience this might cause.