Ticket 2991 - Add triggers for node DRAINING and RESUME states
Summary: Add triggers for node DRAINING and RESUME states
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 20.02.x
Hardware: Linux
: C - Contributions
Assignee: Carlos Tripiana Montes
QA Contact: Tim McMullan
Reported: 2016-08-11 10:51 MDT by Moe Jette
Modified: 2022-09-12 10:05 MDT

Site: Universitat Dresden (Germany)
Version Fixed: 23.02.0pre1


Attachments
Patch for bug 2991 (23.50 KB, patch)
2020-02-18 02:15 MST, Carlos Tripiana Montes
Patch for draining/resume triggers, ported to 21.08.8 (20.79 KB, patch)
2022-08-10 08:32 MDT, Kilian Cavalotti

Description Moe Jette 2016-08-11 10:51:12 MDT
Dear Slurm developers,

is there a way to have an strigger script run on RESUME or DRAINING
state changes of a compute node?

Best regards,
Ulf Markwardt <ulf.markwardt@tu-dresden.de>
Comment 1 Moe Jette 2016-08-11 10:51:52 MDT
These event triggers are not available today, but should be easy to add.
Comment 2 Steve Ford 2019-12-20 11:18:53 MST
Are there plans to add these event triggers? They would be very useful.
Comment 3 Tim Wickberg 2019-12-20 11:26:29 MST
(In reply to Steve Ford from comment #2)
> Are there plans to add these event triggers? They would be very useful.

Not currently. The "5 - Enhancement" status in our bugzilla is used to capture outstanding enhancement requests, but unless we've specifically identified a target release there is no commitment from SchedMD to develop these.

Absent development sponsorship, or additional customer demand, a lot of these unfortunately will go uncompleted. If you, or anyone else, is interested in what that process entails, please let me know.

- Tim
Comment 4 Carlos Tripiana Montes 2020-02-18 02:15:09 MST
Created attachment 13070
Patch for bug 2991

For more information refer to the commit message in the patch file.
Comment 5 Carlos Tripiana Montes 2020-02-18 02:24:27 MST
(In reply to Tim Wickberg from comment #3)
> (In reply to Steve Ford from comment #2)
> > Are there plans to add these event triggers? They would be very useful.
> 
> Not currently. The "5 - Enhancement" status in our bugzilla is used to
> capture outstanding enhancement requests, but unless we've specifically
> identified a target release there is no commitment from SchedMD to develop
> these.
> 
> Absent development sponsorship, or additional customer demand, a lot of
> these unfortunately will go uncompleted. If you, or anyone else, is
> interested in what that process entails, please let me know.
> 
> - Tim

Dear Tim, Moe, and Steve,

Here at BSC we wanted to perform some actions whenever a node is taken out of production or put back into it, and we wanted to do so at the moment the change is requested.

A DRAINING trigger is preferable to a DRAINED one: if a node does not drain instantly, the DRAINED trigger fires long after draining actually started.

A RESUME trigger is needed because a node can be in the idle or allocated state without being back in production, and we wanted the trigger to fire only once, when the node goes back.

Tell me what you think of this patch proposal. I hope to have these two flags available in upcoming releases.

Kind regards,

- Carlos.
Comment 6 hltcoe-help 2020-02-21 10:28:00 MST
We would also really appreciate this feature.  Any chance this can get merged soon?
Comment 7 Tim Wickberg 2020-02-21 11:26:45 MST
Sorry for not responding sooner, we've been busy closing out the 20.02 release.

I'll take a look at the proposed patch in a month or so for consideration for 20.11. Our freeze for major changes in 20.02 went in place in January, so I cannot consider it for 20.02.

- Tim
Comment 8 Carlos Tripiana Montes 2020-02-24 00:39:59 MST
(In reply to Tim Wickberg from comment #7)
> Sorry for not responding sooner, we've been busy closing out the 20.02
> release.
> 
> I'll take a look at the proposed patch in a month or so for consideration
> for 20.11. Our freeze for major changes in 20.02 went in place in January,
> so I cannot consider it for 20.02.
> 
> - Tim

Dear Tim,

thanks for the information, and no worries about the 20.02 freeze. It's completely OK as long as the patch can be taken into account sooner or later.

Thank you very much, again, for your time.

Kind regards,
- Carlos.
Comment 9 Carlos Tripiana Montes 2020-09-01 23:25:34 MDT
Dear Tim,

Is there any progress on this?

Thank you!
Comment 10 Kilian Cavalotti 2021-04-09 17:26:09 MDT
Hi there,

I was about to submit a separate request for this, but since it's already been requested here, I'd like to voice our interest in this feature as well.

Thanks!
--
Kilian
Comment 11 Kilian Cavalotti 2022-08-09 09:15:59 MDT
Hi SchedMD,

Would it be possible to get Carlos' patch merged in 22.05?

Thanks!
--
Kilian
Comment 12 Kilian Cavalotti 2022-08-10 08:32:52 MDT
Created attachment 26252
Patch for draining/resume triggers, ported to 21.08.8

Here's a version of the patch, ported to 21.08.8, that adds support for node draining and resume triggers.

We have it deployed on our main system, and it seems to work as expected.

Cheers,
--
Kilian
Comment 13 Gordon Dexter 2022-08-11 10:24:07 MDT
Hope this can be merged soon.
Comment 24 Carlos Tripiana Montes 2022-09-12 09:14:20 MDT
Hi,

This is implemented in 1b795f0325...83138422e9 series of commits in master branch.

I'm going to mark this as resolved/fixed.

Cheers,
Carlos.
Comment 25 Kilian Cavalotti 2022-09-12 09:26:06 MDT
Hi Carlos,

(In reply to Carlos Tripiana Montes from comment #24)
> This is implemented in 1b795f0325...83138422e9 series of commits in master
> branch.
> 
> I'm going to mark this as resolved/fixed.

This is great, thank you!

Is there a chance this patchset could be back-ported to 22.05 as well?
Maybe not merged in the tree (I understand it would be a new feature in an existing branch and may be an issue) but at least posted as a patch here?

Thanks!

Cheers,
--
Kilian
Comment 26 Carlos Tripiana Montes 2022-09-12 10:05:23 MDT
I'm afraid we don't officially deliver backported patches when they are targeted at newer versions, but I'm fairly sure this would be pretty easy to do on your own.

But, as always, we can't officially provide support for any consequences of such changes.

Hope you'll understand.

Cheers,
Carlos.