Summary: | Dynamically update SuspendExcNodes and SuspendExcParts during runtime | ||
---|---|---|---|
Product: | Slurm | Reporter: | Chrysovalantis Paschoulas <c.paschoulas> |
Component: | Other | Assignee: | Scott Hilton <scott> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | 5 - Enhancement | ||
Priority: | --- | CC: | bas.vandervlies, brian, skyler |
Version: | 21.08.8 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | Jülich | Slinky Site: | --- |
Linux Distro: | --- | Machine Name: | |
CLE Version: | | Version Fixed: | 23.02.0rc1 |
Target Release: | 23.11 | DevPrio: | 1 - Paid |
Description
Chrysovalantis Paschoulas
2022-10-16 09:23:19 MDT
This is a really nice idea: dynamically updating the exclude configuration for the power-saving mechanism.

SuspendExcNodes has a peculiar kind of nodelist with the optional ":" separator. This is used to specify groups of nodes from which a certain number should stay online.
See: https://slurm.schedmd.com/slurm.conf.html#OPT_SuspendExcNodes

Is it important to you to be able to add and remove from lists with this special ":" syntax? If so what is specifically needed for your workflow?

-Scott

(In reply to Scott Hilton from comment #6)
> SuspendExcNodes has a peculiar kind of nodelist with the optional ":"
> separator. This is used to specify groups of nodes from which a certain
> number should stay online.
> See: https://slurm.schedmd.com/slurm.conf.html#OPT_SuspendExcNodes
>
> Is it important to you to be able to add and remove from lists with this
> special ":" syntax? If so what is specifically needed for your workflow?
>
> -Scott

Hi Scott!

No, for our case I would say that the ":" feature is not needed. As far as I can tell, we will only need to exclude specific nodes from suspension, e.g. because we want to use them for a particular purpose (reserving them for a course, running tests or the testsuite on them, etc.) or because we need to keep them online for maintenance, HW work, etc.

Cheers,
Valantis

Valantis,

We have completed this feature and it should be part of release 23.02. See commits fc5ec8c83f - 77c1c7d7ae.

-Scott

(In reply to Scott Hilton from comment #13)
> Valantis,
>
> We have completed this feature and it should be part of release 23.02. See
> commits fc5ec8c83f - 77c1c7d7ae.
>
> -Scott

Hi Scott,

that's great! I see that we will be able to dynamically update the excluded states too :)

Thank you very much!

-Valantis
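For readers unfamiliar with the ":" syntax discussed in this thread, the following is a minimal Python sketch of how a single SuspendExcNodes entry might be split into a node expression and an optional "keep this many online" count. It is not Slurm's own parser; the names `ExcEntry` and `parse_exc_entry` are hypothetical, the hostnames are illustrative, and it handles one entry at a time rather than a full comma-separated list.

```python
from typing import NamedTuple, Optional


class ExcEntry(NamedTuple):
    """Hypothetical container for one SuspendExcNodes entry."""
    nodes: str                  # node expression, e.g. "nid[10-20]"
    keep_online: Optional[int]  # None: no ":" count was given


def parse_exc_entry(entry: str) -> ExcEntry:
    """Split one entry such as "nid[10-20]:4" at its ":" separator.

    Assumption: the count, when present, follows the last ":" and is a
    non-negative integer. This is an illustration, not Slurm code.
    """
    entry = entry.strip()
    nodes, sep, count = entry.rpartition(":")
    if not sep:
        # No ":" present, so the entry is a plain node expression.
        return ExcEntry(entry, None)
    if not nodes or not count.isdigit():
        raise ValueError(f"malformed entry: {entry!r}")
    return ExcEntry(nodes, int(count))


if __name__ == "__main__":
    print(parse_exc_entry("nid[10-20]:4"))
    print(parse_exc_entry("node01"))
```

An entry with a count corresponds to the "group of nodes from which a certain number should stay online" case described by Scott; an entry without one is the plain per-node exclusion that Valantis says is sufficient for their site.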