| Summary: | request for information: submitting with incorrect partition/constraint pair | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Michael Hebenstreit <michael.hebenstreit> |
| Component: | User Commands | Assignee: | Marcin Stolarek <cinek> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | cinek |
| Version: | 20.02.4 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Intel CRT | Slinky Site: | --- |
|
Description
Michael Hebenstreit
2020-10-27 06:27:37 MDT
Marcin Stolarek

> Sometimes users get exclusive access to certain nodes, those nodes are removed from the standard queues.

Did you consider creating an advanced reservation for those nodes for a specific time instead of removing them from the standard queue[1]?

cheers,
Marcin

[1] https://slurm.schedmd.com/reservations.html

Michael Hebenstreit

Our procedures are set via queues. Reservations might be a long-term option we are considering, but for the moment we’d need an answer to that question. If the answer is “not possible”, our users will have to live with that, but they will be unhappy.

Marcin Stolarek

Michael,

A "hack" that should work is to add dummy nodes (nodes that do not really exist, kept in DOWN state) to the partition, with the configuration supported by the real nodes, while the real nodes are removed. The drawback is that slurmctld will try to ping those nodes, resulting in misleading error messages, but if you set their IP address to something unavailable, everything should fail quickly without a big impact on the controller.

At a glance I don't see any serious issue that may come out of this approach, but since it's rather nonstandard, let me know if it leads to unexpected behavior. Maybe we'll be able to tune it further.

cheers,
Marcin

Michael Hebenstreit

Thanks, that’s what I thought too. You can close the ticket.

Marcin Stolarek

Marking as infogiven. Should you have any questions, please don't hesitate to reopen.

cheers,
Marcin |
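
For readers who want to try the dummy-node workaround from the thread, a minimal slurm.conf sketch might look like the fragment below. It is an illustration under assumptions, not a configuration taken from the ticket: the node name `dummy01`, the partition name `workq`, the feature tag `featA`, and the address are all hypothetical, and it assumes that "the configuration supported by real nodes" refers to the feature tags users request with `--constraint`.

```
# slurm.conf fragment (sketch only; every name and value here is hypothetical)

# Placeholder node that does not really exist. 192.0.2.10 lies in the
# RFC 5737 documentation range, so slurmctld's pings are expected to fail.
# Feature=featA stands in for whatever tag the removed real nodes carried.
NodeName=dummy01 NodeAddr=192.0.2.10 CPUs=1 Feature=featA State=DOWN Reason="placeholder"

# The real nodes have been taken out of this partition for exclusive use;
# only the placeholder remains, so the partition still lists a node
# that advertises featA.
PartitionName=workq Nodes=dummy01 State=UP
```

As noted in the thread, slurmctld may log misleading errors about the unreachable node, so it is worth checking the controller log after applying a change like this.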