| Summary: | Jobs going to nodes that are not members of the selected partition | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Adam <adam.munro> |
| Component: | slurmctld | Assignee: | Director of Support <support> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | CC: | felip.moll |
| Version: | 20.02.3 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=9225 | ||
| Site: | Yale | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | Version Fixed: | ||
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
| Attachments: | bug8847_2002_v12.patch, bug8847_2002_v13.patch | ||
Since we upgraded to 20.02.3, we have seen jobs submitted to one partition end up running on nodes that are not members of that partition. For example, all of these:

| User | JobID | Partition | State | Submit | Start | NodeList |
|---|---|---|---|---|---|---|
| ga254 | 65507956 | day | COMPLETED | 2020-08-29T11:24:43 | 2020-08-29T11:24:44 | c31n06 |
| ga254 | 65507958 | day | COMPLETED | 2020-08-29T11:24:45 | 2020-08-29T11:24:46 | c31n06 |
| ch2229 | 62930363 | pi_econ_io | COMPLETED | 2020-08-11T08:22:02 | 2020-08-11T08:22:48 | p08r02n40 |
| ch2229 | 62967916 | pi_econ_io | COMPLETED | 2020-08-11T11:56:25 | 2020-08-11T11:57:04 | p08r02n40 |
| ch2229 | 63006219 | pi_econ_io | COMPLETED | 2020-08-11T17:24:47 | 2020-08-11T17:24:48 | p08r02n44 |
| ch2229 | 63292472 | pi_econ_io | COMPLETED | 2020-08-13T17:38:50 | 2020-08-13T17:39:11 | p08r02n36 |
| lf468 | 62468450 | pi_econ_lp | FAILED | 2020-08-06T18:43:30 | 2020-08-06T18:44:11 | p08r02n40 |
| fd338 | 64246551 | pi_polima+ | COMPLETED | 2020-08-21T15:27:16 | 2020-08-25T14:55:09 | p08r02n36 |

None of the above nodes are or were members of any of the listed partitions (e.g., c31n06 is not a member of "day"). This does not happen very frequently, but it is a big problem because the owners of the nodes are unhappy with other users' jobs running on their nodes.

Thank you,
Adam
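
The kind of cross-check described in this report (comparing the node each job ran on against the membership of the partition it was submitted to) could be sketched roughly as follows. This is a minimal illustration, not the site's actual tooling: the function name `find_misplaced_jobs` and the partition membership map are hypothetical, and the node names in that map are placeholders, since the report does not list the real members of each partition. Only the job rows are taken from the listing above.

```python
from typing import Dict, Iterable, List, Set, Tuple

# One accounting row reduced to (job_id, partition, node).
JobRow = Tuple[str, str, str]


def find_misplaced_jobs(
    jobs: Iterable[JobRow],
    partition_nodes: Dict[str, Set[str]],
) -> List[JobRow]:
    """Return the rows whose node is not a member of the job's partition."""
    misplaced = []
    for job_id, partition, node in jobs:
        members = partition_nodes.get(partition)
        if members is None:
            # Partition not in the map (for example a truncated name such
            # as "pi_polima+"): skip rather than guess its membership.
            continue
        if node not in members:
            misplaced.append((job_id, partition, node))
    return misplaced


# Job rows copied from the report.
jobs = [
    ("65507956", "day", "c31n06"),
    ("62930363", "pi_econ_io", "p08r02n40"),
    ("64246551", "pi_polima+", "p08r02n36"),
]

# HYPOTHETICAL membership map: these node names are placeholders and do
# not reflect the real cluster configuration.
partition_nodes = {
    "day": {"c01n01", "c01n02"},
    "pi_econ_io": {"p09r01n01"},
}

for row in find_misplaced_jobs(jobs, partition_nodes):
    print(row)
```

In practice the job rows would come from the accounting records and the membership map from the partition definitions in the cluster configuration. Partition names that were truncated in the listing would need to be resolved to their full names before they can be matched.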