| Summary: | Tasks fail to start when the numerical and alphabetical order of nodes do not match | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Bjørn-Helge Mevik <b.h.mevik> |
| Component: | slurmd | Assignee: | Moe Jette <jette> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 1 - System not usable | ||
| Priority: | --- | CC: | da |
| Version: | 2.4.x | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | -Other- | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | |
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
| Attachments: | Fix for hostname prefixes of varying length | ||
Description
Bjørn-Helge Mevik
2012-11-28 22:54:45 MST
I spent the night debugging on our test cluster and figured it out.

The problem is in the function `hostrange_hn_within` in `src/common/hostlist.c`. The logic that was added to allow things like nid0000[2-7] assumes that the prefixes of all hostranges are equally long. In our case, with node names like c1-1, c10-2 and c2-5, they are not. When `hostrange_hn_within` is used to check host c2-5 against the range c10-[1-5], the logic modifies the host prefix in place from c2- to c2-5. So later, when checking whether c2-5 is in, say, the range c2-[5-6], the prefix comparison fails because it compares c2- with c2-5.

I'm attaching a small workaround fix for this. It simply skips the added logic unless the last character of the noderange prefix is a digit. So it still handles cases like nid0000[2-7], but leaves ranges like c10-[1-5] and c2-[5-6] alone.

A more general fix would perhaps be to change the logic so that it does not modify the function arguments, but then the logic would have to be performed for each range.

Cheers,
Bjørn-Helge

Created attachment 163
Fix for hostname prefixes of varying length
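The attachment itself is not reproduced in this report. As an illustration only, here is a simplified C model of the failure described above and of the workaround. It is a sketch under assumptions, not Slurm's `hostlist.c` code: the types `struct hname` and `struct hrange` and the functions `hname_parse`, `borrow_digits`, `within_buggy`, `within_fixed`, `within_copy` and `host_in_ranges` are hypothetical. It assumes that a host name is split into a prefix and a run of trailing digits (c2-5 becomes c2- plus 5), as the discussion of c2- versus c2-5 implies.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for Slurm's hostname and hostrange
 * types; the field names are illustrative, not Slurm's. */
struct hname {
	char prefix[64]; /* "c2-" for c2-5, "nid" for nid00003 */
	char suffix[32]; /* trailing digits: "5", "00003" */
};

struct hrange {
	const char *prefix; /* "c2-" for c2-[5-6], "nid0000" for nid0000[2-7] */
	unsigned long lo, hi;
};

typedef int (*within_fn)(const struct hrange *, struct hname *);

/* Split a host name into its prefix and its run of trailing digits. */
static void hname_parse(struct hname *hn, const char *name)
{
	size_t i = strlen(name);

	while (i > 0 && isdigit((unsigned char)name[i - 1]))
		i--;
	snprintf(hn->prefix, sizeof(hn->prefix), "%.*s", (int)i, name);
	snprintf(hn->suffix, sizeof(hn->suffix), "%s", name + i);
}

/* Compare prefixes exactly, then check the numeric part against [lo, hi]. */
static int match(const struct hrange *hr, const struct hname *hn)
{
	unsigned long n;

	if (strcmp(hr->prefix, hn->prefix) != 0 || hn->suffix[0] == '\0')
		return 0;
	n = strtoul(hn->suffix, NULL, 10);
	return n >= hr->lo && n <= hr->hi;
}

/* Loose model of the nid0000[2-7] logic: if the range prefix is longer,
 * move leading suffix digits into hn->prefix. This modifies hn in place. */
static void borrow_digits(const struct hrange *hr, struct hname *hn)
{
	size_t l1 = strlen(hr->prefix), l2 = strlen(hn->prefix);
	size_t d;

	if (l1 <= l2 || l1 >= sizeof(hn->prefix) || strlen(hn->suffix) < l1 - l2)
		return;
	d = l1 - l2;
	strncat(hn->prefix, hn->suffix, d);
	memmove(hn->suffix, hn->suffix + d, strlen(hn->suffix + d) + 1);
}

/* Reported behaviour: checking c2-5 against c10-[1-5] leaves hn->prefix
 * as "c2-5", so the later check against c2-[5-6] fails. */
static int within_buggy(const struct hrange *hr, struct hname *hn)
{
	borrow_digits(hr, hn);
	return match(hr, hn);
}

/* The workaround described above: only borrow digits when the range
 * prefix ends in a digit (nid0000[2-7]), never for c10-[1-5] or c2-[5-6]. */
static int within_fixed(const struct hrange *hr, struct hname *hn)
{
	size_t l1 = strlen(hr->prefix);

	if (l1 > 0 && isdigit((unsigned char)hr->prefix[l1 - 1]))
		borrow_digits(hr, hn);
	return match(hr, hn);
}

/* The more general alternative: adjust a private copy for every range,
 * leaving the caller's hostname untouched. */
static int within_copy(const struct hrange *hr, struct hname *hn)
{
	struct hname tmp = *hn;

	borrow_digits(hr, &tmp);
	return match(hr, &tmp);
}

/* Test one host against the ranges in order, reusing one parsed hostname
 * object across all range checks. */
static int host_in_ranges(const char *host, const struct hrange *r,
			  size_t n, within_fn within)
{
	struct hname hn;

	hname_parse(&hn, host);
	for (size_t i = 0; i < n; i++)
		if (within(&r[i], &hn))
			return 1;
	return 0;
}
```

With the ranges in alphabetical order, c10-[1-5] then c2-[5-6], `host_in_ranges("c2-5", ranges, 2, within_buggy)` returns 0, while `within_fixed` and `within_copy` both return 1. Under `within_fixed`, nid00003 still matches nid0000[2-7]. `within_copy` shows the trade-off mentioned in the report: it never touches the caller's hostname, but it redoes the prefix adjustment on a fresh copy for every range.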
Awesome! Good to see it was so easy. This will be in 2.4.5/2.5.0, probably both tagged next week.