Summary: | slurmstepd hangs for no apparent reason | ||
---|---|---|---|
Product: | Slurm | Reporter: | Ryan Cox <ryan_cox> |
Component: | slurmstepd | Assignee: | Tim Wickberg <tim> |
Status: | RESOLVED DUPLICATE | QA Contact: | |
Severity: | 3 - Medium Impact | ||
Priority: | --- | ||
Version: | 17.11.5 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | BYU - Brigham Young University | Alineos Sites: | --- |
Description
Ryan Cox
2018-05-01 17:09:30 MDT
Hey guys -

This is a duplicate of what we're now tracking in bug 5103. There is a patch on there (marked obsolete, though, as the commits corresponding to it have landed on the 17.11 branch) that should prevent this.

We expect to have 17.11.6 out soon to address this. It does seem like some recent RHEL glibc changes exposed this unsafe thread behavior and made it a lot more likely to deadlock.

- Tim

*** This ticket has been marked as a duplicate of ticket 5103 ***

We're actually still on RHEL6 for a few more months. We might apply that patch today. Thanks.