| Summary: | Seeing errors related to bug 3694 re: BUG numa_policy ... objects remaining in numa_policy ... on kmem_cache_close() | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Ryan Novosielski <novosirj> |
| Component: | slurmd | Assignee: | Tim Wickberg <tim> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | ||
| Version: | - Unsupported Older Versions | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Rutgers | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | amarel | CLE Version: | |
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Ryan Novosielski
2018-04-09 07:42:20 MDT
Hey Ryan -

It's a kernel bug at heart. Nothing user-space does should ever be able to cause that type of crash, so there's nothing for us to chase down here. If you have a RHEL support contract, I'd suggest getting them in the loop on this.

It's possible that, due to some changes in how we manage the various cgroups (which don't appear to be directly implicated here, but have usually been the root cause of some other issues), Slurm's behavior in 17.11 will avoid triggering this. But you'd have to test to narrow that down.

- Tim

Marking as resolved/infogiven. Please reopen if there's anything further I can address.

- Tim