| Summary: | Improve feedback on Out Of Memory conditions | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Alejandro Sanchez <alex> | 
| Component: | slurmd | Assignee: | Alejandro Sanchez <alex> | 
| Status: | OPEN --- | QA Contact: | |
| Severity: | 5 - Enhancement | ||
| Priority: | --- | CC: | felip.moll, kaizaad | 
| Version: | 18.08.x | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=6765 https://bugs.schedmd.com/show_bug.cgi?id=9737 https://bugs.schedmd.com/show_bug.cgi?id=10122 | ||
| Site: | SchedMD | Slinky Site: | --- | 
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | Version Fixed: | ||
| Target Release: | --- | DevPrio: | 3 - High | 
| 
        
       Description: Alejandro Sanchez, 2018-01-04 04:45:54 MST

       Part of this enhancement has already been addressed here: https://github.com/SchedMD/slurm/commit/943c4a130f39dbb1fb

       Perhaps modify the API to get rid of SIG_OOM and instead add one or more new members that reflect an oom-kill event and/or memory usage hitting the limit, possibly displaying the latter as SystemComment.

       Try to detect kernels that expose separate OOM counts in the event file (https://patchwork.kernel.org/patch/9737381/) and use those counts instead of the manual eventfd() monitoring.

       *** Ticket 6765 has been marked as a duplicate of this ticket. *** |
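
The counter-based detection proposed above could look roughly like the following sketch. It is illustrative only and is not Slurm code. It assumes the event file holds one `key value` pair per line, that the kill counter is named `oom_kill`, and that limit hits are counted under `max` (the names used by cgroup v2 `memory.events`). The function names and the sample snapshots are hypothetical.

```python
def parse_counters(text):
    """Parse 'key value' lines into a dict of ints, skipping malformed lines."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        key, value = parts
        try:
            counters[key] = int(value)
        except ValueError:
            continue
    return counters


def summarize(before, after):
    """Compare two snapshots taken at step start and step end.

    Returns None when the 'oom_kill' counter is absent, so the caller
    can fall back to another mechanism (for example eventfd() monitoring).
    Otherwise reports OOM kills and limit hits as separate numbers.
    """
    if "oom_kill" not in after:
        return None
    return {
        "oom_kills": after["oom_kill"] - before.get("oom_kill", 0),
        "limit_hits": after.get("max", 0) - before.get("max", 0),
    }


# Hypothetical snapshots of an event file (assumed format, not real output).
SAMPLE_START = "low 0\nhigh 0\nmax 0\noom 0\noom_kill 0\n"
SAMPLE_END = "low 0\nhigh 0\nmax 3\noom 1\noom_kill 1\n"

if __name__ == "__main__":
    result = summarize(parse_counters(SAMPLE_START), parse_counters(SAMPLE_END))
    print(result)
```

Keeping the two numbers separate mirrors the suggestion to report an actual oom-kill differently from memory merely hitting the limit.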