| Summary: | How does SLURM report that a job was killed because of a MEM constraint violation? | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Brent G <brent.gawryluik> |
| Component: | Configuration | Assignee: | Benjamin Witham <benjamin.witham> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | CC: | benjamin.witham |
| Version: | 22.05.8 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Recursion Pharma | Slinky Site: | --- |
|
Description
Brent G
2023-06-26 15:45:42 MDT
Hello Brent,
Slurm should report why a job failed in the `Reason` field of `scontrol show job`. Are you not seeing this in your `scontrol` output? Do you need more information about the job failure, or only confirmation that it ran out of memory?
The complete list of reason codes can be found here:
> https://slurm.schedmd.com/resource_limits.html
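For anyone who wants to check this from a script, here is a minimal sketch of pulling the state and reason out of `scontrol show job`. It assumes the output is a series of whitespace-separated `Key=Value` tokens and that the `JobState` and `Reason` values contain no spaces. The helper names, the job ID, and the `SAMPLE` text are hypothetical and for illustration only; the exact state and reason strings on your cluster may differ, and records for completed jobs are only available from `scontrol` for a limited time.

```python
import subprocess


def parse_scontrol_fields(text):
    """Split scontrol-style output into a dict of Key=Value tokens.

    Assumption: values of interest contain no whitespace. Fields such as
    JobName or Command may contain spaces and would be truncated here.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            # Keep the first occurrence of each key.
            fields.setdefault(key, value)
    return fields


def job_state_and_reason(job_id):
    """Run `scontrol show job <job_id>` and return (JobState, Reason)."""
    result = subprocess.run(
        ["scontrol", "show", "job", str(job_id)],
        capture_output=True,
        text=True,
        check=True,
    )
    fields = parse_scontrol_fields(result.stdout)
    return fields.get("JobState"), fields.get("Reason")


# Hypothetical, abbreviated output for an out-of-memory job (not a real log).
SAMPLE = """JobId=1234 JobName=demo
   JobState=OUT_OF_MEMORY Reason=OutOfMemory Dependency=(null)
"""

if __name__ == "__main__":
    parsed = parse_scontrol_fields(SAMPLE)
    print(parsed["JobState"], parsed["Reason"])
```

On a node with Slurm installed, `job_state_and_reason(1234)` would run the real command instead of using the sample text.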
Thank you for the quick response. We will test this again tomorrow and look more closely at the `scontrol` output.

Hello Brent,
Just checking in to see if your `scontrol` is working properly and displaying your OOM kill reason. If so, I'll go ahead and close this ticket.

Closing ticket |