Ticket 10349 - OOM errors for all jobs
Summary: OOM errors for all jobs
Status: RESOLVED DUPLICATE of ticket 10336
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 20.11.0
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Felip Moll
Reported: 2020-12-03 12:31 MST by Ahmed Essam ElMazaty
Modified: 2020-12-04 04:59 MST

See Also:
Site: KAUST

Description Ahmed Essam ElMazaty 2020-12-03 12:31:55 MST
Hello team,
We're affected by the same issue as bug https://bugs.schedmd.com/show_bug.cgi?id=10255, and it has been affecting all jobs since we upgraded to 20.11.0.
Is there a patch to fix this issue until 20.11.1 is released? Also, is there an estimated release date for 20.11.1?
Best regards,
Ahmed
Comment 1 Jason Booth 2020-12-03 14:46:57 MST
Ahmed - I will have Felip reply to you in more detail on the status of this issue. Regarding your other question, we are tentatively targeting the 20.11.1 release for next week.
Comment 2 Felip Moll 2020-12-04 04:59:12 MST
Hi Ahmed,

This question is exactly a duplicate of bug 10336. If you don't mind I am closing this out as a DUP. You can apply the patch from this commit:

https://github.com/SchedMD/slurm/commit/272c636d507e1dc59d987da478d42f6713d88ae1

or just wait until 20.11.1 is released.

Please post any further questions in bug 10336.

Thanks!

*** This ticket has been marked as a duplicate of ticket 10336 ***
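
For sites that prefer to patch rather than wait for 20.11.1, the step suggested in comment 2 might be sketched as follows. This is only an illustration, not a procedure taken from the ticket: it relies on GitHub serving a commit as a patch when `.patch` is appended to the commit URL, and on GNU `patch` being installed. The function names, the `slurm-20.11.0` source directory, and the `ticket-10336.patch` file name are hypothetical.

```python
import subprocess
import urllib.request
from pathlib import Path

# Commit URL quoted in comment 2 of this ticket.
COMMIT_URL = (
    "https://github.com/SchedMD/slurm/commit/"
    "272c636d507e1dc59d987da478d42f6713d88ae1"
)


def patch_url(commit_url: str) -> str:
    """Return the URL at which GitHub serves the commit as a patch file."""
    return commit_url.rstrip("/") + ".patch"


def apply_commit(commit_url: str, source_dir: str, dry_run: bool = True) -> None:
    """Download the commit as a patch and apply it inside source_dir.

    source_dir is assumed to be an unpacked Slurm 20.11.0 source tree
    (hypothetical path; adjust to the local build layout).
    """
    tree = Path(source_dir).resolve()
    patch_file = tree / "ticket-10336.patch"  # hypothetical file name

    # Fetch the patch text from GitHub and save it next to the sources.
    with urllib.request.urlopen(patch_url(commit_url)) as response:
        patch_file.write_bytes(response.read())

    # -p1 strips the leading "a/" and "b/" path components of a git patch.
    command = ["patch", "-p1", "-i", str(patch_file)]
    if dry_run:
        # Check that the patch applies cleanly without modifying any file.
        command.insert(1, "--dry-run")
    subprocess.run(command, cwd=tree, check=True)


if __name__ == "__main__":
    # Only print the URL here; call apply_commit() explicitly to patch a tree.
    print(patch_url(COMMIT_URL))
```

Calling `apply_commit(COMMIT_URL, "slurm-20.11.0")` would first perform a dry run; passing `dry_run=False` applies the change for real. Slurm would then need to be rebuilt and reinstalled following the site's usual procedure before the fix takes effect.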