| Summary: | pam_slurm_adopt with ConstrainRAMSpace=no |
| --- | --- |
| Product: | Slurm |
| Reporter: | Juergen Salk <juergen.salk> |
| Component: | Other |
| Assignee: | Tim McMullan <mcmullan> |
| Status: | RESOLVED FIXED |
| Severity: | 3 - Medium Impact |
| Priority: | --- |
| Version: | 19.05.5 |
| Hardware: | Linux |
| OS: | Linux |
| Site: | Ulm University |
| Version Fixed: | 20.02.6 20.11.0pre1 |
| Attachments: | cgroup.conf, slurm.conf, bug9355 patch |
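For context on the module named in the summary: pam_slurm_adopt adopts incoming SSH sessions into a job the user already has running on the node, so that cgroup limits and accounting also apply to those sessions. It is typically enabled by adding it to the SSH daemon's PAM account stack, roughly as sketched below; the exact stack varies by distribution and site, and this snippet is an illustration rather than configuration taken from this ticket.

```
# /etc/pam.d/sshd (illustrative sketch; the exact PAM stack varies by distribution)
# Adopt the incoming SSH session into the user's running job on this node.
account    required     pam_slurm_adopt.so
```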
Description
Juergen Salk
2020-07-07 14:51:11 MDT
Tim McMullan:

Hi Jürgen,

That doesn't sound like expected behavior, and when I went to replicate it in my environment it seemed to work just fine with ConstrainRAMSpace=no set. Would you be able to attach your slurm.conf and cgroup.conf files so I can better replicate what you have? What OS are you running on?

Thanks!
--Tim

Juergen Salk:

Created attachment 14947 [details]
cgroup.conf
Juergen Salk:

Created attachment 14948 [details]
slurm.conf
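The attached configuration files are not reproduced in the ticket. As a rough illustration of the setting under discussion, a cgroup.conf that enforces core and device constraints but leaves RAM constraint disabled might look like the following; all values here are assumptions for illustration, not the contents of attachment 14947.

```
# cgroup.conf (illustrative sketch only, not the attached file)
CgroupAutomount=yes
ConstrainCores=yes        # confine tasks to their allocated cores
ConstrainDevices=yes      # restrict access to allocated devices (e.g. GPUs)
ConstrainRAMSpace=no      # the setting at issue: no memory cgroup limit
ConstrainSwapSpace=no
```

For pam_slurm_adopt to have a job cgroup to adopt the SSH session into, slurm.conf typically also sets ProctrackType=proctrack/cgroup, TaskPlugin=task/cgroup (often combined with task/affinity), and PrologFlags=contain.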
Juergen Salk:

Hi Tim,

I have attached our slurm.conf and cgroup.conf files. This is running on CentOS 8.2.

Best regards
Jürgen

Tim McMullan:

(In reply to Juergen Salk from comment #4)
> Hi Tim,
>
> I have attached our slurm.conf and cgroup.conf files. This is running on
> CentOS 8.2.
>
> Best regards
> Jürgen

Thank you! I was able to reproduce this behavior with that information. I'll update you when I have a fix for you!

Thanks again!
--Tim

Juergen Salk:

Hi Tim,

is there any news on that?

Best regards
Jürgen

Tim McMullan:

(In reply to Juergen Salk from comment #9)
> Hi Tim,
>
> is there any news on that?
>
> Best regards
> Jürgen

Hi Jürgen! Yes, sorry about that! I've written and tested a patch that fixes this issue; it is currently just awaiting review!

Thanks,
--Tim

Juergen Salk:

Hi Tim,

we are about to upgrade to Slurm version 20.02.05 during our next scheduled cluster maintenance. Is this issue fixed in version 20.02.05? I've looked through the announcements for versions 20.02.04 and 20.02.05 but could not find any indication.

Best regards
Jürgen

Tim McMullan:

Hi Jürgen,

The patch for this didn't quite make it into 20.02.5, unfortunately. I'm working on getting the patch in as soon as possible. If you need it, I can provide a patch to you that should be close to what ends up landing! Let me know if this would be helpful for you!

Thanks!
-Tim

Juergen Salk (automatic out-of-office reply):

Thank you for your e-mail. I am out of office until Oct 2nd 2020. I will have limited access to my e-mail during this period but will answer your message as soon as possible. If you have immediate questions or concerns, please contact kiz.hpc-admin@uni-ulm.de

Best regards,
Juergen Salk

Juergen Salk:

(In reply to Tim McMullan from comment #12)
> The patch for this didn't quite make it into 20.02.5 unfortunately. I'm
> working on getting the patch in as soon as possible. If you need it, I can
> provide a patch to you that should be close to what ends up landing! Let me
> know if this would be helpful for you!

Hi Tim,

yes, it would probably be useful for us to get the patch beforehand, unless version 20.02.06 is going to be released within the next couple of days and will then include your patch.

Best regards
Jürgen

Tim McMullan:

Created attachment 16179 [details]
bug9355 patch

Here is the patch! Let me know if you have any issues with it!

Thanks!
--Tim

Tim McMullan:

Hi Jürgen,

I'm happy to report that this patch was merged ahead of 20.02.6, so it should be in the next release! Thank you for your patience on this one. I'm going to resolve this ticket for now, but please let me know if you have any other questions!

Thanks!
--Tim

Juergen Salk:

Thank you Tim. We are right in the middle of our scheduled cluster maintenance and have just updated from 19.05.5 to 20.02.5. However, we have backported your patch to 20.02.5 and it seems to work very well. Thanks again.

Best regards
Jürgen
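For reference, backporting a fix like this usually amounts to applying the supplied patch to the source tree and rebuilding. A minimal sketch, assuming the patch from attachment 16179 was saved as bug9355.patch and that the paths and configure options match the local installation (they are assumptions here, not details from this ticket):

```sh
# Illustrative backport sketch only; file names, paths and options are assumed.
tar xjf slurm-20.02.5.tar.bz2
cd slurm-20.02.5
patch -p1 < ../bug9355.patch           # apply the fix to the 20.02.5 source tree
./configure --prefix=/opt/slurm/20.02.5 --sysconfdir=/etc/slurm
make -j"$(nproc)"
sudo make install                      # then restart slurmd on the compute nodes
# If the fix touches contribs/pam_slurm_adopt, rebuild and reinstall that module as well.
```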