| Summary: | Slurm only honors the DefMemPerCPU of the first partition when submitting to more than one partition | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Tim Ehlers <tehlers> |
| Component: | Limits | Assignee: | Alejandro Sanchez <alex> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | CC: | alex, kilian |
| Version: | 18.08.6 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=8011 | ||
| Site: | GWDG | Slinky Site: | --- |
| CLE Version: | | Version Fixed: | 19.05.0rc2 20.02.0pre1 |
| Attachments: | slurm.conf, partitions.conf, nodes.conf | ||
|
Description
Tim Ehlers
2019-05-02 06:37:51 MDT
Hi Tim,

I can reproduce this also in 19.05. There's historically been a lot of discussion[1] around this problem (I guess that's why Kilian added himself to CC since he was involved in some of them).

Can you attach your current slurm.conf? I'm interested in a few options like EnforcePartLimits.

[1] Some related commits and bugs:
17.11.7 https://github.com/SchedMD/slurm/commit/bf4cb0b1b0
17.11.8 https://github.com/SchedMD/slurm/commit/f07f53fc13
17.11.8 https://github.com/SchedMD/slurm/commit/d52d8f4f0c

Created attachment 10153: slurm.conf
Created attachment 10154: partitions.conf
Created attachment 10155: nodes.conf

Sure, attached (3 files).
To explain: we use "dummy" partitions such as "medium" that users should submit to, and we check for these partitions in "job_submit.lua". If a job is submitted to "medium", the submit string is changed to "-p medium-faz,medium-fmz", as you advised us in the course in Goettingen.
If DefMemPerCPU can't be honored per partition, the whole mechanism would be kind of useless... :(
Best
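
For readers following the thread, the mechanism described above can be modelled in a few lines. This is only an illustrative Python sketch, not the site's actual job_submit.lua and not Slurm code: the alias table, the function names, and the DefMemPerCPU values are all assumptions made up for the example; only the partition names "medium", "medium-faz" and "medium-fmz" come from the comment above.

```python
# Hypothetical alias table. The partition names appear in the thread;
# the DefMemPerCPU values (in MB) are invented for illustration only.
PARTITION_ALIASES = {"medium": ["medium-faz", "medium-fmz"]}
DEF_MEM_PER_CPU = {"medium-faz": 2048, "medium-fmz": 4096}


def expand_partitions(requested):
    """Replace each dummy partition in a comma-separated -p value
    with the real partitions it stands for."""
    expanded = []
    for name in requested.split(","):
        # Names without an alias entry are passed through unchanged.
        expanded.extend(PARTITION_ALIASES.get(name, [name]))
    return ",".join(expanded)


def default_mem_per_partition(partitions):
    """Behaviour the reporter expects: each candidate partition
    applies its own DefMemPerCPU."""
    return {p: DEF_MEM_PER_CPU[p] for p in partitions.split(",")}


def default_mem_first_only(partitions):
    """Behaviour described in the summary: the DefMemPerCPU of the
    first partition is used for every candidate partition."""
    parts = partitions.split(",")
    first = DEF_MEM_PER_CPU[parts[0]]
    return {p: first for p in parts}


if __name__ == "__main__":
    real = expand_partitions("medium")
    print(real)
    print(default_mem_per_partition(real))
    print(default_mem_first_only(real))
```

With these made-up values, the two functions disagree for "medium-fmz", which is the kind of mismatch that makes the dummy-partition scheme unreliable when the partitions have different memory defaults.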
Hi. Just as an update I've triggered the review process for a patch for 19.05.

(In reply to Alejandro Sanchez from comment #10)
> Hi. Just as an update I've triggered the review process for a patch for
> 19.05.

Thanks!

Tim, this has been fixed in the following commit available since Slurm 19.05.0rc2:
https://github.com/SchedMD/slurm/commit/8a1e5a5250b3ce469c |