| Summary: | Requested nodes are busy when job --mem-per-cpu option > MaxMemPerCPU config | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Chris Read <cread> |
| Component: | Scheduling | Assignee: | Moe Jette <jette> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 3 - Medium Impact | | |
| Priority: | --- | CC: | da |
| Version: | 2.5.x | | |
| Hardware: | Linux | OS: | Linux |
| Site: | DRW Trading | | |
| Attachments: | Slurm.conf with MaxMemPerCPU commented out<br>Disable setting implicit value of a job's cpus_per_task value | | |
Description
Chris Read
2013-06-27 08:18:57 MDT
Could you attach your slurm.conf configuration file? It also helps to identify the specific version of Slurm in the trouble ticket, which I believe is v2.5.7 in your case.

Created attachment 312 [details]
Slurm.conf with MaxMemPerCPU commented out

Here is the config with MaxMemPerCPU commented out.
I get the same behaviour with 2.5.6 and 2.5.7.

Chris
Created attachment 315 [details]
Disable setting implicit value of a job's cpus_per_task value
This removes logic added three years ago that would automatically adjust a job's cpus_per_task value in order to bring its mem_per_cpu value under the limit, scaling cpus_per_task up by the same factor. Equivalent logic did not exist in the step allocation logic, so just return an error instead. This change will be made in Slurm version 2.6, but this patch is made for version 2.5. The original patch introducing the problem is in commit cc00cc70b9c90816afc511e0261e449857176332.
This is commit e3b7c2be4393d921679f3e0cddcb9ca7943fb1f6
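The implicit scaling the patch removes can be sketched roughly as follows. This is a minimal illustration based on the description above, not Slurm's actual C code; the function name and the exact rounding behavior are assumptions:

```python
import math

def implicit_cpus_per_task(mem_per_cpu, max_mem_per_cpu, cpus_per_task=1):
    """Sketch of the removed logic: if a job's --mem-per-cpu exceeds
    MaxMemPerCPU, scale cpus_per_task up and mem_per_cpu down by the
    same factor so the per-CPU memory falls under the limit."""
    if max_mem_per_cpu <= 0 or mem_per_cpu <= max_mem_per_cpu:
        # Within the limit (or limit unset): leave the request unchanged.
        return cpus_per_task, mem_per_cpu
    # Smallest integer factor that brings mem_per_cpu under the limit.
    factor = math.ceil(mem_per_cpu / max_mem_per_cpu)
    return cpus_per_task * factor, math.ceil(mem_per_cpu / factor)

# e.g. --mem-per-cpu=8000 with MaxMemPerCPU=3000 becomes
# 3 CPUs per task at roughly 2667 MB per CPU.
```

Because no equivalent adjustment existed in the step allocation path, the job and step requests could disagree; after this patch the job submission simply returns an error instead of being silently rescaled.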
See attached patch.

Thanks, tested in our dev environment, confirmed fixed.