| Summary: | Specifying --cores-per-socket prevents using more cores than are on a single socket | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Christopher Samuel <chris> |
| Component: | User Commands | Assignee: | Dominik Bartkiewicz <bart> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | CC: | bart |
| Version: | 17.11.5 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Swinburne | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | 17.11.6 |
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | | |
| Attachments: | slurm.conf | | |
|
Description
Christopher Samuel
2018-03-27 19:00:37 MDT
Comment from Dominik Bartkiewicz:

Hi,

Could you send me your current slurm.conf and your sinteractive script?

Dominik

Comment from Christopher Samuel (created attachment 6507 [details]: slurm.conf):

Hi Dominik,

On Friday, 30 March 2018 3:05:43 AM AEDT you wrote:
> Could you send me current slurm.conf and your sinteractive script.

sinteractive is just:

```
#!/bin/bash
exec srun $* --pty -u ${SHELL} -i -l
```

I've attached our slurm.conf.

All the best,
Chris

Comment from Dominik Bartkiewicz:

Hi,

Thanks, I can reproduce this now. This behaviour is an effect of the interaction of '--cores-per-socket' with MaxCPUsPerNode.

Dominik

Comment from Christopher Samuel:

Hi Dominik,

On Friday, 30 March 2018 9:38:09 PM AEDT bugs@schedmd.com wrote:
> Thanks, I can reproduce this now.
> This behavior is effect of interaction '--cores-per-socket' with
> MaxCPUsPerNode.

Ah yes, I hadn't thought to test it on the other partitions we have, and yes, that works. So definitely a bug then?

cheers!
Chris

Comment from Dominik Bartkiewicz:

Hi,

I think this is a bug. I don't know yet how to fix it with the current MaxCPUsPerNode implementation. From my observation this is not working on 16.05 either; are you observing the same thing?

Dominik

Comment from Christopher Samuel:

Hiya,

On Saturday, 31 March 2018 12:51:00 AM AEDT you wrote:
> I think this is a bug.

Yeah, looks like it to me too.

> I don't know yet how to fix this with current MaxCPUsPerNode implementation.
> From my observation this is not working on 16.05 too, are you observing the
> same thing?

This system was brought up with 17.11.0 on it; the other systems I mentioned are just ones I have access to as a user, and they don't have MaxCPUsPerNode set on any partitions.

All the best,
Chris

Comment from Dominik Bartkiewicz:

Hi,

This patch solved this issue: https://github.com/SchedMD/slurm/commit/6de8c831ae9c388. It will be in 17.11.6. Let me know if this works on your hardware exactly as you expected.

Dominik

Comment from Christopher Samuel:

On 10/05/18 00:51, bugs@schedmd.com wrote:
> This patch solved this issue
> https://github.com/SchedMD/slurm/commit/6de8c831ae9c388. It will be
> in 17.11.6 Let me know if this works on your hardware exactly as you
> expected.

Hi Dominik,

Thanks so much for this. This is my last day in the office before I head out on leave for 3 weeks, so I'll try to get it done today (I have some meetings too). If I can't, I'll let you know in June when I'm back in Australia.

All the best!
Chris

Comment from Christopher Samuel:

On 10/05/18 10:08, Christopher Samuel wrote:
> Thanks so much for this, this is my last day in the office before I
> head out on leave for 3 weeks so I'll try and get it done today (have
> some meetings too).
Tested and looking good, thank you!
```
[csamuel@farnarkle1 pt2pt]$ srun -c 32 --cores-per-socket=16 hostname
srun: job 126213 queued and waiting for resources
```
It no longer gives me an error, and instead blocks waiting for a node with that configuration to become available.
If I put that into our debug partition, it picks one of our KNL nodes, which is available (our default is the Skylake node partition).
```
[csamuel@farnarkle1 pt2pt]$ srun -p debug -c 32 --cores-per-socket=16 hostname
srun: job 126214 queued and waiting for resources
srun: job 126214 has been allocated resources
gina4
```
Very much appreciated.
All the best,
Chris
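The requests above ask for 32 CPUs (`-c 32`) on sockets with at least 16 cores each (`--cores-per-socket=16`). A minimal sketch of the qualifying arithmetic the scheduler performs for such a request is below; the topology values are hypothetical, since the site's actual node hardware is not stated in the report, and this models only the topology filter, not Slurm's full selection logic.

```shell
#!/bin/sh
# Sketch of the socket/core filter implied by
#   srun -c CPUS --cores-per-socket=MIN_CPS
# Topology values here are hypothetical examples.

# node_qualifies CORES_PER_SOCKET SOCKETS REQUESTED_CPUS MIN_CORES_PER_SOCKET
# Prints "qualifies" if a node with the given topology can satisfy the
# request, "rejected" otherwise.
node_qualifies() {
    cps=$1; socks=$2; cpus=$3; min_cps=$4
    # Each socket must have at least the requested core count, and the
    # node as a whole must cover the CPU request.
    if [ "$cps" -ge "$min_cps" ] && [ $((cps * socks)) -ge "$cpus" ]; then
        echo qualifies
    else
        echo rejected
    fi
}

# srun -c 32 --cores-per-socket=16 against an assumed 2-socket, 18-core node:
node_qualifies 18 2 32 16   # prints "qualifies"
```

The bug was that with MaxCPUsPerNode set on the partition, otherwise-qualifying nodes were rejected outright instead of the job queueing for them, as the fixed behaviour above shows.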
Comment from Dominik Bartkiewicz:

Hi,

That's great. We're going to go ahead and mark this as resolved/fixed. Enjoy your vacation.

Dominik
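For reference, a hypothetical slurm.conf fragment of the kind involved in this report: a node definition with explicit socket/core topology, and a partition that sets MaxCPUsPerNode. The node names, counts, and values here are illustrative only; the site's actual configuration is in the attached slurm.conf.

```
# Hypothetical fragment -- the real config is in attachment 6507.
NodeName=node[001-010] Sockets=2 CoresPerSocket=18 ThreadsPerCore=1
PartitionName=skylake Nodes=node[001-010] MaxCPUsPerNode=32 Default=YES State=UP
```

With this combination, prior to the fix in 17.11.6, an `srun --cores-per-socket` request could be rejected on partitions that set MaxCPUsPerNode even when suitable nodes existed.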