| Summary: | Specifying --cores-per-socket prevents using more cores than are on a single socket | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Christopher Samuel <chris> |
| Component: | User Commands | Assignee: | Dominik Bartkiewicz <bart> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | CC: | bart |
| Version: | 17.11.5 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Swinburne | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | 17.11.6 |
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | | |
| Attachments: | slurm.conf | | |
|
Description
Christopher Samuel
2018-03-27 19:00:37 MDT
Comment from Dominik Bartkiewicz:

Hi,

Could you send me your current slurm.conf and your sinteractive script?

Dominik

Comment from Christopher Samuel (created attachment 6507 [details]: slurm.conf):

Hi Dominik,

On Friday, 30 March 2018 3:05:43 AM AEDT you wrote:
> Could you send me current slurm.conf and your sinteractive script.

sinteractive is just:

```
#!/bin/bash
exec srun $* --pty -u ${SHELL} -i -l
```

I've attached our slurm.conf.

All the best,
Chris

Comment from Dominik Bartkiewicz:

Hi,

Thanks, I can reproduce this now. This behaviour is an effect of the interaction of '--cores-per-socket' with MaxCPUsPerNode.

Dominik

Comment from Christopher Samuel:

Hi Dominik,

On Friday, 30 March 2018 9:38:09 PM AEDT bugs@schedmd.com wrote:
> Thanks, I can reproduce this now.
> This behavior is effect of interaction '--cores-per-socket' with
> MaxCPUsPerNode.

Ah yes, I hadn't thought to test it on the other partitions we have, and yes, that works. So definitely a bug then?

cheers!
Chris

Comment from Dominik Bartkiewicz:

Hi,

I think this is a bug. I don't know yet how to fix it with the current MaxCPUsPerNode implementation. From my observation this is not working on 16.05 either; are you observing the same thing?

Dominik

Comment from Christopher Samuel:

Hiya,

On Saturday, 31 March 2018 12:51:00 AM AEDT you wrote:
> I think this is a bug.

Yeah, looks like it to me too.

> I don't know yet how to fix this with current MaxCPUsPerNode implementation.
> From my observation this is not working on 16.05 too, are you observing the
> same thing?

This system was brought up with 17.11.0 on it; the other systems I mentioned are just ones I have access to as a user, and they don't have MaxCPUsPerNode set on any partitions.

All the best,
Chris

Comment from Dominik Bartkiewicz:

Hi,

This patch solved this issue: https://github.com/SchedMD/slurm/commit/6de8c831ae9c388. It will be in 17.11.6. Let me know if this works on your hardware exactly as you expected.

Dominik

Comment from Christopher Samuel:

On 10/05/18 00:51, bugs@schedmd.com wrote:
> This patch solved this issue
> https://github.com/SchedMD/slurm/commit/6de8c831ae9c388. It will be
> in 17.11.6 Let me know if this works on your hardware exactly as you
> expected.

Hi Dominik,

Thanks so much for this. This is my last day in the office before I head out on leave for 3 weeks, so I'll try to get it done today (I have some meetings too). If I can't, I'll let you know in June when I'm back in Australia.

All the best!
Chris

Comment from Christopher Samuel:

On 10/05/18 10:08, Christopher Samuel wrote:
> Thanks so much for this, this is my last day in the office before I
> head out on leave for 3 weeks so I'll try and get it done today (have
> some meetings too).
Tested and looking good, thank you!
```
[csamuel@farnarkle1 pt2pt]$ srun -c 32 --cores-per-socket=16 hostname
srun: job 126213 queued and waiting for resources
```
It no longer gives me an error, and instead blocks waiting for a node with that configuration to become available.
If I put that into our debug partition, it picks one of our KNL nodes, which is available (our default is the Skylake node partition).
```
[csamuel@farnarkle1 pt2pt]$ srun -p debug -c 32 --cores-per-socket=16 hostname
srun: job 126214 queued and waiting for resources
srun: job 126214 has been allocated resources
gina4
```
Very much appreciated.
All the best,
Chris
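The requests above ask for 32 CPUs (`-c 32`) on sockets with at least 16 cores each (`--cores-per-socket=16`). A minimal sketch of the qualifying arithmetic the scheduler performs for such a request is below; the topology values are hypothetical, since the site's actual node hardware is not stated in the report, and this models only the topology filter, not Slurm's full selection logic.

```shell
#!/bin/sh
# Sketch of the socket/core filter implied by
#   srun -c CPUS --cores-per-socket=MIN_CPS
# Topology values here are hypothetical examples.

# node_qualifies CORES_PER_SOCKET SOCKETS REQUESTED_CPUS MIN_CORES_PER_SOCKET
# Prints "qualifies" if a node with the given topology can satisfy the
# request, "rejected" otherwise.
node_qualifies() {
    cps=$1; socks=$2; cpus=$3; min_cps=$4
    # Each socket must have at least the requested core count, and the
    # node as a whole must cover the CPU request.
    if [ "$cps" -ge "$min_cps" ] && [ $((cps * socks)) -ge "$cpus" ]; then
        echo qualifies
    else
        echo rejected
    fi
}

# srun -c 32 --cores-per-socket=16 against an assumed 2-socket, 18-core node:
node_qualifies 18 2 32 16   # prints "qualifies"
```

The bug was that with MaxCPUsPerNode set on the partition, otherwise-qualifying nodes were rejected outright instead of the job queueing for them, as the fixed behaviour above shows.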
Comment from Dominik Bartkiewicz:

Hi,

That's great. We're going to go ahead and mark this as resolved/fixed. Enjoy your vacation.

Dominik
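For reference, a hypothetical slurm.conf fragment of the kind involved in this report: a node definition with explicit socket/core topology, and a partition that sets MaxCPUsPerNode. The node names, counts, and values here are illustrative only; the site's actual configuration is in the attached slurm.conf.

```
# Hypothetical fragment -- the real config is in attachment 6507.
NodeName=node[001-010] Sockets=2 CoresPerSocket=18 ThreadsPerCore=1
PartitionName=skylake Nodes=node[001-010] MaxCPUsPerNode=32 Default=YES State=UP
```

With this combination, prior to the fix in 17.11.6, an `srun --cores-per-socket` request could be rejected on partitions that set MaxCPUsPerNode even when suitable nodes existed.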