Ticket 1523 - node core allocation hops between physical cpu cores
Summary: node core allocation hops between physical cpu cores
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 15.08.x
Hardware: Linux
: 6 - No support contract
Assignee: Brian Christiansen
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2015-03-11 11:45 MDT by David Kroher
Modified: 2015-03-18 10:47 MDT

See Also:
Site: SGI


Attachments
current slurm.conf (1.28 KB, text/plain)
2015-03-11 11:45 MDT, David Kroher
Details

Description David Kroher 2015-03-11 11:45:38 MDT
Created attachment 1709 [details]
current slurm.conf

Hello,

Using slurm-15.08.0-0pre2.el6.x86_64. Is there a way I can make the allocated cores more sequential? It seems that Slurm currently fills up a node by alternating between the physical CPUs, hopping back and forth between them until all of the node's cores are used.

For example, using srun:

[dmk@gambit collective]$ srun -n 64 osu_allgather H H
MPT DSM information
MPT MPI_DSM_DISTRIBUTE enabled
grank	lrank	pinning	 node name   	cpuid
    0	    0	no	 r1i0n0      	    0
    1	    1	no	 r1i0n0      	   16
    2	    2	no	 r1i0n0      	    1
    3	    3	no	 r1i0n0      	   17
    4	    4	no	 r1i0n0      	    2
    5	    5	no	 r1i0n0      	   18
    6	    6	no	 r1i0n0      	    3
    7	    7	no	 r1i0n0      	   19
    8	    8	no	 r1i0n0      	    4
    9	    9	no	 r1i0n0      	   20
   10	   10	no	 r1i0n0      	    5

I would like to be able to run a job such as this, so that cores are selected serially.

[dmk@gambit collective]$ mpirun r1i0n0,r1i0n1 -np 32 ./osu_allgather H H
MPT DSM information
MPT MPI_DSM_DISTRIBUTE enabled
grank	lrank	pinning	 node name   	cpuid
    0	    0	yes	 r1i0n0      	    0
    1	    1	yes	 r1i0n0      	    1
    2	    2	yes	 r1i0n0      	    2
    3	    3	yes	 r1i0n0      	    3
    4	    4	yes	 r1i0n0      	    4
    5	    5	yes	 r1i0n0      	    5
    6	    6	yes	 r1i0n0      	    6
    7	    7	yes	 r1i0n0      	    7
    8	    8	yes	 r1i0n0      	    8
    9	    9	yes	 r1i0n0      	    9
   10	   10	yes	 r1i0n0      	   10


Thanks!
dmk
Comment 1 Brian Christiansen 2015-03-12 03:06:58 MDT
Will you try srun -m block:block? You may also try adding CR_Pack_Nodes to your SelectTypeParameters. Let me know how these work for you.

Thanks,
Brian
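The two suggestions above would look something like the following (a sketch only; the SelectType and the other CR_* options shown are assumptions — keep whatever your slurm.conf already uses and just append CR_Pack_Nodes):

```shell
# Request "block" distribution at both the node level and the
# socket/core level, so tasks fill one socket before moving on:
srun -m block:block -n 64 osu_allgather H H
```

```shell
# In slurm.conf (assumed existing cons_res setup), pack tasks onto
# as few nodes as possible by appending CR_Pack_Nodes:
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory,CR_Pack_Nodes
```

The -m/--distribution flag takes effect per job step, while the slurm.conf change requires a slurmctld restart or reconfigure.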
Comment 2 David Kroher 2015-03-12 04:37:00 MDT
Ah, perfect. srun -m block:block lined up the core allocation properly. I wasn't able to see any difference with CR_Pack_Nodes.

Is there an environment variable I can set to make -m block:block the default?

Thank you!

dmk
Comment 3 Brian Christiansen 2015-03-12 05:22:58 MDT
Check out the SLURM_DISTRIBUTION environment variable. You can see the available environment variables on the srun man page.
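For example (a sketch; the shell profile location is an assumption — any file sourced before job submission works):

```shell
# Make block:block the default distribution for srun, equivalent to
# passing -m block:block on every invocation. Add to ~/.bashrc or a
# site-wide profile so it applies to all of a user's jobs:
export SLURM_DISTRIBUTION=block:block

# Subsequent srun calls now use block:block without the flag:
srun -n 64 osu_allgather H H
```

An explicit -m on the command line still overrides the environment variable.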
Comment 4 David Kroher 2015-03-12 05:44:28 MDT
Excellent, Thank you!
Comment 5 Brian Christiansen 2015-03-12 06:06:18 MDT
Glad to help.
Comment 6 David Kroher 2015-03-12 06:42:15 MDT
Hi Brian, for consistency's sake, is there a way to get the core allocation to start from 0? It seems Slurm now allocates cores on a node from the highest core ID down to the lowest.

[dmk@gambit collective]$ srun -n 10 osu_allgather H H
MPT DSM information
MPT MPI_DSM_DISTRIBUTE enabled
grank	lrank	pinning	 node name   	cpuid
    0	    0	no	 r1i0n0      	   22
    1	    1	no	 r1i0n0      	   23
    2	    2	no	 r1i0n0      	   24
    3	    3	no	 r1i0n0      	   25
    4	    4	no	 r1i0n0      	   26
    5	    5	no	 r1i0n0      	   27
    6	    6	no	 r1i0n0      	   28
    7	    7	no	 r1i0n0      	   29
    8	    8	no	 r1i0n0      	   30
    9	    9	no	 r1i0n0      	   31
Comment 7 Brian Christiansen 2015-03-17 08:50:59 MDT
I've reopened the ticket while I research why block:block allocates backwards. I'll let you know what I find.
Comment 8 Martin Perry 2015-03-18 10:44:59 MDT
Hi guys,

I wondered about this myself when I was working on the code in select/cons_res. It seems to be a decision by the original cons_res developers (HP). There is a cryptic comment in _block_sync_core_bitmap that says "search for the best socket starting from the last one to let more room in the first one for system usage." Maybe there is some performance advantage in leaving the lower-numbered sockets/cores/CPUs free for system processes?

To answer David's question, I don't think there is a way to request a lowest-to-highest ordering for block allocation.

Martin
Comment 9 Brian Christiansen 2015-03-18 10:47:39 MDT
This was just fixed in:

https://github.com/SchedMD/slurm/commit/f43992a87978be9646536718b767e7c70596cc0d