9687 – Considering NUMA Node Distance in Scheduling

Status:	OPEN

Alias:	None

Product:	Slurm
Classification:	Unclassified
Component:	Scheduling
Version:	19.05.7
Hardware:	Linux Linux

Severity:	5 - Enhancement
Assignee:	Unassigned Developer

Reported:	2020-08-27 14:36 MDT by Steve Ford
Modified:	2020-08-28 17:01 MDT
CC List:	1 user

Site:	MSU


Description Steve Ford 2020-08-27 14:36:47 MDT

Hello SchedMD,

We are deploying new nodes, each with two AMD EPYC 7H12 processors. Each node has a total of eight NUMA nodes, four on each processor, which SLURM sees as 8 sockets:

# slurmd -C
CPUs=128 Boards=1 SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=515461

# numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 63930 MB
node 0 free: 3576 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 64508 MB
node 1 free: 4531 MB
node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 2 size: 64508 MB
node 2 free: 4548 MB
node 3 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 3 size: 64496 MB
node 3 free: 4662 MB
node 4 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
node 4 size: 64508 MB
node 4 free: 4746 MB
node 5 cpus: 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
node 5 size: 64508 MB
node 5 free: 4766 MB
node 6 cpus: 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
node 6 size: 64492 MB
node 6 free: 4115 MB
node 7 cpus: 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
node 7 size: 64508 MB
node 7 free: 4400 MB
node distances:
node   0   1   2   3   4   5   6   7 
  0:  10  12  12  12  32  32  32  32 
  1:  12  10  12  12  32  32  32  32 
  2:  12  12  10  12  32  32  32  32 
  3:  12  12  12  10  32  32  32  32 
  4:  32  32  32  32  10  12  12  12 
  5:  32  32  32  32  12  10  12  12 
  6:  32  32  32  32  12  12  10  12 
  7:  32  32  32  32  12  12  12  10

Can SLURM consider variations in distance between NUMA nodes when scheduling jobs that span multiple NUMA nodes?

Thanks,
Steve
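
To make the request above concrete, here is a minimal Python sketch of what "considering NUMA distance" could mean when a job needs several NUMA nodes on one host. It is not Slurm code and does not reflect how Slurm allocates sockets; the function names (`parse_distances`, `total_distance`, `closest_group`) and the brute-force selection are assumptions made purely for illustration. The distance table is copied from the `numactl --hardware` output in the description.

```python
from itertools import combinations

# Distance table copied from the `numactl --hardware` output in this ticket.
NUMACTL_DISTANCES = """\
node   0   1   2   3   4   5   6   7
  0:  10  12  12  12  32  32  32  32
  1:  12  10  12  12  32  32  32  32
  2:  12  12  10  12  32  32  32  32
  3:  12  12  12  10  32  32  32  32
  4:  32  32  32  32  10  12  12  12
  5:  32  32  32  32  12  10  12  12
  6:  32  32  32  32  12  12  10  12
  7:  32  32  32  32  12  12  12  10
"""


def parse_distances(text):
    """Turn the 'node distances' table into a list-of-lists matrix."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        _, values = line.split(":")
        rows.append([int(v) for v in values.split()])
    return rows


def total_distance(dist, nodes):
    """Sum of pairwise distances between the given NUMA nodes."""
    return sum(dist[a][b] for a, b in combinations(nodes, 2))


def closest_group(dist, free_nodes, count):
    """Hypothetical policy: among the free NUMA nodes, pick `count` of them
    with the smallest total pairwise distance (brute force, fine for 8 nodes)."""
    candidates = combinations(sorted(free_nodes), count)
    return min(candidates, key=lambda group: total_distance(dist, group))


dist = parse_distances(NUMACTL_DISTANCES)
# With NUMA nodes 0 and 1 busy, a 3-node job fits entirely on the second
# processor instead of straddling both.
print(closest_group(dist, {2, 3, 4, 5, 6}, 3))  # (4, 5, 6)
```

On this hardware the table has only two distinct off-diagonal distances (12 within a processor, 32 across processors), so the policy amounts to keeping a job's NUMA nodes on one physical package whenever enough of them are free there.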

Comment 2 Jason Booth 2020-08-28 17:01:15 MDT

Hi Steve - I had a few engineers on my side look into this ticket, which has turned out to be a feature request requiring a large development effort. I have moved it to severity 5 for future consideration.