Ticket 16264

Summary: Extend NodeAddr to support ranges
Product: Slurm
Reporter: Chris Samuel (NERSC) <csamuel>
Component: Configuration
Assignee: Dominik Bartkiewicz <bart>
Status: RESOLVED FIXED
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
CC: dmjacobsen, durairaa, heasterday, kauffman, kevin.buckley, kilian, marshall, mcoyne, rcwhite, sts, tim
Version: 22.05.8   
Hardware: Linux   
OS: Linux   
Site: NERSC
Version Fixed: 23.11.0rc1

Description Chris Samuel (NERSC) 2023-03-13 23:04:40 MDT
Hi there,

Currently NodeAddr has to be specified one host at a time, which for us results in very long lines that look like this (taken from a test system):

> NodeName=nid[001000-001001,001004-001005,001008-001009,001012-001013,001016-001017,001020-001021,001024-001025,001028-001029,001032-001033,001036-001037,001040-001041,001044-001045,001048-001049,001052-001053,001056-001057,001060-001061] CPUs=128 Boards=1 SocketsPerBoard=4 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=257290 MemSpecLimit=27298 Feature=gpu,ss11,a100,hbm40g Gres=gpu:a100:4 NodeAddr=nid001000-hsn0,nid001001-hsn0,nid001004-hsn0,nid001005-hsn0,nid001008-hsn0,nid001009-hsn0,nid001012-hsn0,nid001013-hsn0,nid001016-hsn0,nid001017-hsn0,nid001020-hsn0,nid001021-hsn0,nid001024-hsn0,nid001025-hsn0,nid001028-hsn0,nid001029-hsn0,nid001032-hsn0,nid001033-hsn0,nid001036-hsn0,nid001037-hsn0,nid001040-hsn0,nid001041-hsn0,nid001044-hsn0,nid001045-hsn0,nid001048-hsn0,nid001049-hsn0,nid001052-hsn0,nid001053-hsn0,nid001056-hsn0,nid001057-hsn0,nid001060-hsn0,nid001061-hsn0 Weight=1000

This came up as a digression in bug#15322, where Tim M channelled Tim W to suggest that, instead of a prefix, extending NodeAddr to support a range would result in something nicer.

So for us that would turn the above into:

NodeName=nid[001000-001001,001004-001005,001008-001009,001012-001013,001016-001017,001020-001021,001024-001025,001028-001029,001032-001033,001036-001037,001040-001041,001044-001045,001048-001049,001052-001053,001056-001057,001060-001061] CPUs=128 Boards=1 SocketsPerBoard=4 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=257290 MemSpecLimit=27298 Feature=gpu,ss11,a100,hbm40g Gres=gpu:a100:4 NodeAddr=nid[001000-001001,001004-001005,001008-001009,001012-001013,001016-001017,001020-001021,001024-001025,001028-001029,001032-001033,001036-001037,001040-001041,001044-001045,001048-001049,001052-001053,001056-001057,001060-001061]-hsn0 Weight=1000

This is more manageable, especially for Perlmutter, where our CPU nodes are defined as nid[004174-007143], which at present results in 2970 separate entries on one line for NodeAddr.

All the best,
Chris
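
To illustrate the size of the problem described above, here is a minimal Python sketch that builds the one-host-at-a-time NodeAddr value for a node range. The helper name `legacy_node_addr` is hypothetical, and the `-hsn0` suffix for the Perlmutter CPU nodes is an assumption carried over from the test-system example; only the `nid[004174-007143]` range and the 2970 count come from the ticket.

```python
def legacy_node_addr(prefix, ranges, suffix, width=6):
    """Build a comma-separated NodeAddr value, one host per entry.

    Hypothetical helper for illustration only; not part of Slurm.
    `ranges` is a list of inclusive (low, high) integer pairs.
    """
    return ",".join(
        f"{prefix}{n:0{width}d}{suffix}"
        for lo, hi in ranges
        for n in range(lo, hi + 1)  # +1 because the ranges are inclusive
    )


# Assumption: the CPU nodes use the same "-hsn0" suffix as the test system.
value = legacy_node_addr("nid", [(4174, 7143)], "-hsn0")
entries = value.split(",")
print(len(entries), entries[0], entries[-1])
# prints: 2970 nid004174-hsn0 nid007143-hsn0
```

With range support, the same value would instead be written as the single expression nid[004174-007143]-hsn0.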
Comment 3 Dominik Bartkiewicz 2023-04-19 08:16:55 MDT
Hi

Sorry for the late response. We have a patch that implements this feature, but we still need to internally discuss some details. I will let you know when we push this to the repo.

Dominik
Comment 5 Jason Booth 2023-06-08 12:48:15 MDT
*** Ticket 16927 has been marked as a duplicate of this ticket. ***
Comment 8 Marshall Garey 2023-06-15 15:06:52 MDT
*** Ticket 16927 has been marked as a duplicate of this ticket. ***
Comment 12 Marshall Garey 2023-07-05 09:25:06 MDT
*** Ticket 17109 has been marked as a duplicate of this ticket. ***
Comment 17 Dominik Bartkiewicz 2023-07-20 03:20:34 MDT
Hi

In 23.11 we added support for arbitrary suffixes in hostlists (e.g. aaa[1-100]-bbb, aaa[1-100]-b0).

Internally, this behaves like a comma-separated list of such hosts, or like a two-range hostlist with a single entry in the last range (aaa[1-100]-b[0]).
This means such a hostlist will not be printed in reduced format after parsing.

Dominik
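
The equivalence described in this comment can be sketched in Python. This is an illustration only and not Slurm's hostlist implementation: the `expand` function and its regular expression are hypothetical, and the sketch handles a single expression without top-level commas. It shows that a range followed by a plain suffix expands to the same hosts as the two-range form with one entry in the last range.

```python
import re

# Hypothetical pattern: text before the first "[", the range list, the rest.
_BRACKET = re.compile(r"^(?P<prefix>[^\[\]]*)\[(?P<ranges>[^\[\]]+)\](?P<rest>.*)$")


def expand(expr):
    """Expand one hostlist-style expression into a list of host names.

    Illustration only; not Slurm's parser. Zero padding is taken from the
    width of the lower bound of each range.
    """
    m = _BRACKET.match(expr)
    if m is None:
        return [expr]  # no bracketed range left, so this is a literal
    tails = expand(m.group("rest"))  # the suffix may contain further ranges
    hosts = []
    for part in m.group("ranges").split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo  # a single number such as "[0]" is a one-entry range
        width = len(lo)
        for n in range(int(lo), int(hi) + 1):
            for tail in tails:
                hosts.append(f"{m.group('prefix')}{n:0{width}d}{tail}")
    return hosts


print(expand("aaa[1-3]-b0"))
# prints: ['aaa1-b0', 'aaa2-b0', 'aaa3-b0']
print(expand("aaa[1-3]-b0") == expand("aaa[1-3]-b[0]"))
# prints: True
```

The sketch only expands. It makes no attempt to compress a list of hosts back into a ranged expression.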
Comment 18 Chris Samuel (NERSC) 2023-08-01 23:45:51 MDT
Hi Dominik,

Thanks so much for that, sounds great!

All the best,
Chris
Comment 19 Dominik Bartkiewicz 2023-08-04 04:49:27 MDT
Hi

I'll go ahead and close this ticket.
If anything else comes up feel free to reopen the ticket.

Dominik
Comment 20 Jason Booth 2023-09-04 09:09:01 MDT
*** Ticket 17590 has been marked as a duplicate of this ticket. ***