Ticket 14688 - slurmd: error: Ignoring gres.conf record, invalid name: shard
Summary: slurmd: error: Ignoring gres.conf record, invalid name: shard
Status: RESOLVED INVALID
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 22.05.2
Hardware: Linux Linux
: 6 - No support contract
Assignee: Jacob Jenson
Reported: 2022-08-05 01:17 MDT by 1ck_5bhkurvhpmdz
Modified: 2022-08-05 01:17 MDT

Site: -Other-
Description 1ck_5bhkurvhpmdz 2022-08-05 01:17:06 MDT
Slurm doesn't recognise the newly introduced GRES "shard" as a valid resource name.

It doesn't matter which of the three configuration options described at https://slurm.schedmd.com/gres.html#Sharding I use.

If shard is configured only in slurm.conf, it is silently ignored: no error is thrown, and shard does not show up as a resource in scontrol show node.
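For reference, a minimal slurm.conf sketch along the lines of the sharding section of the GRES documentation linked above, in which shard is declared in GresTypes next to gpu and counted in the node's Gres= field. The node name and the GPU and shard counts are hypothetical placeholders, not the values from this test node, and other node parameters (CPUs, RealMemory and so on) are omitted.

```
# slurm.conf sketch: node name and counts are hypothetical placeholders
# shard is declared as a GRES type alongside gpu
GresTypes=gpu,shard
# hypothetical node with 4 GPUs and 32 shards in total
# (other node parameters omitted)
NodeName=node01 Gres=gpu:4,shard:32
```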

If shard is also configured in gres.conf, slurmd logs the following error once for each shard line:
slurmd: error: Ignoring gres.conf record, invalid name: shard

It doesn't matter whether the record has the form "Name=shard Count=x" or "Name=shard Count=x File=y".
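A gres.conf sketch of the two record forms described above, with hypothetical counts and a hypothetical device path. In the documented setup these records are paired with a slurm.conf whose GresTypes line lists shard as well as gpu.

```
# gres.conf sketch: counts and device path are hypothetical
# GPUs themselves are detected via NVML, as on the test node described here
AutoDetect=nvml
# Form 1: shard count for the node, with no File= binding to a particular GPU
Name=shard Count=32
# Form 2 (alternative to form 1): shards tied to a single GPU device file
#Name=shard Count=8 File=/dev/nvidia0
```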

This is a test node running slurmctld and slurmd 22.05.2, with NVIDIA A-series GPUs, driver 515 and CUDA 11.7. NVML autodetection is working.