Ticket 14688

Summary: slurmd: error: Ignoring gres.conf record, invalid name: shard
Product: Slurm Reporter: 1ck_5bhkurvhpmdz
Component: Configuration Assignee: Jacob Jenson <jacob>
Status: RESOLVED INVALID QA Contact:
Severity: 6 - No support contract    
Priority: ---    
Version: 22.05.2   
Hardware: Linux   
OS: Linux   
Site: -Other- Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description 1ck_5bhkurvhpmdz 2022-08-05 01:17:06 MDT
Slurm doesn't recognise the newly introduced GRES "shard" as a valid resource name.

It doesn't matter which of the three configuration options given here I use: https://slurm.schedmd.com/gres.html#Sharding

If it is only configured in slurm.conf, it is quietly ignored: no error is thrown, and shard doesn't show up as a resource in "scontrol show node".

If it is also configured in gres.conf, slurmd throws errors (per line of shard configuration):
slurmd: error: Ignoring gres.conf record, invalid name: shard

It doesn't matter whether the scheme is "Name=shard Count=x" or "Name=shard Count=x File=y".
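
For reference, the schemes described above correspond roughly to the sketch below. The node name, GPU count, shard counts, and device paths are placeholders, not values taken from this ticket. The GresTypes line is included because the sharding example in the linked documentation lists shard there; the report does not state whether it was set on the test node.

```
# slurm.conf (placeholder node name and counts, not from this ticket)
GresTypes=gpu,shard
NodeName=testnode Gres=gpu:2,shard:64

# gres.conf, variant A: "Name=shard Count=x" with autodetected GPUs
AutoDetect=nvml
Name=shard Count=64

# gres.conf, variant B (alternative to A): "Name=shard Count=x File=y"
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
Name=shard Count=32 File=/dev/nvidia0
Name=shard Count=32 File=/dev/nvidia1
```

Variants A and B are alternatives for the same gres.conf; only one would be used at a time.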

This is a test node running both slurmctld and slurmd at version 22.05.2, with an NVIDIA A-series GPU, driver 515, and CUDA 11.7. NVML autodetection is working.