Ticket 10569

Summary: TresPerNode limited to one per GPU with cons_tres plugin
Product: Slurm
Reporter: Spencer Bliven <spencer.bliven>
Component: GPU
Assignee: Director of Support <support>
Status: RESOLVED FIXED
Severity: 4 - Minor Issue
Priority: ---
CC: felip.moll, spencer.bliven
Version: 20.02.6
Hardware: Linux
OS: Linux
See Also: https://bugs.schedmd.com/show_bug.cgi?id=9716
          https://bugs.schedmd.com/show_bug.cgi?id=9636
          https://bugs.schedmd.com/show_bug.cgi?id=9346
Site: Paul Scherrer
Linux Distro: RHEL
Version Fixed: 20.11.3
Attachments: slurm.conf, gres.conf

Description Spencer Bliven 2021-01-06 03:08:20 MST
We are using cons_tres to manage GPU allocation. We also would like to use gres to specify per-node features (specifically available scratch and video memory). These are defined in gres.conf:

  NodeName=merlin-g-001 Name=gpu Type=GTX1080 File=/dev/nvidia[0,1] Cores=0-15
  NodeName=merlin-g-001 Name=scratch Count=512

And slurm.conf:

  GresTypes=gpu,scratch
  AccountingStorageTRES=gres/scratch,gres/gpu,gres/gpu:GTX1080,gres/gpu:GTX1080Ti,ic/ofed
  NodeName=merlin-g-001 Weight=1 CPUs=16 RealMemory=128000 MemSpecLimit=25600 Sockets=1 CoresPerSocket=8 ThreadsPerCore=2 Gres=gpu:GTX1080:2,scratch:512 State=UNKNOWN


However, Slurm is unable to allocate resources when both --gpus and --gres are specified. Example:

  $ sbatch --partition=gpu --gres=scratch:500 --gpus=1 --wrap 'sleep 10'
  sbatch: error: Batch job submission failed: Requested node configuration is not available

Oddly, it does find a configuration if the gres count is less than or equal to the number of GPUs:

  $ sbatch --partition=gpu --gres=scratch:2,gpu:2 --gpus=2 --wrap 'sleep 10'
  Submitted batch job 9551

It also works if we specify the number of GPUs using --gres instead of --gpus (resulting in a per-node constraint rather than a per-job constraint):

  $ sbatch --partition=gpu --gres=scratch:500,gpu:1 --wrap 'sleep 10'
  Submitted batch job 9552

Is this a known issue with how --gpus interacts with --gres, or is there an error in our configuration?

Part of the motivation for this is related to implementing #9346 using gres instead of features. Since features are binary, specifying the video memory would require creating a feature for each class of graphics card (8G_GPU, 11G_GPU, etc.). It would be much cleaner to specify this as an integer (--gres=gpumem:11G).
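
For illustration, a rough sketch of what that could look like as a countable GRES. The name "gpumem" and the counts below are assumptions for discussion, not a tested configuration:

  # gres.conf (hypothetical gpumem GRES for a node with 8G cards)
  NodeName=merlin-g-001 Name=gpumem Count=8G

  # slurm.conf (hypothetical; other node parameters unchanged)
  GresTypes=gpu,scratch,gpumem
  NodeName=merlin-g-001 Gres=gpu:GTX1080:2,scratch:512,gpumem:8G

A job could then request a minimum amount of video memory with, e.g., --gres=gpumem:8G, instead of matching against a binary feature per card class.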
Comment 1 Spencer Bliven 2021-01-06 03:10:47 MST
Created attachment 17358 [details]
slurm.conf
Comment 2 Spencer Bliven 2021-01-06 03:11:04 MST
Created attachment 17359 [details]
gres.conf
Comment 4 Michael Hinton 2021-01-07 10:12:03 MST
Hi Spencer,

I am able to reproduce this and will see what I find.

Thanks,
-Michael
Comment 8 Michael Hinton 2021-01-08 18:03:21 MST
Hi Spencer,

We found the issue and are currently reviewing a patch that will fix it. Thanks for the report!

Unfortunately, the fix will only land in 20.11, since that is the latest release and this isn't a security bug. Once it does land, though, I can attach a version of the patch for 20.02 in Bugzilla that you can apply yourself. Or, you can simply cherry-pick the 20.11 commits directly from GitHub, as I believe they should also apply cleanly on top of 20.02.

Thanks,
-Michael
Comment 9 Spencer Bliven 2021-01-11 03:33:38 MST
Great, thanks for the fix!
Comment 17 Michael Hinton 2021-01-14 17:01:10 MST
(In reply to Spencer Bliven from comment #9)
> Great, thanks for the fix!
This has been fixed with commit 383808e7 and will be included in 20.11.3. See https://github.com/SchedMD/slurm/commit/383808e724baa71be58da54c2ab299b8a54d2bd4.

Thanks!
-Michael
Comment 20 Michael Hinton 2021-01-15 12:16:38 MST
Marking this as fixed in 20.11.3 and closing out.