| Summary: | srun for gres=gpu reports "Requested node configuration is not available" | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | UAB Research Computing <RC_LICENSES> |
| Component: | Configuration | Assignee: | Alejandro Sanchez <alex> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 3 - Medium Impact | | |
| Priority: | --- | CC: | avalonjo, da |
| Version: | 17.02.7 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | UAB | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | 17.02.8 17.11.0pre3 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | slurm.conf, gres.conf | | |
Description
UAB Research Computing 2017-10-10 12:22:08 MDT

Created attachment 5356 [details]: slurm.conf
Created attachment 5357 [details]: gres.conf
Interestingly, I ran a similar config on a 16.05.9 test box and it works just fine. I enabled GresTypes=gpu and set one of the nodes to have a Gres=gpu:tty:4 resource in slurm.conf. On the target node, my gres.conf contains:

```
Name=gpu Type=tty File=/dev/tty[0-3]
```

After putting the config in place, I can srun and request the resources as expected:

```
jpr@oakmnt:~/projects/slurm$ sinfo -o "%N %G"
NODELIST GRES
oakcompute[0-5] (null)
oakcompute6 gpu:tty:4
jpr@oakmnt:~/projects/slurm$ srun --gres=gpu ./test-gpu.sh
0
jpr@oakmnt:~/projects/slurm$ srun --gres=gpu:tty ./test-gpu.sh
0
jpr@oakmnt:~/projects/slurm$ srun --gres=gpu:tty:2 ./test-gpu.sh
0,1
jpr@oakmnt:~/projects/slurm$ srun --gres=gpu:tty:4 ./test-gpu.sh
0,1,2,3
```

My test-gpu.sh just echoes CUDA_VISIBLE_DEVICES.

We are currently looking into this and will keep you updated.

--Isaac

It appears on the initial cluster test that CPU affinities were causing the problem. Removing the CPUs=0 and CPUs=1 entries from the gres.conf lines allowed the GPU resource allocation to succeed. The second test cluster, which works with and without the CPUs lines in the test gres.conf file, does have a slightly more elaborate slurm.conf, with ProctrackType set to linuxproc, select/cons_res active, a Slurm accounting database, and jobacct_gather/linux. None of these settings are suggested as required by the https://slurm.schedmd.com/gres.html docs page.

One further point: it seems that the basic slurm.conf I attached also doesn't support using the Type parameter. The only gres.conf that works for me right now is:

```
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
Name=gpu File=/dev/nvidia2
Name=gpu File=/dev/nvidia3
```

I'm not sure what aspect of the slurm.conf enables the Type parameter and the CPUs affinities. Both work with the more advanced slurm.conf described above.
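The test-gpu.sh script referenced above is not attached to the ticket; a minimal sketch consistent with its description (it "just echoes CUDA_VISIBLE_DEVICES", which Slurm sets for the job step) might look like:

```shell
#!/bin/bash
# Print the GPU indices that Slurm exposed to this job step via
# CUDA_VISIBLE_DEVICES (empty if no GPUs were allocated).
echo "$CUDA_VISIBLE_DEVICES"
```

This is a reconstruction for illustration only, not the reporter's actual script.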
So I individually added each feature missing from my simpler slurm.conf, where the Type parameter for GPUs was failing (slurmdbd accounting, jobacct_gather/linux, then select/cons_res). It was adding cons_res that finally allowed the Type parameter to function:

```
SelectType=select/cons_res
SelectTypeParameters=CR_Core
```

Now the sruns work as expected:

```
[jpr@login002 slurm-config]$ srun --gres=gpu ./test-gpu.sh
0
[jpr@login002 slurm-config]$ srun --gres=gpu:p100 ./test-gpu.sh
0
[jpr@login002 slurm-config]$ srun --gres=gpu:p100:2 ./test-gpu.sh
0,1
[jpr@login002 slurm-config]$ srun --gres=gpu:p100:4 ./test-gpu.sh
0,1,2,3
```

I don't believe this requirement is documented.

Hi. This has been fixed in the following commit: https://github.com/SchedMD/slurm/commit/6ceaa49efa5d6, which will be available in the next Slurm 17.02.8 tag. Three of us have tested the patch, and now select/linear can also make use of gres.conf-defined lines including the Type and/or CPUs options. Previously only select/cons_res would accept requests with such a configuration. I'm closing the bug as fixed. Please reopen if you encounter further issues. Thanks for reporting.

Hi,
Is this problem resolved? We are running 17.11.3-2, and a gres.conf file in the form:

```
NodeName=hpc-test06 Name=gpu Type=k20 File=/dev/nvidia0 Cores=0-15
NodeName=hpc-test06 Name=gpu Type=k20 File=/dev/nvidia1 Cores=0-15
```

works, but the desired configuration:

```
NodeName=hpc-test06 Name=gpu Type=k20 File=/dev/nvidia0 Cores=0-7
NodeName=hpc-test06 Name=gpu Type=k20 File=/dev/nvidia1 Cores=8-15
```

does not.
The request for a GPU:

```
srun --gres=gpu:k20:1 --ntasks=2 --cpus-per-task=8 /bin/bash -c 'echo $CUDA_VISIBLE_DEVICES'
```

results in the error message:

```
srun: error: Unable to allocate resources: Requested node configuration is not available
```
This appears to be related to this test:

```c
if (gres_cpus != NO_VAL) {
	gres_cpus *= cpus_per_core;
	if ((gres_cpus < cpu_cnt) ||
	    (gres_cpus < job_ptr->details->ntasks_per_node) ||
	    ((job_ptr->details->cpus_per_task > 1) &&
	     (gres_cpus < job_ptr->details->cpus_per_task))) {
		bit_clear(jobmap, i);
		continue;
	}
}
```

in the function `_job_count_bitmap()` in the file `src/plugins/select/linear/select_linear.c`, specifically the comparison `gres_cpus < cpu_cnt`, where `gres_cpus` is set by:

```c
gres_cores = gres_plugin_job_test(job_ptr->gres_list,
				  gres_list, use_total_gres,
				  NULL, core_start_bit,
				  core_end_bit, job_ptr->job_id,
				  node_ptr->name);
gres_cpus = gres_cores;
```
For our configs, gres_cpus is set to 16 only when the gres.conf line has Cores=0-15.
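As a hedged illustration (not Slurm code), the failing comparison can be recomputed with the numbers from this configuration; `check_node` is a made-up helper name, and cpus_per_core is assumed to be 1 here:

```shell
#!/bin/bash
# Illustrative re-computation of the select/linear feasibility test.
#   gres_cpus - CPUs usable with the allocated GRES (from gres.conf Cores=)
#   cpu_cnt   - CPUs the job needs on the node (ntasks * cpus-per-task)
check_node() {
  local gres_cpus=$1 cpu_cnt=$2 ntasks_per_node=$3 cpus_per_task=$4
  if [ "$gres_cpus" -lt "$cpu_cnt" ] ||
     [ "$gres_cpus" -lt "$ntasks_per_node" ] ||
     { [ "$cpus_per_task" -gt 1 ] && [ "$gres_cpus" -lt "$cpus_per_task" ]; }
  then
    echo rejected
  else
    echo usable
  fi
}

# Cores=0-15 on the requested GPU: gres_cpus = 16 >= 16 CPUs needed
check_node 16 16 2 8   # prints "usable"
# Cores=0-7 on the requested GPU: gres_cpus = 8 < 16 CPUs needed
check_node 8 16 2 8    # prints "rejected"
```

With the split Cores= lines, each GPU only covers half the node's CPUs, so the 16-CPU request fails the first comparison and the node is cleared from the job's bitmap.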
Is there some other configuration parameter I should have set that would have changed this behavior?
Thanks
Also, I neglected to mention that we are using:

```
SelectType=select/linear
SelectTypeParameters=CR_ONE_TASK_PER_CORE,CR_Memory
```

AJ, would you mind please submitting a new ticket? I don't want to hijack someone else's ticket. Thanks.

I have submitted a new bug; it is Bug 4827.

Avalon