Ticket 3884

Summary: Erroneous "Requested node configuration is not available"
Product: Slurm Reporter: Nicholas McCollum <nmccollum>
Component: Scheduling Assignee: Dominik Bartkiewicz <bart>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 3 - Medium Impact    
Priority: --- CC: bart
Version: 15.08.10   
Hardware: Linux   
OS: Linux   
Site: ASC Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---
Attachments: Our slurm.conf
Our job_submit.lua

Description Nicholas McCollum 2017-06-09 10:33:02 MDT
Created attachment 4737 [details]
Our slurm.conf

I'm getting an erroneous "Requested node configuration is not available" when submitting a job requesting resources I know are available.

$ sbatch -vvvvvvvv --qos=class_gpu -t 01:00:00 --gres=gpu:2 -N 1-1 -n 10 --mem-per-cpu=1000mb ls_test
sbatch: defined options for program `sbatch'
sbatch: ----------------- ---------------------
sbatch: user              : `asnnam'
sbatch: uid               : 2573
sbatch: gid               : 10000
sbatch: cwd               : /mnt/homeapps/home/asnnam
sbatch: ntasks            : 10 (set)
sbatch: nodes             : 1-1
sbatch: jobid             : 4294967294 (default)
sbatch: partition         : default
sbatch: profile           : `NotSet'
sbatch: job name          : `ls_test'
sbatch: reservation       : `(null)'
sbatch: wckey             : `(null)'
sbatch: distribution      : unknown
sbatch: verbose           : 8
sbatch: immediate         : false
sbatch: overcommit        : false
sbatch: time_limit        : 60
sbatch: nice              : -2
sbatch: account           : (null)
sbatch: comment           : (null)
sbatch: dependency        : (null)
sbatch: gres              : gpu:2
sbatch: qos               : class_gpu
sbatch: constraints       : mem-per-cpu=1000M 
sbatch: geometry          : (null)
sbatch: reboot            : yes
sbatch: rotate            : no
sbatch: network           : (null)
sbatch: array             : N/A
sbatch: cpu_freq_min      : 4294967294
sbatch: cpu_freq_max      : 4294967294
sbatch: cpu_freq_gov      : 4294967294
sbatch: mail_type         : NONE
sbatch: mail_user         : (null)
sbatch: sockets-per-node  : -2
sbatch: cores-per-socket  : -2
sbatch: threads-per-core  : -2
sbatch: ntasks-per-node   : 0
sbatch: ntasks-per-socket : -2
sbatch: ntasks-per-core   : -2
sbatch: mem_bind          : default
sbatch: plane_size        : 4294967294
sbatch: propagate         : NONE
sbatch: switches          : -1
sbatch: wait-for-switches : -1
sbatch: core-spec         : NA
sbatch: burst_buffer      : `(null)'
sbatch: remote command    : `/mnt/homeapps/home/asnnam/ls_test'
sbatch: power             : 
sbatch: sicp              : 0
sbatch: debug:  propagating SLURM_PRIO_PROCESS=0
sbatch: debug:  propagating UMASK=0022
sbatch: debug3: Trying to load plugin /usr/lib64/slurm/auth_munge.so
sbatch: debug:  auth plugin for Munge (http://code.google.com/p/munge/) loaded
sbatch: debug3: Success.
sbatch: debug3: Trying to load plugin /usr/lib64/slurm/select_alps.so
sbatch: debug3: Success.
sbatch: debug3: Trying to load plugin /usr/lib64/slurm/select_bluegene.so
sbatch: debug3: Success.
sbatch: debug3: Trying to load plugin /usr/lib64/slurm/select_cons_res.so
sbatch: Consumable Resources (CR) Node Selection plugin loaded with argument 17
sbatch: debug3: Success.
sbatch: debug3: Trying to load plugin /usr/lib64/slurm/select_cray.so
sbatch: Cray node selection plugin loaded
sbatch: debug3: Success.
sbatch: debug3: Trying to load plugin /usr/lib64/slurm/select_linear.so
sbatch: Linear node selection plugin loaded with argument 17
sbatch: debug3: Success.
sbatch: debug3: Trying to load plugin /usr/lib64/slurm/select_serial.so
sbatch: Serial Job Resource Selection plugin loaded with argument 17
sbatch: debug3: Success.
sbatch: error: Batch job submission failed: Requested node configuration is not available

The class_gpu QOS contains:
$ sacctmgr list qos class_gpu format=name,MaxTRES%20,MaxTRESPerNode,Flags%45,MaxWall
      Name              MaxTRES MaxTRESPerNode                                         Flags     MaxWall 
---------- -------------------- -------------- --------------------------------------------- ----------- 
 class_gpu     cpu=10,mem=30720                   DenyOnLimit,OverPartQOS,PartitionTimeLimit    02:00:00

The reservation is:
$ scontrol show res class_gpu
ReservationName=class_gpu StartTime=2016-10-10T10:15:30 EndTime=2017-10-10T10:15:30 Duration=365-00:00:00
   Nodes=dmc[1,4] NodeCnt=2 CoreCnt=40 Features=(null) PartitionName=(null) Flags=IGNORE_JOBS,SPEC_NODES
   TRES=cpu=40
   Users=(null) Accounts=class Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a

There is logic in our job_submit.lua that sets some variables for users, such as forcing Account=class for Class* QOS jobs and setting the partition to gpu_kepler.
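A minimal sketch of what such job_submit.lua logic typically looks like, assuming the standard Lua job-submit plugin API (slurm_job_submit and slurm.SUCCESS are the standard entry point and return code); the QOS prefix match and field assignments here are illustrative, not the attached script verbatim:

```lua
-- Illustrative sketch, not the attached job_submit.lua verbatim.
function slurm_job_submit(job_desc, part_list, submit_uid)
    -- For any class_* QOS, force the account; route GPU classes
    -- to the gpu_kepler partition.
    if job_desc.qos ~= nil and job_desc.qos:match("^class_") then
        job_desc.account = "class"
        if job_desc.qos == "class_gpu" then
            job_desc.partition = "gpu_kepler"
        end
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```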

$ scontrol show part gpu_kepler
PartitionName=gpu_kepler
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=15-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=dmc[1-4]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
   State=UP TotalCPUs=80 TotalNodes=4 SelectTypeParameters=N/A
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

As a note, I am aware of the CVE issue (we are not using prolog), and we are planning on upgrading to the latest available version of SLURM in August.
Comment 1 Nicholas McCollum 2017-06-09 10:33:53 MDT
Created attachment 4738 [details]
Our job_submit.lua

Attached job_submit.lua
Comment 2 Nicholas McCollum 2017-06-09 10:35:05 MDT
Below is the gres.conf for the nodes in that reservation: dmc[1,4]

[root@dmc1 ~]# cat /etc/slurm/gres.conf 
name=gpu type=kepler file=/dev/nvidia0
name=gpu type=kepler file=/dev/nvidia1
name=gpu type=kepler file=/dev/nvidia2
name=gpu type=kepler file=/dev/nvidia3
Comment 4 Tim Wickberg 2017-06-09 14:14:13 MDT
I'm going to ask Dominik to take a look at this on Monday - he's much more versed in these allocation issues than I am.

In the meantime, are you able to attach the slurmctld log from around when that job is submitted? There should be some extra hints in there as to what's going on that may make reproducing this a bit easier on our side.

- Tim
Comment 6 Nicholas McCollum 2017-06-12 10:25:04 MDT
Sure, this issue is easy to recreate.

Are there any debug flags you would like for me to include?
Comment 7 Dominik Bartkiewicz 2017-06-12 14:39:22 MDT
Hi

A normal log should be sufficient.

Dominik
Comment 8 Dominik Bartkiewicz 2017-06-20 04:35:35 MDT
Hi

Do you have any updates?

Dominik
Comment 9 Nicholas McCollum 2017-06-21 12:24:16 MDT
[2017-06-21T13:23:28.336] debug2: Processing RPC: REQUEST_SUBMIT_BATCH_JOB from uid=2573
[2017-06-21T13:23:28.341] debug3: JobDesc: user_id=2573 job_id=N/A partition=gpu_kepler name=ls_test
[2017-06-21T13:23:28.341] debug3:    cpus=10-4294967294 pn_min_cpus=-1 core_spec=-1
[2017-06-21T13:23:28.341] debug3:    Nodes=1-[1] Sock/Node=65534 Core/Sock=65534 Thread/Core=65534
[2017-06-21T13:23:28.341] debug3:    pn_min_memory_cpu=1000 pn_min_tmp_disk=-1
[2017-06-21T13:23:28.341] debug3:    immediate=0 features=(null) reservation=class_gpu
[2017-06-21T13:23:28.341] debug3:    req_nodes=(null) exc_nodes=(null) gres=gpu:2
[2017-06-21T13:23:28.341] debug3:    time_limit=60-60 priority=-1 contiguous=0 shared=-1
[2017-06-21T13:23:28.342] debug3:    kill_on_node_fail=-1 script=#!/bin/sh
#
# script to do some checks o...
[2017-06-21T13:23:28.342] debug3:    argv="/mnt/homeapps/home/asnnam/ls_test"
[2017-06-21T13:23:28.342] debug3:    environment=REMOTEHOST=nyx.asc.edu,MANPATH=/apps/dmc/apps/lmod_rhel/lmod/lmod/share/man:/opt/asn/apps/lua_5.3.4/man::/usr/man,XDG_SESSION_ID=c222213,...
[2017-06-21T13:23:28.342] debug3:    stdin=/dev/null stdout=(null) stderr=(null)
[2017-06-21T13:23:28.342] debug3:    work_dir=/mnt/homeapps/home/asnnam alloc_node:sid=uv:74561
[2017-06-21T13:23:28.342] debug3:    sicp_mode=0 power_flags=
[2017-06-21T13:23:28.342] debug3:    resp_host=(null) alloc_resp_port=0 other_port=0
[2017-06-21T13:23:28.342] debug3:    dependency=(null) account=class qos=class_gpu comment=(null)
[2017-06-21T13:23:28.342] debug3:    mail_type=0 mail_user=(null) nice=0 num_tasks=10 open_mode=0 overcommit=-1 acctg_freq=(null)
[2017-06-21T13:23:28.342] debug3:    network=(null) begin=Unknown cpus_per_task=-1 requeue=-1 licenses=(null)
[2017-06-21T13:23:28.342] debug3:    end_time= signal=0@0 wait_all_nodes=-1 cpu_freq=
[2017-06-21T13:23:28.342] debug3:    ntasks_per_node=-1 ntasks_per_socket=-1 ntasks_per_core=-1
[2017-06-21T13:23:28.342] debug3:    mem_bind=65534:(null) plane_size:65534
[2017-06-21T13:23:28.342] debug3:    array_inx=(null)
[2017-06-21T13:23:28.342] debug3:    burst_buffer=(null)
[2017-06-21T13:23:28.342] debug3: _find_assoc_rec: not the right user 2573 != 1469
[2017-06-21T13:23:28.342] debug3: found correct association
[2017-06-21T13:23:28.342] debug3: found correct qos
[2017-06-21T13:23:28.342] debug3: before alteration asking for nodes 1-1 cpus 10-4294967294
[2017-06-21T13:23:28.342] debug3: after alteration asking for nodes 1-1 cpus 10-4294967294
[2017-06-21T13:23:28.369] debug2: initial priority for job 99624 is 16058
[2017-06-21T13:23:28.369] job_test_resv: job:99624 reservation:class_gpu nodes:dmc[1,4]
[2017-06-21T13:23:28.369] debug2: found 1 usable nodes from config containing dmc1
[2017-06-21T13:23:28.369] debug2: found 1 usable nodes from config containing dmc4
[2017-06-21T13:23:28.369] job_test_resv: job:99624 reservation:class_gpu nodes:dmc[1,4]
[2017-06-21T13:23:28.369] debug3: _pick_best_nodes: job 99624 idle_nodes 6 share_nodes 56
[2017-06-21T13:23:28.369] debug2: select_p_job_test for job 99624
[2017-06-21T13:23:28.369] debug2: select_p_job_test for job 99624
[2017-06-21T13:23:28.369] debug2: select_p_job_test for job 99624
[2017-06-21T13:23:28.369] _pick_best_nodes: job 99624 never runnable
[2017-06-21T13:23:28.369] debug3: powercapping: checking job 99624 : skipped, not eligible
[2017-06-21T13:23:28.369] error: slurm_jobcomp plugin context not initialized
[2017-06-21T13:23:28.369] _slurm_rpc_submit_batch_job: Requested node configuration is not available
Comment 10 Dominik Bartkiewicz 2017-06-22 08:20:10 MDT
Hi

The problem is deeper than I thought.
If possible, could you send me the slurmctld log with the "SelectType"
debug flag enabled, along with your gres.conf?

Dominik
Comment 11 Dominik Bartkiewicz 2017-07-06 05:35:48 MDT
Hi

Any news?

Dominik
Comment 12 Nicholas McCollum 2017-07-18 09:47:25 MDT
Sorry for the delay; we have decided to bump up to Slurm 17 here in a week or so in order to get to the most current version.

Feel free to close this ticket. If the issue persists in Slurm 17, I will open a new ticket.