Created attachment 16150 [details]
Content of slurm.conf, gres.conf and cgroup.conf

Hello,

I successively start jobs with --gres=gpu:<n>, n = {1, 3, 4, 2, 2, 3, 4, 1}. Each running job is finished with "exit" before the next one is submitted; for certain combinations, errors are reported. However, if there is no job in the queue, no errors are displayed (last example, n = {4, 4}).

Regards,
Karl-Heinz

Examples:

[yc9907@uccn998 tmp]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:1
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
salloc: Granted job allocation 15989
salloc: Waiting for resource configuration
salloc: Nodes uccn490 are ready for job
[yc9907@uccn490 tmp]$ exit
exit
salloc: Relinquishing job allocation 15989

____________ GPU:3 failed
[yc9907@uccn998 tmp]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:3
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  15989     gpu_4    sh yc9907 CG  0:12     1 uccn490
salloc: error: Job submit/allocate failed: Requested node configuration is not available
salloc: Job allocation 15990 has been revoked.

____________ GPU:4 failed
[yc9907@uccn998 tmp]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:4
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  15989     gpu_4    sh yc9907 CG  0:12     1 uccn490
salloc: error: Job submit/allocate failed: Requested node configuration is not available
salloc: Job allocation 15991 has been revoked.

___________ GPU:2 works
[yc9907@uccn998 tmp]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:2
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  15989     gpu_4    sh yc9907 CG  0:12     1 uccn490
salloc: Pending job allocation 15992
salloc: job 15992 queued and waiting for resources
salloc: job 15992 has been allocated resources
salloc: Granted job allocation 15992
salloc: Waiting for resource configuration
salloc: Nodes uccn490 are ready for job

___________ GPU:2 works
[yc9907@uccn998 tmp]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:2
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  15992     gpu_4    sh yc9907 CG  6:43     1 uccn490
salloc: Pending job allocation 15993
salloc: job 15993 queued and waiting for resources
salloc: job 15993 has been allocated resources
salloc: Granted job allocation 15993
salloc: Waiting for resource configuration
salloc: Nodes uccn490 are ready for job
[yc9907@uccn490 tmp]$ exit
exit
salloc: Relinquishing job allocation 15993

____________ GPU:3 failed
[yc9907@uccn998 tmp]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:3
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  15993     gpu_4    sh yc9907 CG  0:46     1 uccn490
salloc: error: Job submit/allocate failed: Requested node configuration is not available
salloc: Job allocation 15994 has been revoked.

____________ GPU:4 failed
[yc9907@uccn998 tmp]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:4
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  15993     gpu_4    sh yc9907 CG  0:46     1 uccn490
salloc: error: Job submit/allocate failed: Requested node configuration is not available
salloc: Job allocation 15995 has been revoked.

___________ GPU:1 works
[yc9907@uccn998 tmp]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:1
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  15993     gpu_4    sh yc9907 CG  0:46     1 uccn490
salloc: Pending job allocation 15996
salloc: job 15996 queued and waiting for resources
salloc: job 15996 has been allocated resources
salloc: Granted job allocation 15996
salloc: Waiting for resource configuration
salloc: Nodes uccn490 are ready for job
[yc9907@uccn490 tmp]$

____________________ GPU:4 queue is empty
[yc9907@uccn998 tmp]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:4
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
salloc: Pending job allocation 16001
salloc: job 16001 queued and waiting for resources
salloc: job 16001 has been allocated resources
salloc: Granted job allocation 16001
salloc: Waiting for resource configuration
salloc: Nodes uccn490 are ready for job
[yc9907@uccn490 tmp]$ exit
exit
salloc: Relinquishing job allocation 16001

____________________ GPU:4 queue is empty
[yc9907@uccn998 tmp]$ sleep 20;squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:4
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
salloc: Pending job allocation 16002
salloc: job 16002 queued and waiting for resources
salloc: job 16002 has been allocated resources
salloc: Granted job allocation 16002
salloc: Waiting for resource configuration
salloc: Nodes uccn490 are ready for job
[yc9907@uccn490 tmp]$ exit
exit
salloc: Relinquishing job allocation 16002
We are looking into this. Thank you for attaching your configuration files. We will let you know if we need any additional information.
Karl-Heinz,

I cannot easily reproduce the issue. Could you please share your slurmctld log from the time when it happens? Is it possible to enable the SelectType debug flag before repeating the test? You can do that without a restart:

scontrol setdebugflag +SelectType
-> execute the commands as you did below
scontrol setdebugflag -SelectType

Setting the debug flag has to be done as a privileged user.

cheers,
Marcin
Karl-Heinz,

Could you please take a look at comment 2?

cheers,
Marcin
Marcin,

I would like to apologize for my late feedback. We are currently in a series of maintenance windows. I will set the flag and get back to you afterwards. We have switched back to version 20.02.3!

Regards,
Karl-Heinz
Created attachment 16315 [details]
slurmctld.log output with the option setdebugflag +SelectType
Marcin,

I submitted the jobs like last time.

Regards,
Karl-Heinz

Examples:

Fri Oct 23-14:15:50 (40/656) root@uccn997:/home/kit/scc/yc9907# cat bug9947_setdebugflag_+SelectType
[yc9907@uccn998 ~]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:1
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
salloc: Granted job allocation 16735
salloc: Waiting for resource configuration
salloc: Nodes uccn490 are ready for job
[yc9907@uccn490 ~]$ hostname
uccn490.localdomain
[yc9907@uccn490 ~]$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-661fcf0d-6789-be11-c34c-b7404bffe51a)
[yc9907@uccn490 ~]$ exit
exit
salloc: Relinquishing job allocation 16735

_________________ GPU:3 failed
[yc9907@uccn998 ~]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:3
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  16735     gpu_4    sh yc9907 CG  1:49     1 uccn490
salloc: error: Job submit/allocate failed: Requested node configuration is not available
salloc: Job allocation 16736 has been revoked.

_________________ GPU:4 failed
[yc9907@uccn998 ~]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:4
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  16735     gpu_4    sh yc9907 CG  1:49     1 uccn490
salloc: error: Job submit/allocate failed: Requested node configuration is not available

_________________ GPU:2 works
[yc9907@uccn998 ~]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:2
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
salloc: Granted job allocation 16738
salloc: Waiting for resource configuration
salloc: Nodes uccn490 are ready for job
[yc9907@uccn490 ~]$ exit
exit
salloc: Relinquishing job allocation 16738

_________________ GPU:2 works
[yc9907@uccn998 ~]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:2
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  16738     gpu_4    sh yc9907 CG  0:57     1 uccn490
salloc: Pending job allocation 16739
salloc: job 16739 queued and waiting for resources
salloc: job 16739 has been allocated resources
salloc: Granted job allocation 16739
salloc: Waiting for resource configuration
salloc: Nodes uccn490 are ready for job
[yc9907@uccn490 ~]$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-661fcf0d-6789-be11-c34c-b7404bffe51a)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-f59d5aab-e029-7d6e-971e-ebdce05d3321)
[yc9907@uccn490 ~]$ exit
exit
salloc: Relinquishing job allocation 16739

_________________ GPU:3 failed
[yc9907@uccn998 ~]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:3
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  16739     gpu_4    sh yc9907 CG  1:10     1 uccn490
salloc: error: Job submit/allocate failed: Requested node configuration is not available
salloc: Job allocation 16740 has been revoked.

_________________ GPU:4 failed
[yc9907@uccn998 ~]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:4
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  16739     gpu_4    sh yc9907 CG  1:10     1 uccn490
salloc: error: Job submit/allocate failed: Requested node configuration is not available
salloc: Job allocation 16741 has been revoked.

_________________ GPU:1 works
[yc9907@uccn998 ~]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:1
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  16739     gpu_4    sh yc9907 CG  1:10     1 uccn490
salloc: Pending job allocation 16742
salloc: job 16742 queued and waiting for resources
salloc: job 16742 has been allocated resources
salloc: Granted job allocation 16742
salloc: Waiting for resource configuration
salloc: Nodes uccn490 are ready for job
[yc9907@uccn490 ~]$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-661fcf0d-6789-be11-c34c-b7404bffe51a)
[yc9907@uccn490 ~]$ exit
exit
salloc: Relinquishing job allocation 16742

_________________ GPU:4 queue is empty
[yc9907@uccn998 ~]$ squeue
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
[yc9907@uccn998 ~]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:4
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
salloc: Pending job allocation 16743
salloc: job 16743 queued and waiting for resources
salloc: job 16743 has been allocated resources
salloc: Granted job allocation 16743
salloc: Waiting for resource configuration
salloc: Nodes uccn490 are ready for job
[yc9907@uccn490 ~]$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-661fcf0d-6789-be11-c34c-b7404bffe51a)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-f59d5aab-e029-7d6e-971e-ebdce05d3321)
GPU 2: Tesla V100-SXM2-32GB (UUID: GPU-b34138f7-3506-7b0a-2820-b2a954e30175)
GPU 3: Tesla V100-SXM2-32GB (UUID: GPU-2606330b-f1ac-1c18-00c1-60a840c2b87a)
[yc9907@uccn490 ~]$ exit
exit
salloc: Relinquishing job allocation 16743

_________________ GPU:4 queue is empty
[yc9907@uccn998 ~]$ squeue
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
[yc9907@uccn998 ~]$ squeue;salloc -p gpu_4 -n 5 -t 10 --gres=gpu:4
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
salloc: Pending job allocation 16744
salloc: job 16744 queued and waiting for resources
salloc: job 16744 has been allocated resources
salloc: Granted job allocation 16744
salloc: Waiting for resource configuration
salloc: Nodes uccn490 are ready for job
[yc9907@uccn490 ~]$ exit
exit
salloc: Relinquishing job allocation 16744
salloc: Job allocation 16744 has been revoked.
[yc9907@uccn998 ~]$ Fri Oct 23-14:16:01 (41/657)
Karl-Heinz,

I was able to reproduce the issue and I see where it is coming from. I have a patch that should fix it. Do you want to apply it locally and verify? The patch has not yet passed our QA and is not yet scheduled for a release.

The origin of the issue is incorrect handling of DefCpuPerGPU when specified as a default. You can work around it by setting the value from a cli_filter or job_submit plugin instead of specifying it in slurm.conf.

cheers,
Marcin
Dear Mr. Stolarek,

I would like to test the patch on our test cluster. The workaround with the cli_filter or job_submit plugin would also interest me very much; I have no experience with it yet and my knowledge of it is still rudimentary. Do you have an instructive example for this case?

Best regards,
Karl-Heinz Schmidmeier
Comment on attachment 16578 [details]
fix v2

Karl-Heinz,

I'm switching the patch to public mode; you should be able to download it now.

The solution with the job_submit plugin requires you to enable JobSubmitPlugins=lua and deploy a script like the one below in the same directory as slurm.conf:

# cat /etc/slurm/job_submit.lua
function _find_in_str(str, arg)
    if str ~= nil then
        return string.find(str, arg)
    else
        return false
    end
end

function slurm_job_submit(job_desc, part_list, submit_uid)
    if _find_in_str(job_desc.partition, "gpu") then
        if job_desc.cpus_per_tres == nil then
            job_desc.cpus_per_tres = "gpu:20"
            slurm.info("SETTING")
        end
        slurm.info("SETTING2")
    end
    slurm.info("SETTING3")
    -- note: slurm_job_submit must return a result code
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_ptr, part_list, modify_uid)
    return slurm.SUCCESS
end

The script adds --cpus-per-gpu=20 to every job submitted to a partition with "gpu" in its name, unless the user already specified a different --cpus-per-gpu value. The drawback becomes visible if someone submits a job to multiple partitions: --cpus-per-gpu=20 will effectively be applied for every partition. I think that won't be an issue in your configuration (the only GPU nodes are in the gpu_ partitions).

Let me know if you have any questions,
Marcin
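If the multi-partition drawback ever becomes relevant, the same idea can be narrowed so the default is applied only when every requested partition is a GPU partition. This is an untested sketch, not part of the patch above; it assumes job_desc.partition holds the comma-separated partition list as submitted, and the helper name _all_gpu_partitions is made up for illustration:

```lua
-- Hypothetical variant: set the default only when *all* requested
-- partitions contain "gpu" in their name, so a job submitted to
-- e.g. "gpu_4,normal" is left untouched.
function _all_gpu_partitions(partitions)
    if partitions == nil then
        return false
    end
    -- iterate over the comma-separated partition names
    for part in string.gmatch(partitions, "([^,]+)") do
        if not string.find(part, "gpu") then
            return false
        end
    end
    return true
end

function slurm_job_submit(job_desc, part_list, submit_uid)
    if _all_gpu_partitions(job_desc.partition) and
       job_desc.cpus_per_tres == nil then
        job_desc.cpus_per_tres = "gpu:20"  -- equivalent of --cpus-per-gpu=20
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_ptr, part_list, modify_uid)
    return slurm.SUCCESS
end
```

Jobs submitted without an explicit partition fall through untouched here, since _all_gpu_partitions returns false for nil.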
PS. I just noticed I left debugging output in the script example. Sorry for that - you can skip the lines starting with "slurm.info".
Karl-Heinz,

The fix for the reported issue passed our QA and was merged into our public repository [1]. It will be part of the Slurm 20.02.7 release. I'm marking the bug as fixed now.

cheers,
Marcin

[1] https://github.com/SchedMD/slurm/commit/0b6faf691c6fb5445fdb01c74daf81ecb87e05db
*** Ticket 10103 has been marked as a duplicate of this ticket. ***