Hi,

I have a weird issue when submitting jobs with --gres and --ntasks-per-node, but I'm not sure if that's a configuration issue or something I'm missing.

I have GPU nodes with the following configuration:

$ scontrol show partition gpu
PartitionName=gpu
   AllowGroups=ALL AllowAccounts=ALL AllowQos=gpu,system AllocNodes=ALL
   Default=NO DefaultTime=02:00:00 DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=1 LLN=NO
   MaxCPUsPerNode=UNLIMITED Nodes=gpu-9-[1-5]
   Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=REQUEUE
   State=UP TotalCPUs=80 TotalNodes=5 SelectTypeParameters=N/A
   DefMemPerCPU=16000 MaxMemPerCPU=16384

$ scontrol show node gpu-9-1
NodeName=gpu-9-1 Arch=x86_64 CoresPerSocket=8
   CPUAlloc=0 CPUErr=0 CPUTot=16 CPULoad=N/A
   Features=k20x
   Gres=gpu:8
   NodeAddr=gpu-9-1 NodeHostName=gpu-9-1 Version=(null)
   OS=Linux RealMemory=258000 AllocMem=0 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1000
   BootTime=2014-09-02T09:18:29 SlurmdStartTime=2014-09-02T13:50:02
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

Their gres.conf file is as follows:

Name=gpu File=/dev/nvidia[0-3] CPUs=[0-7]
Name=gpu File=/dev/nvidia[4-7] CPUs=[8-15]

So they have 16 CPU cores and 8 GPUs each. Yet I can't seem to run more than 8 tasks per node when requesting gres. Without gres, I can request up to 16 tasks, which is normal:

$ salloc --ntasks-per-node=16 -p gpu --qos=gpu
salloc: Granted job allocation 322052
$ exit
salloc: Relinquishing job allocation 322052

$ salloc --ntasks-per-node=9 --gres=gpu:1 -p gpu --qos=gpu
salloc: error: Job submit/allocate failed: Requested node configuration is not available
salloc: Job allocation 322053 has been revoked.

$ salloc --ntasks-per-node=8 --gres=gpu:1 -p gpu --qos=gpu
salloc: Granted job allocation 322054
$ exit
salloc: Relinquishing job allocation 322054

Could you please help me figure out what I'm doing wrong?

Thanks!
Hi,

If you request 2 GPUs and --ntasks-per-node=9, does it work? I am going to try your case, but we think the code is enforcing the GPU allocation in blocks of 8 CPUs, as configured.

David
Hi David,

(In reply to David Bigagli from comment #1)
> if you request 2 gpus and --ntasks-per-node=9 does it work?

Nope:

$ salloc --ntasks-per-node=9 --gres=gpu:2 -p gpu --qos=gpu
salloc: error: Job submit/allocate failed: Requested node configuration is not available
Hi Kilian,

I can reproduce this problem as well. Someone from our scheduling team will get back to you on this.

David
(In reply to David Bigagli from comment #3)
> I can reproduce this problem as well. Someone from our scheduling
> team will get back to you on this.

Thanks!
I can give you a work-around for right now. If you specify the option --gres=gpu:5, then the allocation logic seems to spill over into the second batch of CPUs, so that all CPUs are accessible.
Hi Moe,

> I can give you a work-around for right now. If you specify the option
> --gres=gpu:5, then the allocation logic seems to spill over into the
> second batch of CPUs, so that all CPUs are accessible.

Right, that works, thanks. I was using --exclusive before, but just realized that it still sets SLURM_TASKS_PER_NODE to 8.
This will be fixed in version 14.03.8 when released. The commit (in case you want to work with a patch) is here:
https://github.com/SchedMD/slurm/commit/0ec4d6b76b568ce7703c1c42c2cb51d1bddde7f8

Since you want more than 8 CPUs (which are not associated with any single GPU), you will have to specify at least two GPUs in the job request, i.e. "--gres=gpu:2" (which is better than the 5 you need today). Here is a sample of what I see for CUDA_VISIBLE_DEVICES with your configuration:

$ srun -n16 --gres=gpu:2 tmp2
CUDA_VISIBLE_DEVICES=0,4
...
CUDA_VISIBLE_DEVICES=0,4

Note that until version 14.11, you must spell out each file name. So while this will work in v14.11:

Name=gpu File=/dev/nvidia[0-3] CPUs=[0-7]
Name=gpu File=/dev/nvidia[4-7] CPUs=[8-15]

for now you should change gres.conf to this:

Name=gpu File=/dev/nvidia0 CPUs=[0-7]
Name=gpu File=/dev/nvidia1 CPUs=[0-7]
Name=gpu File=/dev/nvidia2 CPUs=[0-7]
Name=gpu File=/dev/nvidia3 CPUs=[0-7]
Name=gpu File=/dev/nvidia4 CPUs=[8-15]
Name=gpu File=/dev/nvidia5 CPUs=[8-15]
Name=gpu File=/dev/nvidia6 CPUs=[8-15]
Name=gpu File=/dev/nvidia7 CPUs=[8-15]
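[Editor's note] The behavior described in this thread can be sketched with a small toy model (Python, purely illustrative; the function name and the naive "pack the first block first" allocator are assumptions, not Slurm source code). It shows why --gres=gpu:1 caps the job at 8 CPUs and why the gpu:5 workaround unlocks all 16:

```python
# Toy model of the pre-14.03.8 behavior: each GPU is bound to a block of
# CPUs, and a job may only use CPUs associated with the GPUs it was
# allocated. CPU sets mirror the gres.conf in this report:
#   GPUs 0-3 -> CPUs 0-7, GPUs 4-7 -> CPUs 8-15.
GPU_CPUS = {g: set(range(0, 8)) for g in range(0, 4)}
GPU_CPUS.update({g: set(range(8, 16)) for g in range(4, 8)})

def usable_cpus(gpus_requested):
    """CPUs reachable when a naive allocator picks the lowest-numbered
    GPUs first. gpu:1 through gpu:4 all land in the first block (8 CPUs);
    only gpu:5 spills into the second block, reaching all 16 CPUs."""
    picked = sorted(GPU_CPUS)[:gpus_requested]
    cpus = set()
    for g in picked:
        cpus |= GPU_CPUS[g]
    return cpus

print(len(usable_cpus(1)))  # 8  -> --ntasks-per-node=9 is rejected
print(len(usable_cpus(5)))  # 16 -> the gpu:5 workaround
```

With the 14.03.8 fix, the scheduler can instead pick one GPU from each block, which is why --gres=gpu:2 suffices (matching the CUDA_VISIBLE_DEVICES=0,4 sample above).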
Created attachment 1221 [details]
make gres.conf CPUs advisory

If you want the GRES CPU specifications to be advisory, so that you can allocate one GPU with over 8 CPUs, this patch (for your local use, on top of the commit already made) will do that. As indicated previously, you can specify 2 GPUs with the code that will be in the general release.
Fixed in v14.03.8
Hi Moe,

(In reply to Moe Jette from comment #7)
> This will be fixed in version 14.03.8 when released. The commit (in case you
> want to work with a patch) is here:
> https://github.com/SchedMD/slurm/commit/0ec4d6b76b568ce7703c1c42c2cb51d1bddde7f8

Thanks for the fix!

> Since you want more than 8 CPUs (which are not associated with any single
> GPU), you will have to specify at least two GPUs in the job request, i.e.
> "--gres=gpu:2" (which is better than the 5 you need today).

Ok. Could you please give me more details about the need to request 2 GPUs to be able to allocate all 16 CPUs? I've seen your next patch, but I'm not sure I understand the logic behind the default behavior (needing 2 GPUs for more than 8 CPUs). Wouldn't it make some sense to allow requests of independent numbers of CPUs and GPUs, and just use the device/CPU mapping as a 'hint' rather than a strong requirement? Like it's done for the topology plugin: optimize job placement, but if an optimal setting is not achievable, run the job anyway. Maybe I'm not making sense; I have a very limited understanding of the things at stake here.

> Note that until version 14.11, you must spell out each file name.
>
> For now you should change gres.conf to this:
> Name=gpu File=/dev/nvidia0 CPUs=[0-7]
> [...]

Oh, OK. I thought this was fixed in 14.03.5 as per #905, because we've been running with the compact form for a while and it seemed fine. With DebugFlags=gres, I have the following in slurmd.log:

[2014-09-10T12:22:13.014] Gres Name=gpu Count=4 ID=7696487 File=/dev/nvidia[0-3] CPUs=[0-7] CpuCnt=16
[2014-09-10T12:22:13.014] Gres Name=gpu Count=4 ID=7696487 File=/dev/nvidia[4-7] CPUs=[8-15] CpuCnt=16

The "Count=4" part makes me feel it got it right. Am I wrong?

Thanks!
(In reply to Kilian Cavalotti from comment #10)
> Ok. Could you please give me more details about the need to request 2 GPUs
> to be able to allocate all 16 CPUs? [...] Wouldn't it make some sense to
> allow requests of independent numbers of CPUs and GPUs, and just use the
> device/CPU mapping as a 'hint' rather than a strong requirement?

Right now the CPUs specification in gres.conf is not advisory. A line like this:

Name=gpu File=/dev/nvidia0 CPUs=[0-7]

means that GPU 0 can only be used with CPUs 0-7. The patch that I attached makes it advisory rather than mandatory. I definitely don't want to change this in version 14.03, and I have mixed feelings about changing it in v14.11.

> Oh, OK. I thought this was fixed in 14.03.5 as per #905, because we've been
> running with the compact form for a while and it seemed fine. With
> DebugFlags=gres, I have the following in slurmd.log:
>
> [2014-09-10T12:22:13.014] Gres Name=gpu Count=4 ID=7696487
> File=/dev/nvidia[0-3] CPUs=[0-7] CpuCnt=16
> [2014-09-10T12:22:13.014] Gres Name=gpu Count=4 ID=7696487
> File=/dev/nvidia[4-7] CPUs=[8-15] CpuCnt=16
>
> The "Count=4" part makes me feel it got it right. Am I wrong?

I see some problems in the logic managing the mapping of GPUs to CPUs without the files on separate lines. Again, that will be fixed in v14.11. For now, I would recommend splitting it into separate lines.
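[Editor's note] The mandatory-versus-advisory distinction above can be illustrated with a small sketch (a toy model only; the function name and logic are assumptions for illustration, not the actual Slurm implementation):

```python
# Mandatory vs. advisory CPU bindings for a GRES, as a toy model.
# gpu_cpu_sets: one CPU set per allocated GPU (mirroring gres.conf lines).
NODE_CPUS = set(range(16))  # 16-core node, as in this report

def cpus_for_job(gpu_cpu_sets, gpus_wanted, cpus_wanted, advisory=False):
    """Return a CPU set satisfying the request, or None if impossible."""
    bound = set().union(*gpu_cpu_sets[:gpus_wanted])
    if len(bound) >= cpus_wanted:
        return set(sorted(bound)[:cpus_wanted])
    if advisory:
        # Binding is only a placement hint: fall back to any node CPUs.
        return set(sorted(NODE_CPUS)[:cpus_wanted])
    return None  # mandatory binding: request cannot be satisfied

one_gpu = [set(range(0, 8))]  # one GPU bound to CPUs 0-7
print(cpus_for_job(one_gpu, 1, 9))                 # None: 9 > 8 bound CPUs
print(cpus_for_job(one_gpu, 1, 9, advisory=True))  # 9 CPUs granted anyway
```

With the attached patch applied (the advisory=True case here), one GPU with more than 8 CPUs becomes allocatable; with stock 14.03 behavior (advisory=False), the request is rejected, matching the salloc error at the top of this thread.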
(In reply to Moe Jette from comment #11)
> Right now the CPUs specification in gres.conf is not advisory. A line like
> this:
> Name=gpu File=/dev/nvidia0 CPUs=[0-7]
> means that GPU 0 can only be used with CPUs 0-7. The patch that I attached
> makes it advisory rather than mandatory. I definitely don't want to change
> this in version 14.03, and I have mixed feelings about changing it in v14.11.

Ok, I see, it makes sense.

> I see some problems in the logic managing the mapping of GPUs to CPUs
> without the files on separate lines. Again, that will be fixed in v14.11.
> For now, I would recommend splitting it into separate lines.

Thanks for the clarification, I'll go split that file up.

Thanks again for the explanation!
(In reply to Kilian Cavalotti from comment #12)
> Ok, I see, it makes sense.

Oh, and I forgot to ask: would it be worth mentioning this in the gres.conf documentation?

Thanks!
(In reply to Kilian Cavalotti from comment #13)
> Oh, and I forgot to ask: would it be worth mentioning this in the gres.conf
> documentation?

I just added info to the gres web page and gres.conf man page.
> I just added info to the gres web page and gres.conf man page.

Excellent, thank you Moe!