Related to bug 2475

From Kilian at Stanford

If we have a QOS defined like this ...

    # sacctmgr list qos gpu format=name,mintres%24
    Name       MinTRES
    ---------- ------------------------
    gpu        cpu=1,gres/gpu=1

And we use GRES "types" for GPUs, in gres.conf:

    NodeName=gpu-13-[1-2] Name=gpu Type=gtx File=/dev/nvidia[0-3] CPUs=0-7
    [...]
    NodeName=gpu-9-[1-2] Name=gpu Type=tesla File=/dev/nvidia[0-3] CPUs=0-7

and in nodes definitions:

    # scontrol show node gpu-13-1 | grep -i gres
    Gres=gpu:gtx:8
    # scontrol show node gpu-9-1 | grep -i gres
    Gres=gpu:tesla:8

Turns out that the QOS MinTRES limit rejects jobs that are requesting a specific GPU type, i.e.:

* "srun --gres gpu:1" works
* "srun --gres gpu:gtx:1" doesn't work, and results in "srun: error: Unable to allocate resources: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)"

More pulled from 2475

Since TRES are fine-grained (meaning gtx and tesla would result in 2 different TRES), I am not sure it will work like you expect when specifying the type. I can see what you are after, though.

What I believe is happening right now is that gres/gpu is what you are tracking, but when the type gets thrown into the mix Slurm treats it as a completely different TRES, so the job never counts toward the gres/gpu minimum. This fine-grained implementation probably won't work for you.

I'm not sure how to change it and still support the situation where people do want the fine-grained TRES. Perhaps we could track both gres/gpu and gres/gpu:gtx|tesla, and when a request comes in we match both gres/gpu and the type. That would give us both.

This has been added in commit 0cd692967b. Let me know how it goes.
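
For illustration only, the "track both" idea discussed above can be sketched in a few lines of Python. This is not the code from that commit and not Slurm's implementation: the helper names tres_from_gres and meets_min_tres are hypothetical, and the parsing assumes the simple name[:type][:count] form of --gres used in this report.

```python
def tres_from_gres(gres_spec):
    """Expand a --gres style request into TRES counts (hypothetical helper).

    A typed request such as "gpu:gtx:1" is counted under both the generic
    TRES "gres/gpu" and the typed TRES "gres/gpu:gtx".
    """
    counts = {}
    for item in gres_spec.split(","):
        parts = item.split(":")
        name = parts[0]
        if len(parts) == 1:
            gtype, count = None, 1            # "gpu"
        elif len(parts) == 2 and parts[1].isdigit():
            gtype, count = None, int(parts[1])  # "gpu:1"
        elif len(parts) == 2:
            gtype, count = parts[1], 1        # "gpu:gtx"
        else:
            gtype, count = parts[1], int(parts[2])  # "gpu:gtx:1"
        generic = "gres/" + name
        counts[generic] = counts.get(generic, 0) + count
        if gtype is not None:
            typed = generic + ":" + gtype
            counts[typed] = counts.get(typed, 0) + count
    return counts


def meets_min_tres(job_tres, min_tres):
    """Return True if the job requests at least the minimum of every TRES."""
    return all(job_tres.get(name, 0) >= minimum
               for name, minimum in min_tres.items())


# The QOS from the report: MinTRES cpu=1,gres/gpu=1
min_tres = {"cpu": 1, "gres/gpu": 1}

# Fine-grained only: the typed request is a different TRES, so it is rejected.
typed_only = {"cpu": 1, "gres/gpu:gtx": 1}
print(meets_min_tres(typed_only, min_tres))

# Tracking both: the typed request also counts toward gres/gpu.
both = {"cpu": 1, **tres_from_gres("gpu:gtx:1")}
print(meets_min_tres(both, min_tres))
```

With the typed request counted under both names, a MinTRES on gres/gpu is satisfied by "gpu:gtx:1", while a limit written against gres/gpu:gtx can still be checked on its own.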