The OverSubscribe=EXCLUSIVE documentation (https://slurm.schedmd.com/slurm.conf.html#OPT_EXCLUSIVE) states the following:

> EXCLUSIVE
> Allocates entire nodes to jobs even with SelectType=select/cons_res or SelectType=select/cons_tres configured. Jobs that run in partitions with OverSubscribe=EXCLUSIVE will have exclusive access to all allocated nodes. These jobs are allocated all CPUs and GRES on the nodes, but they are only allocated as much memory as they ask for.

We have a use case where jobs allocate nodes as exclusive, but we don't want them to be allocated all the GRES of one custom type. The reason is that enabling this GRES requires extra work, and only a limited number of nodes can use this GRES concurrently on the cluster. We therefore don't want to "bill" a user for a GRES that is optionally available on a node but not in use by the current job, and we want to enforce resource limits such as GrpTRES based on actual usage.

There is the no_consume flag today, but it allocates 0 GRES, so it doesn't work for accounting limits.

For this use case I believe we would need a new flag (it could be called "consume_required"), which would be used this way:

NodeName=ioctl Gres=widget:consume_required:10
PartitionName=debug Nodes=ioctl OverSubscribe=EXCLUSIVE

$ srun --gres=widget:4 -p debug  # Will consume 4/10 widget GRES
$ srun -p debug                  # Will consume 0/10 widget GRES

This could perhaps be achieved with licenses, but this is inherently a per-node resource, unlike licenses, which are per-cluster: node A might support 10 widgets while node B might support 20.
"consume_requested" might be a better name; I was thinking of `ReqTres`, but Req stands for "Requested", not "Required".
Felix, this would be possible; however, it would require being sponsored as paid development by Nvidia. Is this something you are interested in sponsoring?
Updating ticket metadata to reflect status as a potential future enhancement.
Hey Felix - We're working on wrapping this up, but stumbled on one subtle implementation detail that we wanted to check with you on.

Each Gres can have an (optional) Type field. Common device definitions look like:

Name=gpu Type=k20 File=/dev/nvidia0

Internally, the flags field - where this new "explicit" flag is being added - is mapped to the Gres name, not to each individual (Gres, Type) tuple. This means that, in our current implementation, if you mark any Gres definition as Explicit, such as:

Name=gpu Type=k20 File=/dev/nvidia0 Flags=Explicit

the Explicit flag applies not only to the k20 type, but to all Gres=gpu defined on the node. So any further definitions like:

Name=gpu Type=h100 File=/dev/nvidia1

would automatically inherit the "explicit" flag and be treated as such in the configuration.

We're hoping that's not an issue for your expected use case here, but wanted to confirm that with you in case you have some use for this flag that doesn't match up with this behavior.

- Tim
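To make the inheritance behavior concrete, here is a sketch of what such a gres.conf could look like (the type names and device paths are illustrative, taken from the examples above, not from a real configuration):

```
# gres.conf sketch -- Flags=Explicit is stored per Gres *name* ("gpu"),
# not per (Gres, Type) tuple, so one flagged line affects them all.
Name=gpu Type=k20  File=/dev/nvidia0 Flags=Explicit
# The following line carries no Flags entry, but because it shares the
# Gres name "gpu", it inherits Explicit as well:
Name=gpu Type=h100 File=/dev/nvidia1
```

In other words, under this implementation there is no way to have an Explicit k20 and a non-Explicit h100 on the same node while both are named "gpu".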
Thanks for asking, I think that's fine.
Felix, I'm happy to let you know that the requested feature has been merged into our public repository[1] and will be part of the Slurm 23.02 release. I'll go ahead and mark the ticket as fixed. Should you have any questions, please don't hesitate to reopen.

cheers,
Marcin

[1] https://github.com/SchedMD/slurm/commit/75be81090106b9b083698e66e8821f0113af72b1
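Mapping the merged feature back onto the original widget use case, the configuration would presumably look something like the following. This is a sketch based on the "explicit" flag name discussed earlier in this ticket; the exact syntax should be checked against the Slurm 23.02 gres.conf and slurm.conf documentation:

```
# gres.conf on the node (sketch; node/GRES names from the original request):
Name=widget Count=10 Flags=Explicit

# slurm.conf (as in the original request):
NodeName=ioctl Gres=widget:10
PartitionName=debug Nodes=ioctl OverSubscribe=EXCLUSIVE

# An exclusive job is then only charged for the widgets it asks for:
$ srun --gres=widget:4 -p debug ...   # consumes 4/10 widget GRES
$ srun -p debug ...                   # consumes 0/10 widget GRES
```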