Certain cards, potentially including the A100 running with MIG, present multiple device files that should be managed as a single GRES. For cgroup device enforcement to work in that case, a new gres.conf option is needed.

Note that this is different from the current GRES syntax that supports a range of device entries, such as:

Name=gpu Type=k20 File=/dev/nvidia[0-3]

which establishes four separate gpu/k20 GRES on the node (one per device file, nvidia0 through nvidia3). The new syntax instead will look like:

Name=gpu Type=newcard MultipleFiles=/dev/nvidia0,/dev/other-device-entry

which will establish a single gpu/newcard GRES managing the pair of device files.
The three core commits that enable this follow. This will be in 20.11 when released:

commit ff4bf3e085e0f8638e1e9cba7e1437665f2cd8c9
Author:     Tim Wickberg <tim@schedmd.com>
AuthorDate: Thu Oct 8 23:34:16 2020 -0600

    gres.conf - add new MultipleFiles configuration option.

    Bug 9964.

commit d3d7b7d516d702f74731ea7f5143a0676a7036de
Author:     Tim Wickberg <tim@schedmd.com>
AuthorDate: Thu Oct 8 23:24:26 2020 -0600

    Allow get_devices() to return more gres_device_t entries than we have GRES.

    So that we can support multiple device files mapped into a single GRES
    entrie.

commit 5fbb2ca90aaec157defd85f485569b94e9c8f61c
Author:     Tim Wickberg <tim@schedmd.com>
AuthorDate: Thu Oct 8 23:21:57 2020 -0600

    Add an index value to gres_device_t.

    Needed to add support for managing access to multiple device files
    underneath a single GRES. In such a case the index value will let us
    map the GRES allocated bitmap back to the gres_device_t entries.
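
To illustrate the idea behind the index value, here is a minimal sketch in C of how an allocated-GRES bitmap could be mapped back to several device files. This is not the Slurm implementation: sketch_device_t is a hypothetical stand-in for gres_device_t holding only an index and a path, allowed_devices() is an invented helper, and a plain 64-bit integer is assumed in place of Slurm's bitmap type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for gres_device_t: only the fields this sketch needs. */
typedef struct {
    int index;          /* which GRES this device file belongs to */
    const char *path;   /* device file path */
} sketch_device_t;

/*
 * Collect the path of every device file whose GRES index is set in
 * alloc_bitmap. Several entries may share one index, so a single
 * allocated GRES can yield more than one device file.
 * Returns the number of paths written to out (at most out_len).
 */
static size_t allowed_devices(const sketch_device_t *devs, size_t ndevs,
                              uint64_t alloc_bitmap,
                              const char **out, size_t out_len)
{
    size_t n = 0;

    assert(devs != NULL && out != NULL);

    for (size_t i = 0; i < ndevs && n < out_len; i++) {
        int idx = devs[i].index;

        /* Skip indexes that cannot be represented in the 64-bit bitmap. */
        if (idx < 0 || idx >= 64)
            continue;
        if (alloc_bitmap & (UINT64_C(1) << idx))
            out[n++] = devs[i].path;
    }
    return n;
}
```

With two entries sharing index 0 (for example /dev/nvidia0 and /dev/other-device-entry), a bitmap with only bit 0 set returns both paths, which is the behaviour a single MultipleFiles GRES needs for cgroup device enforcement.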