Hi,
We have several different types of GPUs on different compute nodes and are using Autodetect=nvml. As a user, how do I see which GPUs are available and what their capabilities are, such as RAM? And how do I target, for example, GPUs with more than 6GB of RAM?
Thanks,
Torkil
Torkil,
> How do I as a user see which GPUs are available and what their capabilities are, like RAM?
In our experience, this is most often covered by site documentation, together with the partition names and the resources available in each. Users can check the utilization of specific partitions or nodes with Slurm commands, but for details of the GPU (or CPU/platform) architecture, such as CPU topology or RAM frequency, they need an external source.
> And how do I target GPUs with more than 6GB RAM as an example?
If you don't mix different GPU types on the same node, one approach is to use NodeFeatures to indicate the per-GPU memory available on each node. End users can then request them like this:
    sbatch --gres=gpu:1 -C '[GPU_RAM_6GB|GPU_RAM_12GB]'
Let me know if that helps.
cheers,
Marcin
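For the constraint above to match anything, the feature names first have to be attached to the nodes by the admins. A minimal sketch of what that could look like in slurm.conf, assuming each node holds a single GPU type. The feature names are the illustrative ones from the command above, `gpunode12` is a hypothetical node name, and the other node parameters are left out:

```conf
# slurm.conf excerpt (hypothetical; CPU, memory and other node parameters omitted)
# bigger2 is assumed to hold one 6GB GPU; gpunode12 is a made-up node with 12GB GPUs.
NodeName=bigger2   Gres=gpu:1 Feature=GPU_RAM_6GB
NodeName=gpunode12 Gres=gpu:2 Feature=GPU_RAM_12GB
```

A job that only needs a single acceptable feature could then be submitted with `sbatch --gres=gpu:1 -C GPU_RAM_12GB`.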
Hi Marcin,
Thanks. Too bad this information isn't available through sinfo, but alas. Feel free to close the ticket.
Best regards,
Torkil
Torkil, sinfo -N --format="%N %30f" Regards Bas
(In reply to Bas van der Vlies from comment #3)
> Torkil,
>
> sinfo -N --format="%N %30f"
    torkil@averell:/tmp$ sinfo -N --format="%N %30f"
    NODELIST   AVAIL_FEATURES
    big20      (null)
    big21      (null)
    big22      (null)
    big27      (null)
    big28      (null)
    bigger2    (null)
    bigger3    (null)
    bigger4    (null)
    bigger6    (null)
    bigger7    (null)
    bigger9    (null)
    bigger10   (null)
    bigger11   (null)
    bigger12   (null)
    bigger13   (null)
    chimera    (null)
    drakkisath (null)
    fenrir     (null)
    gojira     (null)
    ix1        (null)
    kong       (null)
    rivendare  (null)
    small1     (null)
    small2     (null)
    small19    (null)
    small24    (null)
    small25    (null)
    small29    (null)
    small30    (null)
    small31    (null)
    small32    (null)
    small33    (null)
    small34    (null)
    small35    (null)
    smaug      (null)
I guess that would list NodeFeatures, if I had any of those?
You have this:
    sinfo -N --format="%N %G"
    NODELIST GRES
    big20    (null)
    bigger2  gpu:1(S:0)
    chimera  gpu:2(S:0)
Had hoped for something like this:
    NODELIST GRES
    big20    (null)
    bigger2  (gpu1 RTX2060 6GB)
    chimera  (gpu1 RTX2060 6GB) (gpu2 RTX3090 24GB)
That would be cool *wink* *wink*
Best regards,
Torkil
> Feel free to close the ticket.
OK. I'll check with the team whether there are any plans or other requests to make some of those details available through the Slurm CLI.
(In reply to Marcin Stolarek from comment #6)
> > Feel free to close the ticket.
> OK. I'll check with the team if there are any plans/other requests to make
> some of those details available over Slurm cli.
Thanks. I could put it in my site documentation, sure, but it would be more user-friendly and less work for the poor overworked sysadmins if users could just run:
    sinfo -N --format="%N %G" | grep gpu
    bigger2 gpu:1(S:0) (RTX2060/6GB)
    chimera gpu:2(S:0) (RTX2060/6GB, RTX3090/24GB)
and easily decide where to put jobs with special requirements, like GPU memory.
Best regards,
Torkil
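Until something like that exists, a rough approximation is possible with the existing `%G` (GRES) and `%f` (features) format fields, assuming the admins have attached descriptive features such as the GPU_RAM_* names suggested earlier in the thread. The following is only a sketch; the function name `gpu_nodes` is made up for illustration, and the output is only as informative as the configured feature names:

```shell
# Hypothetical helper: from "NODE GRES FEATURES" lines on stdin, keep only
# the nodes whose GRES field mentions a GPU, and print the three fields.
gpu_nodes() {
    awk '$2 ~ /(^|,)gpu/ { print $1, $2, $3 }'
}

# Intended use on a cluster (assumes node features have been configured):
#   sinfo -N --noheader --format="%N %G %f" | gpu_nodes
```

Nodes without features would still show `(null)` in the last column, so this only becomes useful once the features encode something like GPU model or memory.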
In fact, there is new development for Slurm 21.08 in Bug 9567 introducing the node_features/helpers plugin, which may be used to set features more automatically. The bug is public, so you can check the details there.
I'm closing this bug report now; if you have any questions, please reopen it.
cheers,
Marcin