Summary: | GRES GPU details | ||
---|---|---|---|
Product: | Slurm | Reporter: | Torkil Svensgaard <torkil> |
Component: | Documentation | Assignee: | Marcin Stolarek <cinek> |
Status: | RESOLVED INFOGIVEN | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | CC: | bas.vandervlies, cinek, rkv |
Version: | 20.11.7 | ||
Hardware: | Linux | ||
OS: | Linux | ||
See Also: | https://bugs.schedmd.com/show_bug.cgi?id=9567 | ||
Site: | DRCMR | ||
Description
Torkil Svensgaard
2021-08-11 01:57:02 MDT
Torkil,

> How do I as a user see which GPUs are available and what their capabilities are, like RAM?

In our experience, this is most often part of the site documentation, together with the partition names and the resources available there. The user can check the utilization of specific partitions/nodes with Slurm commands, but to learn more about the GPU (or CPU/platform) architecture they have to turn to external documentation (think of CPU topology details, RAM frequency, etc.).

> And how do I target GPUs with more than 6GB RAM as an example?

If you don't mix different GPU types on the same node, one approach is to use NodeFeatures as an indicator of the memory available per GPU on the node. The end user can then specify those like:

sbatch --gres=gpu:1 -C '[GPU_RAM_6GB|GPU_RAM_12GB]'

Let me know if that helps.

cheers,
Marcin

Hi Marcin

Thanks. Too bad this information isn't available through sinfo but alas. Feel free to close the ticket.

Mvh.
Torkil

Torkil,

sinfo -N --format="%N %30f"

Regards
Bas

(In reply to Bas van der Vlies from comment #3)
> Torkil,
>
> sinfo -N --format="%N %30f"

torkil@averell:/tmp$ sinfo -N --format="%N %30f"
NODELIST    AVAIL_FEATURES
big20       (null)
big21       (null)
big22       (null)
big27       (null)
big28       (null)
bigger2     (null)
bigger3     (null)
bigger4     (null)
bigger6     (null)
bigger7     (null)
bigger9     (null)
bigger10    (null)
bigger11    (null)
bigger12    (null)
bigger13    (null)
chimera     (null)
drakkisath  (null)
fenrir      (null)
gojira      (null)
ix1         (null)
kong        (null)
rivendare   (null)
small1      (null)
small2      (null)
small19     (null)
small24     (null)
small25     (null)
small29     (null)
small30     (null)
small31     (null)
small32     (null)
small33     (null)
small34     (null)
small35     (null)
smaug       (null)

I guess that would list NodeFeatures, if I had any of those?

You have this:

"
sinfo -N --format="%N %G"
NODELIST  GRES
big20     (null)
bigger2   gpu:1(S:0)
chimera   gpu:2(S:0)
"

Had hoped for something like this:

NODELIST  GRES
big20     (null)
bigger2   (gpu1 RTX2060 6GB)
chimera   (gpu1 RTX2060 6GB) (gpu2 RTX3090 24GB)

That would be cool *wink* *wink*

Mvh.
Torkil

> Feel free to close the ticket.
OK. I'll check with the team if there are any plans/other requests to make some of those details available over Slurm cli.
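
Until such details are exposed by the Slurm CLI itself, a site that tags its nodes with features as suggested earlier in this thread could approximate the wished-for listing by printing GRES and features side by side. The following is only a sketch: the function name `gpu_nodes` is hypothetical, and it assumes the admins have actually defined features on the GPU nodes (otherwise the last column stays `(null)`).

```shell
#!/bin/sh
# gpu_nodes (hypothetical helper): list GPU nodes with their feature tags.
# Reads "NODE GRES FEATURES" lines on stdin, for example from:
#   sinfo -N -h --format="%N %G %f" | gpu_nodes
# sinfo -N prints a node once per partition it belongs to, so duplicate
# node names are dropped after their first occurrence.
gpu_nodes() {
    awk '$2 ~ /^gpu/ && !seen[$1]++ { print $1, $2, $3 }'
}
```

Nodes whose GRES column does not start with `gpu` are filtered out, which mirrors the `grep gpu` idea but also removes the repeated rows for nodes that sit in several partitions.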
(In reply to Marcin Stolarek from comment #6)
> > Feel free to close the ticket.
> OK. I'll check with the team if there are any plans/other requests to make
> some of those details available over Slurm cli.

Thanks. I could write it in my site documentation, sure, but it would be more user friendly and less work for the poor overworked sysadmins if the users could just go:

sinfo -N --format="%N %G" | grep gpu
bigger2 gpu:1(S:0) (RTX2060/6GB)
chimera gpu:2(S:0) (RTX2060/6GB, RTX3090/24GB)

And easily decide where to put jobs with special requirements, like GPU memory.

Mvh.
Torkil

In fact, there is a new development for Slurm 21.08 in Bug 9567 introducing the node_features/helpers plugin, which may be used to set features more automatically. The bug is public, so you can check the details there.

I'm closing this bug report now; if you have any questions, please reopen.

cheers,
Marcin
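
For the question of targeting GPUs above a memory threshold, the feature-based approach from this thread can be wrapped in a small helper that builds the constraint string. This is a sketch under assumptions: the `GPU_RAM_<N>GB` naming follows the example tags given above and is a site convention, not something Slurm defines, and `gpu_ram_constraint` is a hypothetical function name.

```shell
#!/bin/sh
# gpu_ram_constraint MIN_GB TAG...   (hypothetical helper)
# Prints a constraint such as [GPU_RAM_12GB|GPU_RAM_24GB] that covers every
# tag of the form GPU_RAM_<N>GB with N >= MIN_GB.
# Returns non-zero and prints nothing if no tag qualifies.
gpu_ram_constraint() {
    min=$1
    shift
    list=
    for tag in "$@"; do
        case $tag in
            GPU_RAM_*GB) gb=${tag#GPU_RAM_}; gb=${gb%GB} ;;
            *) continue ;;                 # not a GPU memory tag
        esac
        case $gb in
            ''|*[!0-9]*) continue ;;       # size part is not a plain number
        esac
        if [ "$gb" -ge "$min" ]; then
            list="${list:+$list|}$tag"     # join qualifying tags with "|"
        fi
    done
    [ -n "$list" ] || return 1
    printf '[%s]\n' "$list"
}
```

A user could then submit with something like `sbatch --gres=gpu:1 -C "$(gpu_ram_constraint 12 GPU_RAM_6GB GPU_RAM_12GB GPU_RAM_24GB)" job.sh`, where `job.sh` stands in for the actual batch script and the tag list would be whatever features the site has defined.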