| Summary: | cgroup device list not properly enforced with MIG | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Kilian Cavalotti <kilian> |
| Component: | GPU | Assignee: | Ben Glines <ben.glines> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | bas.vandervlies, felip.moll |
| Version: | 22.05.8 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Stanford | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Kilian Cavalotti 2023-03-03 15:38:28 MST
Ben Glines (comment #2)

Hi Kilian,

Could you reply with an updated slurm.conf?

Kilian Cavalotti (comment #3)

Hi Ben,

(In reply to Ben Glines from comment #2)
> Could you reply with an updated slurm.conf?

Can I send you our slurm.conf directly? Attaching it here would require me to make that ticket private, and that would make it unavailable to other sites that may be interested.

Or I can attach here the specific parts you're interested in?

Thanks!
--
Kilian

Ben Glines (comment #5)

(In reply to Kilian Cavalotti from comment #3)
> Can I send you our slurm.conf directly? Attaching it here would require me
> to make that ticket private, and that would make it unavailable to other
> sites that may be interested.
>
> Or I can attach here the specific parts you're interested in?

I'm mostly just interested in your node definition for sh03-17n15, specifically your Gres specification.

Kilian Cavalotti

(In reply to Ben Glines from comment #5)
> I'm mostly just interested in your node definition for sh03-17n15,
> specifically your Gres specification.

That I can do, no problem. Here it is:

-- 8< ------------------------------------------
# SH3_G8FP64m | MLN | 32c | 256GB | 8x A30
NodeName=sh03-17n[14-15] \
    Sockets=2 CoresPerSocket=16 \
    RealMemory=256000 \
    Gres=gpu:8 \
    Weight=164431 \
    Feature="IB:HDR,CPU_MNF:AMD,CPU_GEN:MLN,CPU_SKU:7543P,CPU_FRQ:2.75GHz,GPU_GEN:AMP,GPU_BRD:TESLA,GPU_SKU:A30,GPU_MEM:24GB,GPU_CC:8.0,CLASS:SH3_G8FP64m"
-- 8< ------------------------------------------

and the partition definition:

-- 8< ------------------------------------------
PartitionName=test \
    DefMemPerCPU=8000 \
    DefCPUPerGpu=1 \
    AllowGroups=sh_sysadm \
    PriorityTier=10000 \
    PriorityJobFactor=10000 \
    nodes=sh02-01n[59-60],sh03-01n[71-72],sh03-17n[14-15]
-- 8< ------------------------------------------

Thanks!
--
Kilian

Ben Glines (comment #7)

Looks like your issue might be that you only specified 8 GPUs here.
Each MIG instance is really treated as its own GPU instance, as mentioned in our docs: https://slurm.schedmd.com/gres.html#MIG_Management

The example there also gives insight into what a node definition would look like with a GPU without any MIGs, and a GPU with 2 MIGs configured.

Try changing your Gres specification to the number of MIGs you have and let me know if that fixes things.

Kilian Cavalotti (comment #8)

(In reply to Ben Glines from comment #7)
> Looks like your issue might be that you only specified 8 GPUs here.
>
> Each MIG instance is really treated as its own GPU instance, as mentioned in
> our docs: https://slurm.schedmd.com/gres.html#MIG_Management
>
> Try changing your Gres specification to the number of MIGs you have and let
> me know if that fixes things.

Ooh, I totally missed that! Changing the node definition to have `Gres=gpu:32` indeed fixes the problem.

I somehow assumed that the AutoDetect=nvml part in gres.conf would have taken care of enumerating all the existing GPUs on the node, and forgot about the Gres option in the node definition.

Looking back at the documentation, it actually all makes sense. Except maybe this, which I'm not exactly sure about:

"""
The sanity-check AutoDetect mode is not supported for MIGs.
"""

Thanks!
--
Kilian

Ben Glines (comment #9)

(In reply to Kilian Cavalotti from comment #8)
> Looking back at the documentation, it actually all makes sense. Except maybe
> this: I'm just not exactly sure what it means?
> """
> The sanity-check AutoDetect mode is not supported for MIGs.
> """

From https://slurm.schedmd.com/gres.html#AutoDetect:

> By default, all system-detected devices are added to the node. However, if Type and
> File in gres.conf match a GPU on the system, any other properties explicitly specified
> (e.g. Cores or Links) can be double-checked against it.
> If the system-detected GPU differs from its matching GPU configuration, then the GPU
> is omitted from the node with an error. This allows gres.conf to serve as an optional
> sanity check and notifies administrators of any unexpected changes in GPU properties.

You can use gres.conf to make sure that what NVML detects is what you actually expect, e.g. if NVML does not detect a GPU that you configured in gres.conf, then slurmd for that node will fatal(). This sort of double-checking/sanity-checking does not work for MIGs.

Kilian Cavalotti

(In reply to Ben Glines from comment #9)
> You can use gres.conf to make sure that what NVML detects is what you
> actually expect, e.g. if NVML does not detect a GPU that you configured in
> gres.conf, then slurmd for that node will fatal(). This sort of
> double-checking/sanity-checking does not work for MIGs.

Got it, thanks for the explanation!

Cheers,
--
Kilian
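For other sites hitting the same symptom, here is a minimal sketch of the configuration change discussed in this ticket. It assumes 4 MIG instances carved out of each of the 8 A30s (8 x 4 = 32 gres); the actual MIG layout is site-specific and not stated in the ticket, so the count of 32 is taken from comment #8 and everything else is illustrative:

```
# slurm.conf -- Gres= must count MIG instances, not physical GPUs.
# With MIG enabled, each MIG instance is scheduled as its own GPU.
NodeName=sh03-17n[14-15] \
    Sockets=2 CoresPerSocket=16 \
    RealMemory=256000 \
    Gres=gpu:32 \
    Weight=164431

# gres.conf -- NVML enumerates the MIG devices themselves, but it does
# not relieve you of declaring the matching Gres count in slurm.conf.
AutoDetect=nvml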