Hello,

We are running Slurm 21.08.8 on RHEL 9 with Bright Computing. We have 8 A6000 GPUs on each of 13 nodes and are trying to enable MIG. See the error below, followed by our slurm.conf and gres.conf:

```
[root@m001 ~]# nvidia-smi -i 0 -mig 1
Unable to enable MIG Mode for GPU 00000000:01:00.0: Not Supported
```

slurm.conf (excerpt):

```
NodeName=m[001-013] Procs=192 CoresPerSocket=48 RealMemory=953674 Sockets=2 ThreadsPerCore=2 Gres=gpu:A6000:8 Feature=location=local
# Partitions
PartitionName=defq Default=YES MinNodes=1 DefaultTime=UNLIMITED MaxTime=UNLIMITED AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 OverSubscrib>
# Scheduler
SchedulerType=sched/backfill
# Generic resources types
GresTypes=gpu
# Epilog/Prolog section
Prolog=/cm/local/apps/cmd/scripts/prolog
Epilog=/cm/local/apps/cmd/scripts/epilog
# Power saving section (disabled)
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
# GPU related plugins
AccountingStorageTRES=gres/gpu
```

```
[root@m001 ~]# nvidia-smi
Mon Feb 27 14:45:51 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    On   | 00000000:01:00.0 Off |                  Off |
| 30%   27C    P8    10W / 300W |      0MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A6000    On   | 00000000:25:00.0 Off |                  Off |
| 30%   26C    P8     8W / 300W |      0MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A6000    On   | 00000000:41:00.0 Off |                  Off |
| 30%   26C    P8     6W / 300W |      0MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A6000    On   | 00000000:61:00.0 Off |                  Off |
| 30%   26C    P8     7W / 300W |      0MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA RTX A6000    On   | 00000000:81:00.0 Off |                  Off |
| 30%   25C    P8    15W / 300W |      0MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA RTX A6000    On   | 00000000:A1:00.0 Off |                  Off |
| 30%   25C    P8     8W / 300W |      0MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA RTX A6000    On   | 00000000:C1:00.0 Off |                  Off |
| 30%   26C    P8    10W / 300W |      0MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA RTX A6000    On   | 00000000:E1:00.0 Off |                  Off |
| 30%   26C    P8     8W / 300W |      0MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
```

gres.conf:

```
AutoDetect=nvml
NodeName=m[001-013] Name=gpu Type=A6000 Count=1 File=/dev/nvidia0
NodeName=m[001-013] Name=gpu Type=A6000 Count=1 File=/dev/nvidia1
NodeName=m[001-013] Name=gpu Type=A6000 Count=1 File=/dev/nvidia2
NodeName=m[001-013] Name=gpu Type=A6000 Count=1 File=/dev/nvidia3
NodeName=m[001-013] Name=gpu Type=A6000 Count=1 File=/dev/nvidia4
NodeName=m[001-013] Name=gpu Type=A6000 Count=1 File=/dev/nvidia5
NodeName=m[001-013] Name=gpu Type=A6000 Count=1 File=/dev/nvidia6
NodeName=m[001-013] Name=gpu Type=A6000 Count=1 File=/dev/nvidia7
```
The A6000 does not support MIG. See NVIDIA's list of GPUs that support MIG here: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#supported-gpus

Despite this, Slurm does still provide a generic mechanism for sharing GPUs among multiple jobs called "sharding". You can read more about how it works and how to set it up here: https://slurm.schedmd.com/gres.html#Sharding. Please read through that thoroughly if you want to test it on your own, and let me know if you still have any questions about sharding after reading the documentation and trying it out for yourself.
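For reference, a minimal sharding setup on these nodes might look like the sketch below, following the gres.html#Sharding documentation. The shard count of 32 (4 shards per GPU) is an arbitrary example value; with AutoDetect=nvml the shards are distributed evenly across the GPUs on each node.

```
# slurm.conf -- add "shard" to the GRES types and advertise shards on the nodes
GresTypes=gpu,shard
NodeName=m[001-013] Procs=192 CoresPerSocket=48 RealMemory=953674 Sockets=2 ThreadsPerCore=2 Gres=gpu:A6000:8,shard:32

# gres.conf -- with AutoDetect=nvml, a single Count line is enough;
# 32 shards across 8 GPUs = 4 shards per GPU
AutoDetect=nvml
Name=shard Count=32
```

A job would then request a fraction of a GPU with, e.g., `sbatch --gres=shard:1 job.sh`. Note that, as the documentation describes, sharding only arbitrates scheduling; it does not isolate or limit what each job actually uses on the GPU.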
Thanks Ben. I've read through the (very short) description of sharding, but can you describe the difference between MPS and sharding? For example, if a node has 4 GPUs and 8 users request a GPU at the same time, how would the workload be distributed? How well does it handle GPU memory sharing, as well as issues like TensorFlow being a bit "greedy" and trying to use all available GPU memory?
In the case of a more heterogeneous workload such as you describe, MPS is actually preferred over sharding. Sharding does nothing to fence off each process's resources, while MPS allocates only a percentage of a GPU's resources to each job, effectively controlling what is available to each one. This is important when one process may attempt to use the entire GPU, which would starve the other jobs of resources, or vice versa. I would suggest reading https://slurm.schedmd.com/gres.html#MPS_Management, as well as the several external links included in that documentation, to better understand everything.
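To make the contrast concrete, here is a sketch of an MPS configuration following gres.html#MPS_Management. In Slurm, gres/mps is counted in percentage units of a GPU, so a Count of 800 on an 8-GPU node gives 100 units per GPU; the numbers here are illustrative, not a recommendation:

```
# slurm.conf -- advertise MPS alongside the GPUs
GresTypes=gpu,mps
NodeName=m[001-013] Procs=192 CoresPerSocket=48 RealMemory=953674 Sockets=2 ThreadsPerCore=2 Gres=gpu:A6000:8,mps:800

# gres.conf -- 800 MPS units spread over the 8 GPUs (100 per GPU)
AutoDetect=nvml
Name=mps Count=800
```

A job requesting half of one GPU's compute resources would then use something like `srun --gres=mps:50 ./my_app`, and Slurm sets CUDA_MPS_ACTIVE_THREAD_PERCENTAGE for the job accordingly. Unlike a shard request, that percentage actually constrains how much of the GPU's SMs the job's kernels can occupy.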
Hi Robert, Did my last reply help clear things up? Do you have any other questions?
For MPS, with Tensorflow I noticed in my previous place once TF grabbed a GPU it would gradually start using more GPU memory, as in it's "greedy". Does MPS have a way to deal with this?
Support for MPS within Slurm ensures that jobs are scheduled with CUDA_MPS_ACTIVE_THREAD_PERCENTAGE set to whatever you requested, as well as ensuring that jobs are only scheduled when MPS resources are available. As for the actual MPS behavior, you'll need to review NVIDIA's documentation and/or contact their support for more help. There is some interesting information in this section of NVIDIA's MPS documentation: https://docs.nvidia.com/deploy/mps/index.html#topic_3_3_5_2. It does imply some interesting things about how idle resources are used (specifically when discussing the different provisioning strategies), but again, you'll need to contact NVIDIA's support for more information on the exact behavior of MPS. As a more general note: if you're seeing issues with MPS/sharding, specifically with individual jobs frequently getting bottlenecked by GPU resources (jobs being "greedy"), it might be best for users to instead scale their applications to use entire GPUs rather than just a fraction of them with MPS.
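On the TensorFlow side specifically, one mitigation I'm aware of (outside of Slurm itself) is to stop TF from pre-allocating all GPU memory up front: TensorFlow honors the TF_FORCE_GPU_ALLOW_GROWTH environment variable, which makes it grow its allocation incrementally as needed. A hypothetical job script combining that with an MPS request might look like this (the `--gres=mps:25` value and `train.py` are placeholders, and this assumes gres/mps has been configured as in the documentation above):

```
#!/bin/bash
#SBATCH --gres=mps:25          # request 25% of one GPU's MPS resources
#SBATCH --time=04:00:00

# Keep TensorFlow from grabbing the whole GPU's memory at startup;
# it will grow its allocation only as the model actually needs it.
export TF_FORCE_GPU_ALLOW_GROWTH=true

python train.py
```

This only limits TF's initial allocation behavior; it does not hard-cap memory, so jobs that genuinely need the whole GPU's memory will still contend with their neighbors.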
Any further questions on this topic?
If users were to adjust their jobs to use an entire GPU, doesn't that defeat the purpose of MPS? We haven't configured it, but it's good to know in case we ever do. Other than that, this can be closed.
(In reply to Robert Kudyba from comment #8)
> If users were to adjust their jobs to use an entire GPU, doesn't that defeat
> the purpose of MPS? We haven't configured it, but it's good to know in case
> we ever do. Other than that, this can be closed.

Yes, exactly. In our experience, some sites have configured MPS/shards, only to have users adjust their jobs to use entire GPUs and stop using those features. If users' jobs are using entire GPUs, or can be scaled to do so, that is often preferable to using MPS/sharding. Hopefully this makes more sense now :)

Closing this now.