| Summary: | Unable to enable MIG Mode for GPU Not Supported, A6000 | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Robert Kudyba <rk3199> |
| Component: | GPU | Assignee: | Ben Glines <ben.glines> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | | |
| Priority: | --- | | |
| Version: | 21.08.8 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Columbia University | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Robert Kudyba
2023-02-27 12:50:03 MST
Ben Glines:
The A6000 does not support MIG. See NVIDIA's list of GPUs that support MIG here: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#supported-gpus

Despite this, Slurm still provides a generic mechanism for sharing GPUs among multiple jobs called "sharding". You can read more about how it works and how to set it up here: https://slurm.schedmd.com/gres.html#Sharding. Please read through that thoroughly if you want to test it on your own. Let me know if you still have any questions about sharding after you have read the documentation and, if you decide to try it, tested it out for yourself.

Robert Kudyba:
Thanks Ben. I've read through the very short description of sharding, but can you describe the difference between MPS and sharding? For example, if a node has 4 GPUs and 8 users request a GPU at the same time, how would the workload be distributed? How well does it handle GPU memory sharing, given issues with TensorFlow being a bit "greedy" and trying to use all available GPU memory?

Ben Glines:
In the case of a more heterogeneous workload such as the one you mention, MPS is actually preferred over sharding. Sharding does nothing to fence off processes' resources, while MPS allocates only a percentage of a GPU's resources to each job, effectively controlling the resources available to each job. This is important when one process may attempt to use the entire GPU, which would starve the other jobs of resources, or vice versa. I would suggest reading https://slurm.schedmd.com/gres.html#MPS_Management, as well as the several external links included in that documentation, to better understand everything.

Ben Glines:
Hi Robert, did my last reply help clear things up? Do you have any other questions?

Robert Kudyba:
Regarding MPS: at my previous site I noticed that once TensorFlow grabbed a GPU it would gradually start using more GPU memory, as in it's "greedy". Does MPS have a way to deal with this?
Ben Glines:
Support for MPS within Slurm ensures that jobs are scheduled with CUDA_MPS_ACTIVE_THREAD_PERCENTAGE set to whatever you requested, and that jobs are only scheduled when MPS resources are available. As for the actual behavior of MPS itself, you'll need to review NVIDIA's documentation and/or contact their support for more help. There is some interesting information in this section of NVIDIA's MPS documentation: https://docs.nvidia.com/deploy/mps/index.html#topic_3_3_5_2. It does imply some interesting things about how idle resources are used (specifically when discussing the different provisioning strategies), but again, you'll need to contact NVIDIA's support for more information on the exact behavior of MPS.

As a more general note: if you're seeing issues with MPS/sharding, specifically individual jobs frequently getting bottlenecked on GPU resources (jobs being "greedy"), it might be best for users to instead scale their applications to use entire GPUs rather than just a fraction of them with MPS.

Any further questions on this topic?

Robert Kudyba:
If users were to adjust their jobs to use an entire GPU, doesn't that defeat the purpose of MPS? We haven't configured it, but it's good to know in case we ever do. Other than that, this can be closed.

Ben Glines:
(In reply to Robert Kudyba from comment #8)
> If users were to adjust their jobs to use an entire GPU, doesn't that defeat
> the purpose of MPS? We haven't configured it, but it's good to know in case
> we ever do. Other than that, this can be closed.

Yes, exactly. In our experience, we have seen some sites configure MPS/shards, only to have users adjust their jobs to use entire GPUs and not use those features anymore. If users' jobs are using entire GPUs, or can be scaled to do such, that is often preferable over using MPS/sharding. Hopefully this makes more sense now :)

Closing this now.
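[Editor's note] For readers following the sharding pointer above, a minimal configuration sketch may help. This is based on the gres.html#Sharding documentation referenced in the thread; the node name, GPU count, and shard count below are illustrative assumptions, not values from this ticket:

```
# slurm.conf (sketch; "node01" and the counts are hypothetical)
GresTypes=gpu,shard
NodeName=node01 Gres=gpu:4,shard:16

# gres.conf on node01 -- 16 shards distributed evenly across the 4 GPUs
Name=gpu File=/dev/nvidia[0-3]
Name=shard Count=16
```

Jobs would then request a slice of a GPU with e.g. `srun --gres=shard:1 ...` rather than a whole device via `--gres=gpu:1`.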
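[Editor's note] The CUDA_MPS_ACTIVE_THREAD_PERCENTAGE behavior Ben describes can be sketched with a small model. This is illustrative only (not Slurm source code), and assumes a hypothetical gres.conf entry `Name=mps Count=400` on a 4-GPU node, i.e. 100 MPS "units" backing each GPU, so a request maps directly to a percentage of one GPU:

```python
# Illustrative model of how a job's mps gres request (e.g. --gres=mps:50)
# maps to CUDA_MPS_ACTIVE_THREAD_PERCENTAGE. Assumes the configured MPS
# count is spread evenly across the node's GPUs, per gres.html#MPS_Management.

def mps_thread_percentage(requested: int, node_mps_count: int, gpus_on_node: int) -> float:
    """Percentage of one GPU's threads granted to a job requesting
    `requested` MPS units, given the node-wide count and GPU count."""
    per_gpu = node_mps_count / gpus_on_node  # MPS units backing each GPU
    return 100.0 * requested / per_gpu

# With Count=400 on a 4-GPU node (100 units per GPU):
print(mps_thread_percentage(50, 400, 4))   # 50.0 -> half of one GPU's threads
print(mps_thread_percentage(100, 400, 4))  # 100.0 -> a whole GPU
```

Note this only bounds compute (active thread percentage); it does not partition GPU memory, which is why a "greedy" TensorFlow process can still exhaust memory under MPS.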