Ticket 12683

Summary: bug with requesting gpu memory
Product: Slurm    Reporter: Xing Huang <x.huang>
Component: GPU    Assignee: Marshall Garey <marshall>
Status: RESOLVED INFOGIVEN    QA Contact:
Severity: 3 - Medium Impact
Priority: ---
Version: 21.08.2
Hardware: Linux
OS: Linux
Site: WA St. Louis

Description Xing Huang 2021-10-18 10:48:06 MDT
To whom it may concern,

When requesting GPU memory with --mem-per-gpu, we found that this parameter does not assign the right GPU to the requested job.

[xinghuang@login01 ~]$ srun -N 1 --gres=gpu:1 --mem-per-gpu=18000M --time=00:30:00 --pty bash
[xinghuang@gpu08 ~]$

The above example is a failed case: I requested 18 GB of GPU memory, but the job was assigned to gpu08, a node whose GPUs have a maximum of 16 GB of GPU memory.
How do we get this fixed?
Thanks for your time and help!

Best,
Xing
Comment 1 Marshall Garey 2021-10-18 10:54:19 MDT
This is a duplicate of bug 9229.

We fixed --mem-per-gpu in 21.08. However, in Slurm versions before 21.08, --mem-per-gpu is just broken and you should not be using it. Unfortunately, the fixes involved quite a few commits and a few different issues, so they can't be easily backported to 20.02. My recommendation is to just not use --mem-per-gpu until you upgrade to 21.08.


By the way, 20.02 is not supported anymore, so I recommend you make a plan to upgrade to 20.11 or 21.08.

Is there anything else I can help with?
Comment 2 Xing Huang 2021-10-18 11:08:21 MDT
Marshall,

Thanks for your reply!
Is there a guidance for proper update?
Do I need to drain all compute nodes before upgrade?

Best,
Xing
Comment 3 Marshall Garey 2021-10-18 11:20:03 MDT
Here is our online guide to updating Slurm:

https://slurm.schedmd.com/quickstart_admin.html#upgrade

As you will find there, you do not need to drain the nodes when upgrading Slurm; you can do a "live" upgrade (with jobs running).


We also often have notes about upgrading in our "Field Notes" slides from our SLUG conferences. Those can be found on our "publications" page:

https://slurm.schedmd.com/publications.html

And here's a link to our latest "Field Notes" presentation (Upgrading starts on slide 21):

https://slurm.schedmd.com/SLUG21/Field_Notes_5.pdf

There are multiple ways to do an upgrade - some sites like to do "live" upgrades to make it appear like the cluster never goes down, while other sites like to schedule a maintenance period. Some sites prefer building Slurm from source while others prefer to use RPMs. Some sites will upgrade all daemons and Slurm user commands at once while others will incrementally upgrade each one separately.

If you have any specific questions about upgrading, I suggest opening a new ticket. If you have never upgraded Slurm before, then I suggest opening a ticket with us with your proposed upgrade plan.
Comment 4 Xing Huang 2021-10-18 11:24:51 MDT
Marshall,

Thanks again for your reply!
You can close the ticket now.

Best,
Xing
Comment 5 Marshall Garey 2021-10-18 11:25:50 MDT
You're welcome! I'm closing this as a duplicate of bug 9229.

*** This ticket has been marked as a duplicate of ticket 9229 ***
Comment 6 Xing Huang 2021-10-27 09:50:34 MDT
Hi Marshall,

I have upgraded Slurm from 20.02 to 21.08. However, we still have problems using --mem-per-gpu.

[xinghuang@login01 ~]$ srun -N 1 --gres=gpu:1 --mem-per-gpu=38000M --time=00:30:00 --pty bash

[xinghuang@gpu04 ~]$ nvidia-smi
Wed Oct 27 09:33:27 2021      
...
|   0  Tesla V100S-PCI...  Off  | 00000000:3D:00.0 Off |                    0 |
| N/A   29C    P0    24W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |

This should have placed me on an A100 with 40 GB of VRAM; instead it placed me on a V100S with 32 GB of VRAM.

What is the proper way to use --mem-per-gpu? Do I need any special configuration?

Best,
Xing
Comment 7 Xing Huang 2021-10-27 11:41:35 MDT
Hi Marshall,

This is a reopening of ticket 12683. I have upgraded Slurm from 20.02 to 21.08. However, we still have problems using --mem-per-gpu.

[xinghuang@login01 ~]$ srun -N 1 --gres=gpu:1 --mem-per-gpu=38000M --time=00:30:00 --pty bash

[xinghuang@gpu04 ~]$ nvidia-smi
Wed Oct 27 09:33:27 2021      
...
|   0  Tesla V100S-PCI...  Off  | 00000000:3D:00.0 Off |                    0 |
| N/A   29C    P0    24W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |

This should have placed me on an A100 with 40 GB of VRAM; instead it placed me on a V100S with 32 GB of VRAM.

What is the proper way to use --mem-per-gpu? Do I need any special configuration?

Best,
Xing
Comment 8 Marshall Garey 2021-10-27 14:21:45 MDT
--mem-per-gpu requests *node* memory, not gpu memory.

For example:

sbatch --mem-per-gpu=1000 --gpus=4 job.sh

This job will be allocated 4 GPUs and 4000 MB of memory.
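The memory math is simply per-GPU memory times GPU count; as a quick sanity check:

```shell
# --mem-per-gpu=1000 with --gpus=4 yields 4 * 1000 = 4000 MB of node memory
echo $((4 * 1000))   # 4000
```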



Slurm does not know about GPU memory. If you want a specific type of GPU, then you can define GPU types in gres.conf, then you can request that type.

On a node with 2 a100 GPUs, in gres.conf, you might have this:

Name=gpu Type=a100 File=/path/to/gpu/file[0-1]

Then, request two a100 GPUs:

sbatch --gres=gpu:a100:2 job.sh

This requests two a100 GPUs.
Comment 9 Xing Huang 2021-10-27 15:03:08 MDT
So is there a way to request GPU VRAM via Slurm? What we are trying to achieve is to ask Slurm to give the job a GPU node with 38 GB of VRAM, and hopefully Slurm would be smart enough to figure out that a node with an A100 should be assigned to the job. Is this not possible in Slurm?

Best,
Xing
Comment 10 Marshall Garey 2021-10-27 15:49:46 MDT
(In reply to Xing Huang from comment #9)
> So is there a way to request gpu vram memory via slurm? What we are trying
> to achieve is to ask slurm give the job a gpu node with vram of 38g and
> hopefully slurm would smartly enough to figure out gpu node with a100 will
> be assigned to the job. Is this not possible in slurm?

It's not possible to do it that way because Slurm doesn't know about GPU memory.

However, there are workarounds that can get you close. Here are a few options:

(1) Request the GPU type.

sbatch --gres=gpu:a100:<number_of_gpus> job.sh

I showed this in my last comment - if a user wants a specific type of GPU, they should really just request that type.


(2) Use node features (https://slurm.schedmd.com/slurm.conf.html#OPT_Features and https://slurm.schedmd.com/sbatch.html#OPT_constraint).

Maybe you have different types of GPUs that have the required GPU memory that could satisfy a job request, and the user doesn't want to specify a specific type of GPU. You could define node features describing what types of GPUs are available on different sets of nodes.


NodeName=n[1-5] Features=gpu_32gb
NodeName=n[6-10] Features=gpu_64gb
NodeName=n[11-15] Features=gpu_40gb

Then you could request a node with at least 40 GB of GPU memory:

sbatch --constraint=gpu_40gb|gpu_64gb --gpus=2 job.sh


You may want to define your node features differently, but I hope this example gives you an idea of how you can use node features.
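Feature names are free-form strings, and a node may carry several comma-separated features, so the same layout could also tag the GPU type alongside the memory tier (all names here are illustrative):

```
NodeName=n[1-5]   Features=gpu_32gb,v100s
NodeName=n[6-10]  Features=gpu_64gb
NodeName=n[11-15] Features=gpu_40gb,a100
```

A job can then combine constraints with & (AND) as well as | (OR), e.g. sbatch --constraint="a100&gpu_40gb" --gpus=1 job.sh.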


(3) Make gpu memory another gres.

You can define anything as a GRES in gres.conf. You could define "gpumemory" as a gres, and then a job could request --gres=gpumemory:40G.

Example:

# gres.conf on nodes n[1-5]
Name=gpu Type=a100 File=/path/to/file
Name=gpumemory Count=42949672960

# gres.conf on nodes n[6-10]
Name=gpu Type=v100s File=/path/to/file
Name=gpumemory Count=34359738368

# slurm.conf
GresTypes=gpu,gpumemory
NodeName=n[1-5] Gres=gpu:a100:1,gpumemory:42949672960
NodeName=n[6-10] Gres=gpu:v100s:1,gpumemory:34359738368
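For reference, those Count values are just 40 GiB and 32 GiB expressed in bytes:

```shell
# 40 GiB and 32 GiB in bytes, matching the gpumemory Count values above
echo $((40 * 1024 * 1024 * 1024))   # 42949672960
echo $((32 * 1024 * 1024 * 1024))   # 34359738368
```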




$ srun --gres=gpumemory:40G --gpus=1 printenv SLURMD_NODENAME
n1

$ srun --gres=gpumemory:32G --gpus=1 printenv SLURMD_NODENAME
n1

Notice that I was allocated node n1 both times. That's because --gres=gpumemory:32G will give me a node with *at least* that much available GRES. But if nodes n[1-5] are all busy, then I still get allocated an available node:

$ srun --gres=gpumemory:32G --gpus=1 printenv SLURMD_NODENAME
n6

WARNING: Slurm does NOT know about GPU memory. So even if you define a "gpumemory" gres, Slurm still doesn't know anything about GPU memory. Other caveats:

- A user does not have to request the "gpumemory" gres to be allocated a GPU. If you want to enforce that, you'll need to do it with a job_submit.lua script.
- You will need to define enough "gpumemory" to cover all the GPUs on the node.
- If a single node has different types of GPUs, then you are not guaranteed to be given the GPU that you want by requesting "gpumemory" - you will have to request the specific GPU type (such as a100 or v100s).

For these reasons, I recommend trying options (1) and (2) first. Users may not understand all the "gotchas" of option (3). But, it's up to you.
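The job_submit.lua enforcement mentioned in the warning could look roughly like the sketch below. This is an untested outline: the field holding the job's GRES request (job_desc.gres here) and the exact return codes should be verified against the job_submit/lua plugin documentation for your Slurm version.

```lua
-- Sketch: reject jobs that request a GPU without also requesting the
-- site-defined "gpumemory" gres. Field names are assumptions; verify
-- against the job_submit/lua plugin docs for your Slurm version.
function slurm_job_submit(job_desc, part_list, submit_uid)
    local gres = job_desc.gres or ""
    if gres:find("gpu") and not gres:find("gpumemory") then
        slurm.log_user("GPU jobs must also request --gres=gpumemory:<size>")
        return slurm.ERROR
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```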
Comment 11 Xing Huang 2021-10-28 10:25:13 MDT
Thanks for your comment!
We have quite a complicated case.
Would the definitions I made in gres.conf and slurm.conf below work?
vmem is the parameter we define for GPU memory, in MB.
Looking forward to your help!

####### Define Gres in gres.conf #######
NodeName=gpu01 Name=gpu Count=4 File=/dev/nvidia[0-3] Type=tesla_a100
NodeName=gpu02 Name=gpu Count=4 File=/dev/nvidia[0-3] Type=tesla_v100S
NodeName=gpu03 Name=gpu Count=2 File=/dev/nvidia[0-1] Type=tesla_v100S
NodeName=gpu[04-05] Name=gpu Count=2 File=/dev/nvidia[0-1] Type=tesla_v100S
NodeName=gpu06 Name=gpu Count=4 File=/dev/nvidia[0-3] Type=tesla_v100
NodeName=gpu07 Name=gpu Count=3 File=/dev/nvidia[0-2] Type=tesla_v100
NodeName=gpu08 Name=gpu Count=2 File=/dev/nvidia[0-1] Type=tesla_t4
NodeName=gpu01 Name=vmem Count=40536
NodeName=gpu02 Name=vmem Count=32510
NodeName=gpu03 Name=vmem Count=32510
NodeName=gpu[04-05] Name=vmem Count=32510
NodeName=gpu06 Name=vmem Count=32510
NodeName=gpu07 Name=vmem Count=32510
NodeName=gpu08 Name=vmem Count=15109

####### Define Gres in slurm.conf #######
GresTypes=gpu,vmem
NodeName=gpu01 CoresPerSocket=16 RealMemory=385000 Sockets=2 Weight=1000 State=UNKNOWN Gres=gpu:tesla_a100:4,vmem:40536
NodeName=gpu02 CoresPerSocket=16 RealMemory=770000 Sockets=2 Weight=900 State=UNKNOWN Gres=gpu:tesla_v100S:4,vmem:32510
NodeName=gpu03 CoresPerSocket=16 RealMemory=770000 Sockets=2 Weight=900 State=UNKNOWN Gres=gpu:tesla_v100S:2,vmem:32510
NodeName=gpu[04-05] CoresPerSocket=16 RealMemory=385000 Sockets=2 Weight=800 State=UNKNOWN Gres=gpu:tesla_v100S:2,vmem:32510
NodeName=gpu06 CoresPerSocket=12 RealMemory=385000 Sockets=2 Weight=700 State=UNKNOWN Gres=gpu:tesla_v100:4,vmem:32510
NodeName=gpu07 CoresPerSocket=12 RealMemory=385000 Sockets=2 Weight=700 State=UNKNOWN Gres=gpu:tesla_v100:3,vmem:32510
NodeName=gpu08 CoresPerSocket=12 RealMemory=385000 Sockets=2 Weight=600 State=UNKNOWN Gres=gpu:tesla_t4:2,vmem:15109

Best,
Xing
Comment 12 Marshall Garey 2021-10-28 15:36:47 MDT
Is the value you set for vmem the total GPU memory for *all* GPUs on the node? Or just the amount of GPU memory for one GPU?

If you want users to request the total amount of GPU memory that they want, then you should define it as the total amount of GPU memory on the node, not the amount of GPU memory for a single GPU.
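Under that totals-per-node interpretation, a node with four GPUs of 40536 MiB each (like gpu01 in the configuration above) would get the per-GPU figure multiplied by the GPU count:

```shell
# Per-node vmem total = per-GPU memory (MiB) x number of GPUs,
# e.g. four GPUs with 40536 MiB each:
echo $((4 * 40536))   # 162144
```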

I just want to repeat my warning in my last comment:

WARNING: Slurm does NOT know about GPU memory. So even if you define a "gpumemory" gres, Slurm still doesn't know anything about GPU memory. Other caveats:

- A user does not have to request the "gpumemory" gres to be allocated a GPU. If you want to enforce that, you'll need to do it with a job_submit.lua script.
- You will need to define enough "gpumemory" to cover all the GPUs on the node.
- If a single node has different types of GPUs, then you are not guaranteed to be given the GPU that you want by requesting "gpumemory" - you will have to request the specific GPU type (such as a100 or v100s).

Another warning: Outside of MPS (https://slurm.schedmd.com/gres.html#MPS_Management) and MIG (https://slurm.schedmd.com/gres.html#MIG_Management), GPUs can NOT be shared. And even with MPS or MIG, it doesn't really make sense to have a user request a subset of the GPU's memory. 

What is your use case for wanting to request the exact amount of GPU memory?

Is there a reason that you don't want to try my first and second recommendations in my last comment? If a user wants a specific type of GPU then they should request that type. If they are okay with some types of GPUs but not others, then node features works really well.
Comment 13 Xing Huang 2021-10-28 15:39:36 MDT
Marshall,

The reason is that we are already using a lua script to handle jobs. Method 3 is much simpler for us than method 2. I just did a test run, and my implementation of your 3rd suggestion worked.
Thanks for your advice and you can close the ticket now.

Best,
Xing
Comment 14 Marshall Garey 2021-10-28 15:53:59 MDT
Okay, just be really careful with this. I really want to stress my warnings about this method.

Another potential problem that I thought of:

What if a user requests 16 GB of vram with the new GRES, but then they get allocated a GPU with 32 GB of vram? Then if another user requests 32 GB of vram then they won't be able to be allocated to the other GPU on that node because the vram GRES isn't available on that node.

You can also enforce node features with the job_submit/lua plugin, so I don't see why already using a job_submit/lua plugin locks you out of that method.

In short, we definitely recommend using node features first, but it's up to you.

I'm afraid that users will think that they can request a subset of GPU memory when they actually can't. In addition, if you open a ticket on this later another support engineer might be confused why you aren't using node features instead.

For now I'll close this as infogiven.
Comment 15 Xing Huang 2021-10-28 16:00:47 MDT
Marshall,

This is a very good warning. We will definitely watch on this.
Currently, we are using priorities (node weights) to deal with the situation you are mentioning: a GPU with more memory is less likely to be allocated to a user than a GPU with less memory. This is overridden if a user specifically requests a GPU with more memory.

Best,
Xing
Comment 16 Marshall Garey 2021-10-28 16:05:10 MDT
Sounds good. Closing this now.