| Summary: | CUDA_VISIBLE_DEVICES=NoDevFiles isn't set when submitting a job without --gres option | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Koji Tanaka <it-hpc> |
| Component: | Configuration | Assignee: | Felip Moll <felip.moll> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 3 - Medium Impact | | |
| Priority: | --- | CC: | albert.gil, bart, felip.moll |
| Version: | 18.08.4 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=6538 | ||
| Site: | OIST | | |
| CLE Version: | | Version Fixed: | 19.05 |
|
Description
Koji Tanaka
2019-01-28 01:36:13 MST
Felip Moll

Hi, your user/e-mail doesn't appear in our supported users list. For your site, OIST, I only have Tim Dyce. Can you please confirm that you are an authorized user? Aside from this, I am lowering this issue to sev. 3; please take a look at our severity policy: https://www.schedmd.com/support.php

Regarding your question: if cgroups is enabled and you've set ConstrainDevices=yes, the device files associated with the node's configured GRES that are not requested by a job will be restricted, so the GPUs that were not requested won't be accessible. Can you try this?

$ srun -p gpu -w saion-gpu07 --gres=gpu:1 nvidia-smi
$ srun -p gpu -w saion-gpu07 --gres=gpu:2 nvidia-smi
$ srun -p gpu -w saion-gpu07 nvidia-smi

If you are not using cgroups (we do not recommend this), then yes, it seems NoDevFiles should be set when no GRES is requested. As a temporary workaround you could set a prolog that changes CUDA_VISIBLE_DEVICES to NoDevFiles; then, if some GRES are requested, the stepd will change the environment according to the resources being allocated. In any case, we recommend using cgroups if you want any real enforcement mechanism for device access. Relying only on environment variables is not safe.

I will look at that and come back to you when I find/fix the issue.

Koji Tanaka

Hi Felip,

Thank you for answering my question.

Regarding the supported user list, I'll talk to Tim (Tim Dyce) and try to find a way to update the list.

Right now cgroups are not enabled in Slurm, so the nvidia-smi command shows all the GPUs with or without the --gres option. I discussed this with my team, and we decided to work on configuring cgroups, as you recommended.

Bests,
Koji

Felip Moll

(In reply to Koji Tanaka from comment #5)
> Regarding the supported user list, I'll talk to Tim (Tim Dyce) and try to
> find a way to update the list.

Great, I guess Tim D. just needs to talk with Jacob.
> Right now cgroups are not enabled in Slurm, so the nvidia-smi command
> shows all the GPUs with or without the --gres option. I discussed this
> with my team, and we decided to work on configuring cgroups, as you
> recommended.

That's good. Just tell me if you need any help, but basically you must set at least the following parameters:

slurm.conf:
TaskPlugin=task/cgroup

cgroup.conf:
ConstrainDevices=yes

(Note I am showing the minimum; depending on your config you may want to add/keep other task plugins as well, like task/affinity, or add more to your cgroup.conf.)

I will continue working to fix this issue.

Koji Tanaka

Hi Felip,

Thank you for providing the necessary parameters. cgroups work fine now. I also added these two items in cgroup.conf:

ConstrainCores=yes
ConstrainRAMSpace=yes

I also keep task/affinity in slurm.conf, as shown below:

TaskPlugin=task/cgroup,task/affinity

If my understanding is correct, when the task/cgroup plugin doesn't work on a compute node, Slurm would fall back to task/affinity. Am I right?

I'm still learning the cgroup setup, and I'll probably add the following later:

ProctrackType=proctrack/cgroup
JobAcctGatherType=jobacct_gather/cgroup

And also, regarding the supported user list, we got a response from Jess@SchedMD saying that the list is updated now. Thanks a lot.

Bests,
Koji

Felip Moll

(In reply to Koji Tanaka from comment #7)
> If my understanding is correct, when the task/cgroup plugin doesn't work
> on a compute node, Slurm would fall back to task/affinity. Am I right?

No, setting two plugins here doesn't work as a fallback mechanism.
Instead, we stack both plugins so that task/cgroup constrains the cores and RAM of a job, while task/affinity controls the binding of processes to CPUs. We recommend this because task/affinity is better at controlling affinity than task/cgroup. One could instead use task/cgroup's affinity control alone, by disabling task/affinity and setting TaskAffinity=yes in cgroup.conf. So, cgroup must work on all the nodes.

> I'm still learning the cgroup setup, and I'll probably add the following later:
>
> ProctrackType=proctrack/cgroup
> JobAcctGatherType=jobacct_gather/cgroup

proctrack/cgroup is highly recommended. But with JobAcctGatherType I must say it probably won't provide you with notable benefits. The main change from jobacct_gather/linux to jobacct_gather/cgroup is that memory is computed from the cgroup instead of from the /proc/<pid>/statm file.

> And also, regarding the supported user list, we got a response from
> Jess@SchedMD saying that the list is updated now.

That's great.

Koji Tanaka

Hi Felip,

I added TaskAffinity=yes in cgroup.conf.

Thanks a lot,
Koji

Felip Moll

(In reply to Koji Tanaka from comment #9)
> I added TaskAffinity=yes in cgroup.conf.

Hi, I think my explanation was not clear enough. I recommend the first option below.

Recommended configuration:
-------------------------------
slurm.conf:
TaskPlugin=task/cgroup,task/affinity
ProctrackType=proctrack/cgroup
JobAcctGatherType=jobacct_gather/linux

cgroup.conf:
ConstrainCores=yes
ConstrainRAMSpace=yes

Other valid configuration, but not recommended:
----------------------------------------------
This makes the cgroup plugin take affinity control:

slurm.conf:
TaskPlugin=task/cgroup
ProctrackType=proctrack/cgroup
JobAcctGatherType=jobacct_gather/linux

cgroup.conf:
ConstrainCores=yes
ConstrainRAMSpace=yes
TaskAffinity=yes

A non-valid configuration:
----------------------------------------------
Who takes control of affinity here, task/affinity or task/cgroup? Error.
slurm.conf:
TaskPlugin=task/cgroup,task/affinity
ProctrackType=proctrack/cgroup
JobAcctGatherType=jobacct_gather/linux

cgroup.conf:
ConstrainCores=yes
ConstrainRAMSpace=yes
TaskAffinity=yes

Hi Koji, can you tell me if the suggested configs are working well for you? I just wanted to add a reference to a tip that I think is worth reading: https://devblogs.nvidia.com/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/ It states that CUDA_VISIBLE_DEVICES should be used only for testing and debugging; it is not an effective way to restrict devices for a job.

Koji Tanaka

Hi Felip,

Sorry for not providing any update. I decided to go with your recommended configuration, and it's been working well. And thanks also for sharing the link from Nvidia.

Thanks a lot!

Bests,
Koji

Felip Moll

It has been decided internally not to set the NoDevFiles value anymore. We've also fixed an issue in the environment setup calls. The commits are:

d618ade0e6f5a85fe720bef3069dda70e51062b7
e564677590441f0e25ee90101055c274f13eabf6
eec070ba5458738c8f35f381212a40fcb687fb68

Fixed in future release 19.05. Closing issue as fixed.
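For reference, the temporary prolog workaround Felip suggests early in the thread (for non-cgroup setups) could be sketched roughly as below. This is a hypothetical TaskProlog script, not code from the ticket: it relies on the Slurm convention that lines a TaskProlog prints to stdout of the form `export NAME=value` are applied to the task's environment, and it assumes `SLURM_JOB_GPUS` as the indicator of an allocation with GPU GRES — adjust for your site.

```shell
#!/bin/bash
# Hypothetical TaskProlog sketch (illustration only).
# slurmstepd applies stdout lines of the form "export NAME=value"
# to the task environment.

hide_gpus_if_none_allocated() {
    # Assumption: SLURM_JOB_GPUS is set only when GPU GRES were
    # allocated to the job; if absent, hide all devices from CUDA apps.
    if [ -z "${SLURM_JOB_GPUS:-}" ]; then
        echo "export CUDA_VISIBLE_DEVICES=NoDevFiles"
    fi
}

hide_gpus_if_none_allocated
```

Felip's caveat still applies: this only influences well-behaved CUDA programs; real enforcement requires ConstrainDevices=yes under cgroups.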
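As a footnote to the Nvidia tip linked above: the reason CUDA_VISIBLE_DEVICES alone is not an enforcement mechanism is that it is ordinary environment state, advisory to the CUDA runtime, and any process in the job can simply change or unset it. A minimal demonstration:

```shell
# The "restriction" is just an environment variable; a job step can
# undo it before launching its CUDA code. This is why the cgroup-based
# ConstrainDevices=yes is the real fix.
export CUDA_VISIBLE_DEVICES=NoDevFiles
unset CUDA_VISIBLE_DEVICES
echo "${CUDA_VISIBLE_DEVICES-unset}"   # prints: unset
```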