Ticket 5816 - cannot run srun whilst allocated GPU resources via gres
Summary: cannot run srun whilst allocated GPU resources via gres
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: - Unsupported Older Versions
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Marshall Garey
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2018-10-05 02:37 MDT by BBP Administrator
Modified: 2018-10-11 09:11 MDT

See Also:
Site: EPFL


Attachments
gres.conf (51 bytes, text/plain)
2018-10-08 01:34 MDT, BBP Administrator
Details
slurm.conf (7.90 KB, text/plain)
2018-10-08 01:35 MDT, BBP Administrator
Details

Description BBP Administrator 2018-10-05 02:37:18 MDT
Hello
We are a new customer to Slurm; our site is not yet included in the dropdown list. We are the Blue Brain Project (BBP) at the École polytechnique fédérale de Lausanne (EPFL).

We have a user who wishes to allocate a GPU via salloc and then, from within the allocation, run further code via srun that utilises the GPU.

The following results in srun hanging:

[morrice@bbpv1 ~]$ salloc --account=proj16 -p interactive -C volta --gres=gpu:1
[morrice@r2i3n0 ~]$ echo $CUDA_VISIBLE_DEVICES
0
[morrice@r2i3n0 ~]$ srun hostname
srun: Job step creation temporarily disabled, retrying

If I run the following (from the same allocation) it works:

[morrice@r2i3n0 ~]$ srun --gres=none hostname
r2i3n0

For info, the default salloc command is:

[morrice@r2i3n0 ~]$ grep Sallo /etc/slurm/slurm.conf
SallocDefaultCommand="/usr/bin/srun -n1 -N1 --propagate=ALL --pty --preserve-env --mem-per-cpu=0 --mpi=pmi2 $SHELL -l"

We are running slurm-17.02.9-1.el7.x86_64

Am I misunderstanding something here - is it possible to access GPU resources via srun from within an salloc?
Comment 1 Marshall Garey 2018-10-05 14:25:45 MDT
I believe that should work. I need to look into this more. Can you also upload your slurm.conf and gres.conf so I can use them to try to reproduce what you're seeing?
Comment 2 BBP Administrator 2018-10-08 01:34:41 MDT
Created attachment 7969 [details]
gres.conf
Comment 3 BBP Administrator 2018-10-08 01:35:16 MDT
Created attachment 7970 [details]
slurm.conf
Comment 4 BBP Administrator 2018-10-08 01:36:56 MDT
Thanks Marshall - I've added our slurm.conf and gres.conf as attachments to this ticket.
Comment 10 Marshall Garey 2018-10-09 16:53:41 MDT
I can reproduce what you're seeing, and I believe I have the solution. It's basically a duplicate of bug 5543.

Add --gres=none to your salloc default command. Then salloc jobs that want GRES can override it. It works as expected for me.

So, with your revised SallocDefaultCommand:

SallocDefaultCommand="/usr/bin/srun -n1 -N1 --propagate=ALL --pty --preserve-env --mem-per-cpu=0 --mpi=pmi2 --gres=none $SHELL -l"

marshall@voyager:~/slurm/18.08/voyager$ salloc --gres=gpu:1
salloc: Granted job allocation 15075
salloc: Waiting for resource configuration
salloc: Nodes v1 are ready for job

marshall@voyager:~/slurm/18.08/voyager$ env|grep -i cuda
marshall@voyager:~/slurm/18.08/voyager$ srun env|grep -i cuda
CUDA_VISIBLE_DEVICES=0
marshall@voyager:~/slurm/18.08/voyager$ srun --gres=gpu:none env|grep -i cuda
srun: error: Unable to create step for job 15075: Invalid Trackable RESource (TRES) specification
marshall@voyager:~/slurm/18.08/voyager$ srun --gres=gpu:0 env|grep -i cuda
marshall@voyager:~/slurm/18.08/voyager$ srun --gres=none env|grep -i cuda


Can you verify that this fixes it for you?

Assuming this works for you, we might want to consider modifying our documentation to recommend adding --gres=none to SallocDefaultCommand when GRES are used.
Comment 11 BBP Administrator 2018-10-11 06:42:17 MDT
Hello Marshall,

Thank you for the information. Your suggestion has helped me come to a solution.

In our case, our users would like to have CUDA_VISIBLE_DEVICES available via salloc AND srun.

My solution is to:
- add --gres=gpu:0 to the default salloc command
- add logic to slurm.prolog to populate a file /tmp/${SLURM_JOB_USER}_${SLURM_JOB_ID}_CUDA with $SLURM_JOB_GPUS (if the variable is not null)
- add logic to slurm.epilog to remove the above file
- add logic to slurm.taskprolog to set CUDA_VISIBLE_DEVICES with the contents of /tmp/${USER}_${SLURM_JOB_ID}_CUDA
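The hook logic described above can be sketched as follows. This is a minimal sketch, not the site's actual scripts: the hook script names (slurm.prolog, slurm.epilog, slurm.taskprolog) come from the comment, and they would be wired up via the Prolog, Epilog, and TaskProlog parameters in slurm.conf. SLURM_JOB_GPUS is the GPU index list Slurm exports to the job prolog.

```shell
#!/bin/sh
# --- slurm.prolog (runs as root on each node at job start) ---
# Save the job's GPU list where the task prolog can later read it.
if [ -n "$SLURM_JOB_GPUS" ]; then
    echo "$SLURM_JOB_GPUS" > "/tmp/${SLURM_JOB_USER}_${SLURM_JOB_ID}_CUDA"
fi

# --- slurm.taskprolog (runs before each task; lines printed to
# stdout of the form "export VAR=value" are injected into the
# task's environment) ---
f="/tmp/${USER}_${SLURM_JOB_ID}_CUDA"
if [ -r "$f" ]; then
    echo "export CUDA_VISIBLE_DEVICES=$(cat "$f")"
fi

# --- slurm.epilog (runs at job end) ---
# Clean up the per-job file created by the prolog.
rm -f "/tmp/${SLURM_JOB_USER}_${SLURM_JOB_ID}_CUDA"
```

In production these would be three separate scripts, not one file; they are shown together here only to illustrate the data flow from prolog to task prolog to epilog.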

The end result is something similar to the following:

[morrice@bbpv1 ~]$ salloc  --account proj14 -p interactive -C volta --gres=gpu:1
salloc: Granted job allocation 182476
salloc: Waiting for resource configuration
salloc: Nodes r2i3n0 are ready for job

[morrice@r2i3n0 ~]$ printenv |grep CUDA_VISIBLE_DEVICES
CUDA_VISIBLE_DEVICES=1
[morrice@r2i3n0 ~]$ srun printenv |grep CUDA_VISIBLE_DEVICES
CUDA_VISIBLE_DEVICES=1
[morrice@r2i3n0 ~]$

Thank you for your assistance - you may close this ticket.
Comment 12 Marshall Garey 2018-10-11 09:11:01 MDT
Great.

You may also want to check if you're constraining devices in cgroup.conf, and if that's something you want to do or not.
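For reference, device constraint is controlled by the ConstrainDevices parameter in cgroup.conf; a minimal fragment might look like the following (whether to enable it depends on whether you want jobs restricted to only their allocated GPUs):

```
# cgroup.conf (sketch): with ConstrainDevices=yes, tasks can only
# access the devices (e.g. GPUs) allocated to their job or step.
CgroupAutomount=yes
ConstrainDevices=yes
```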

Closing as resolved/infogiven.