Ticket 279 - Add ability to specify different GPU types
Summary: Add ability to specify different GPU types
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 2.5.x
Hardware: Linux Linux
Severity: 5 - Enhancement
Assignee: Moe Jette
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2013-05-02 10:48 MDT by Moe Jette
Modified: 2014-04-14 09:20 MDT
CC List: 1 user

See Also:
Site: -Other-
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 14.11.0-pre1
Target Release: ---
DevPrio: ---


Description Moe Jette 2013-05-02 10:48:37 MDT
Users should be able to specify the type of GPU their job needs. A separate GRES can be defined per GPU model, for example "sbatch --gres=gpu_gtx560:1 ./job.sh", but then Slurm will not set CUDA_VISIBLE_DEVICES to the correct GPU ID, because the gpu plugin is not used for a GRES with a different name.

If the user instead specifies "sbatch --gres=gpu:1 ...", CUDA_VISIBLE_DEVICES is set to the first free GPU, which may be a model the job does not support.

I believe the only reliable approach is to map each GPU device to a string that denotes its type. For example, in gres.conf:

Name=gpu Type=gtx560 File=/dev/nvidia0
Name=gpu Type=gtx560 File=/dev/nvidia1
Name=gpu Type=tesla File=/dev/nvidia2

The user would then request a particular GPU type as follows:

sbatch --gres=gpu:tesla:1 ./my.job

and the Slurm gpu plugin would set CUDA_VISIBLE_DEVICES=2 for this job, since the only tesla GPU is /dev/nvidia2.
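The proposed selection logic can be sketched in a few lines of Python. This is an illustration only, not Slurm's implementation (the real plugin is C code inside Slurm), and every function name below is hypothetical. The sketch assumes one device per gres.conf line (no File= ranges) and, following the example above, takes the GPU ID to be the trailing number of the File= path. An untyped request such as "gpu:1" matches any type, which reproduces the problem described earlier.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class GpuDevice:
    name: str                # GRES name, e.g. "gpu"
    gpu_type: Optional[str]  # Type= string, e.g. "tesla"; None if absent
    file: str                # device file, e.g. "/dev/nvidia2"
    index: int               # trailing number of the device file (assumed GPU ID)


def parse_gres_conf(text: str) -> List[GpuDevice]:
    """Parse one-device-per-line entries like those in the gres.conf example."""
    devices = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = dict(tok.split("=", 1) for tok in line.split())
        match = re.search(r"(\d+)$", fields["File"])
        if match is None:
            raise ValueError(f"cannot derive GPU index from {fields['File']!r}")
        devices.append(GpuDevice(fields["Name"], fields.get("Type"),
                                 fields["File"], int(match.group(1))))
    return devices


def parse_request(spec: str) -> Tuple[str, Optional[str], int]:
    """Split "gpu:tesla:1" into ("gpu", "tesla", 1) and "gpu:1" into ("gpu", None, 1)."""
    parts = spec.split(":")
    if len(parts) == 2:
        return parts[0], None, int(parts[1])
    if len(parts) == 3:
        return parts[0], parts[1], int(parts[2])
    raise ValueError(f"unrecognized GRES request {spec!r}")


def cuda_visible_devices(devices: List[GpuDevice], busy: Set[int], spec: str) -> str:
    """Pick free GPUs matching the request and format them for CUDA_VISIBLE_DEVICES."""
    name, gpu_type, count = parse_request(spec)
    candidates = [
        d for d in devices
        if d.name == name
        and d.index not in busy
        and (gpu_type is None or d.gpu_type == gpu_type)  # untyped: any type
    ]
    if len(candidates) < count:
        raise RuntimeError(f"only {len(candidates)} free device(s) match {spec!r}")
    return ",".join(str(d.index) for d in candidates[:count])


GRES_CONF = """\
Name=gpu Type=gtx560 File=/dev/nvidia0
Name=gpu Type=gtx560 File=/dev/nvidia1
Name=gpu Type=tesla File=/dev/nvidia2
"""

devices = parse_gres_conf(GRES_CONF)
print(cuda_visible_devices(devices, set(), "gpu:tesla:1"))  # 2
print(cuda_visible_devices(devices, set(), "gpu:1"))        # 0 (a gtx560; type ignored)
```

Filtering on type before taking the first free devices is what separates the typed request from the untyped one: with "gpu:tesla:1" only /dev/nvidia2 qualifies, while "gpu:1" simply takes the lowest-numbered free GPU.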
Comment 1 Moe Jette 2014-04-08 06:22:59 MDT
I'm going to get started on this with an eye toward adding some other GRES enhancements requested by customers.
Comment 2 Moe Jette 2014-04-14 09:20:41 MDT
The logic to provide this functionality is now in v14.11 with a multitude of commits.