| Summary: | How to pack jobs to use only a subset of the GPUs in a node? | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Alexis <Alexis.Espinosa> |
| Component: | Scheduling | Assignee: | Director of Support <support> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | kevin.buckley |
| Version: | 20.02.5 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Pawsey | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | CentOS |
| Machine Name: | topaz | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
Alexis
2021-01-19 00:09:22 MST
--gpu-bind=single is new to 20.11, so that's why it's not working for you in 20.02. Also, --gpu-bind=single has a simplified placement algorithm that doesn't work as well with multiple sockets, so it still might not fit your needs, depending on your CPU topology and CPU-GPU affinity layout.

1) Yes. As an alternative to --gpu-bind=single, you could use the map_gpu or mask_gpu options for --gpu-bind; those should let you precisely match which GPU(s) you want to be accessible to which task.

2) I'm not quite sure what you are trying to do with the wrapper file. What is wrong with how your batch script is set up? I would just add `--gpu-bind=...` to each of those sruns.

Note that --gpu-bind simply sets CUDA_VISIBLE_DEVICES (you could even skip the --gpu-bind argument and set CUDA_VISIBLE_DEVICES yourself for each task to get similar results). So that is the environment variable to check in the task when verifying that --gpu-bind is working as expected.

Thanks a lot Michael. Your suggestion:
```
#SBATCH --gpu-bind=map_gpu:0,1,2,3
.
.
.
srun ./wrapper.sh
```
is working great!

Now, I have a further question for future use:

1) What if I want to set two GPUs per task? Can I still use `map_gpu` or any other Slurm option? I tried this but it failed:
```
#SBATCH --gpu-bind=map_gpu:0-1,2-3
.
.
.
srun ./wrapper.sh
```

2) Or should I then go back to the native use of `CUDA_VISIBLE_DEVICES=0,1` settings?

Thank you very much,
Alexis

Glad it's working great!

(In reply to Alexis from comment #2)
> Now, I have a further question for future use:
> 1) What if I want to set two GPUs per task? Can I still use `map_gpu` or any
> other Slurm option?
>
> I tried this but it failed:
> ```
> #SBATCH --gpu-bind=map_gpu:0-1,2-3
> .
> .
> .
> srun ./wrapper.sh
> ```
>
> 2) Or should I then go back to the native use of
> `CUDA_VISIBLE_DEVICES=0,1` settings?

So what you are looking for is mask_gpu. map_gpu only supports one GPU per task, as a convenience.
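Since --gpu-bind works by setting CUDA_VISIBLE_DEVICES per task, the wrapper can simply print that variable to confirm the binding. A minimal sketch of such a wrapper; the script name, output format, and the placeholder application are illustrative only:

```shell
#!/bin/bash
# Hypothetical wrapper.sh: report which GPUs Slurm exposed to this task.
# --gpu-bind works by setting CUDA_VISIBLE_DEVICES, so echoing it per
# task is enough to verify the binding is doing what you expect.
echo "task ${SLURM_PROCID:-?}: CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
# Then launch the real per-task work, e.g.: exec ./my_gpu_app "$@"
```

Launched as `srun ./wrapper.sh`, each task prints its own line, so a 4-task step with `--gpu-bind=map_gpu:0,1,2,3` should show one distinct GPU ID per task.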
If you use mask_gpu, however, you can get multiple GPUs per task, as long as you set the mask to cover multiple bits. So to get task 0 to use GPUs 0-1 and task 1 to use GPUs 2-3, do something like this:
```
#SBATCH --gpu-bind=mask_gpu:0x3,0xC
```
That creates the binary masks 0011 (0x3) and 1100 (0xC) (assuming 4 total GPUs).

Let me know if that works for you. Thanks!
-Michael

Thanks a lot Michael,

Yes, that is working perfectly. Just a suggestion here:

1) Could you add the option to use a binary mask? I think it is much clearer to use the binary masks 1100,0011 than the hexadecimal masks.

And I have a final question related to all this (final, I think):

2) When following the path of defining CUDA_VISIBLE_DEVICES, I need to use a variable that tells me how many GPUs are originally available (allocated) per node. I would like to use SLURM_GPUS_PER_NODE, but that variable is not set because I did not set it with:
```
#SBATCH --gpus-per-node=4
```
in the header. What I'm using to assign the GPUs per node to the job is:
```
#SBATCH --gres=gpu:4
```
But then, what is the variable that keeps that number? How can I query that number in the wrapper?

Thanks a lot,
Alexis

(In reply to Alexis from comment #4)
> 1) Could you add the option to use a binary mask? I think it is much clearer
> to use the binary masks 1100,0011 than the hexadecimal masks.

It could make sense to have mask_gpu accept a binary mask, especially since most nodes have 4 or fewer GPUs. But that would require a separate enhancement request ticket, and it likely wouldn't get looked at unless it was sponsored, since we already have a lot on our plates. We are also open to contributions, if your team wanted to develop this.
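In the meantime, the hex masks can be generated from per-task GPU ID lists in plain bash, so nobody has to hand-convert binary to hex. This is just a convenience sketch, not a Slurm feature; the function name is made up:

```shell
#!/bin/bash
# Sketch: build a --gpu-bind=mask_gpu value from per-task GPU ID lists.
# Each argument is a comma-separated list of GPU IDs for one task.
make_gpu_masks() {
  local masks="" ids id m
  for ids in "$@"; do
    m=0
    for id in ${ids//,/ }; do
      m=$(( m | (1 << id) ))          # set the bit for each GPU ID
    done
    masks="${masks:+$masks,}$(printf '0x%X' "$m")"
  done
  echo "$masks"
}

make_gpu_masks "0,1" "2,3"            # prints 0x3,0xC
```

The output can be dropped straight into the batch header, e.g. `#SBATCH --gpu-bind=mask_gpu:0x3,0xC`.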
> And I have a final question related to all this (final, I think):
>
> 2) When following the path of defining CUDA_VISIBLE_DEVICES, I need to
> use a variable that tells me how many GPUs are originally available
> (allocated) per node. I would like to use SLURM_GPUS_PER_NODE, but that
> variable is not set because I did not set it with:
> ```
> #SBATCH --gpus-per-node=4
> ```
> in the header.
> What I'm using to assign the GPUs per node to the job is:
> ```
> #SBATCH --gres=gpu:4
> ```
> But then, what is the variable that keeps that number? How can I query that
> number in the wrapper?

Why do you need to query it? You already know it's guaranteed to be 4 GPUs; just pass that number into your wrapper script as an argument or through an environment variable.

However, if you want to know which GPU IDs you are allocated, take a look at the SLURM_JOB_GPUS, SLURM_STEP_GPUS, CUDA_VISIBLE_DEVICES, and GPU_DEVICE_ORDINAL environment variables. The first two tell you which global GPU IDs are used for the job or the step, respectively, while the last two (which are the same) tell you the local GPU IDs within the current cgroup.

Thanks!
-Michael

Thank you very much Michael,

You have been very helpful! I think we are done with this.

(In reply to Alexis from comment #6)
> Thank you very much Michael,
>
> You have been very helpful!
>
> I think we are done with this.

Excellent! Glad I could help :)

I'll go ahead and close this out, then.
-Michael
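As a footnote to the environment-variable discussion above: since SLURM_JOB_GPUS and CUDA_VISIBLE_DEVICES are comma-separated ID lists, a wrapper can derive the GPU count by counting fields. A sketch, assuming the variables are set as Michael describes (they are empty here outside a job):

```shell
#!/bin/bash
# Sketch: derive GPU counts inside a job step by counting the
# comma-separated IDs in SLURM_JOB_GPUS (global IDs for the job)
# and CUDA_VISIBLE_DEVICES (local IDs visible to this task).
count_ids() {
  # Empty or unset list -> 0; otherwise the number of comma-separated fields.
  [ -z "$1" ] && echo 0 && return
  echo "$1" | awk -F, '{print NF}'
}

echo "GPUs allocated to job: $(count_ids "${SLURM_JOB_GPUS:-}")"
echo "GPUs visible to task:  $(count_ids "${CUDA_VISIBLE_DEVICES:-}")"
```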