Greetings,

I believe the problem we are having is a duplicate of https://bugs.schedmd.com/show_bug.cgi?id=5816, but I am unsure how to solve it for our particular case. A user is submitting an interactive job requesting 4 GPUs on a single node with 2+ tasks:

srun --partition=shontz --ntasks=2 --nodes=1 --gres=gpu:4 --pty /bin/bash -l

They then want to run an MPI job across the 2 CPUs and 4 GPUs allocated to the job, but when they run "mpirun <application name>", it just hangs. I tried the Intel MPI benchmarks to rule out a problem in their code:

mpirun /panfs/pfs.local/software/install/intel/2017.4/impi/2017.3.196/bin64/IMB-MPI1

This behavior only happens when the interactive job is requested with the "--gres" option. I assume mpirun is actually calling srun, and that srun is also trying to request 4 GPUs on the same node; since those GPUs are already allocated to the bash step, the new step can never run. The same thing happens with a simple "srun hostname". "srun --gres=none hostname" works, but that doesn't solve the problem; it just points to the actual cause.

How do I solve this, though? I read other bugs where developers suggest requesting "--gres=gpu:0" for the interactive job and then "--gres=gpu:4" for the actual "srun" step, but how does that work if the user is just using "mpirun"? Also, that would mean another batch job could theoretically use the 4 GPUs on the node hosting their interactive job while they are working on their code. We also set "DefMemPerCPU=2048" on the requested partition, so it is not a memory issue.

Thank you,
Riley
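For reference, the symptom can be confirmed from inside the hung interactive shell; this is a sketch using standard Slurm commands (no output shown, since it depends on the site):

```shell
# Inside the interactive shell started with --pty /bin/bash:
# the bash step itself holds all 4 GPUs allocated to the job.
squeue -s -j "$SLURM_JOB_ID"   # list this job's steps; the bash step is running

# A new step implicitly requests the job's full gres, so it queues
# behind the bash step and appears to hang:
srun hostname

# Telling the step to take no gres lets it run immediately:
srun --gres=none hostname
```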
Riley (In reply to rjepperson from comment #0) > A user is submitting an interactive job requesting 4 gpus on a single node > with 2+ tasks: > srun --partition=shontz --ntasks=2 --nodes=1 --gres=gpu:4 --pty /bin/bash -l Are they also calling salloc before calling srun? --Nate
No, they are not running salloc. I assumed salloc was just a sort of wrapper for the SallocDefaultCommand, which is srun.

Riley
(In reply to rjepperson from comment #3) > No, they are not running salloc. I assumed salloc was just a sort of wrapper > for the SallocDefaultCommand, which is srun. salloc allows a user to allocate resources for an interactive job and then use steps (srun) to run their programs. > How do I solve this, though? I read other bugs where developers suggest > requesting "--gres=gpu:0" for the interactive job and then "--gres=gpu:4" > for the actual "srun" step, but how does that work if the user is just > using "mpirun"? Here is how to stop a step from using gres (from man srun): > By default, a job step is allocated all of the generic resources that have been allocated to the job. To change the behavior so that each job step is allocated no generic resources, explicitly set the value of --gres to specify zero counts for each generic resource, OR set "--gres=none", OR set the SLURM_STEP_GRES environment variable to "none". You could try using salloc: > salloc --partition=shontz --ntasks=2 --nodes=1 --gres=gpu:4 /bin/bash -l > srun --gres=gpu:4 /path/to/gpu/job & > env SLURM_STEP_GRES=none mpirun /path/to/non-gpu/job --Nate
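Putting those pieces together, a full session might look like this (the application paths are placeholders, as in the commands above):

```shell
# Allocate the resources once. The shell salloc starts is not itself a
# job step, so it does not pin the 4 GPUs to a step:
salloc --partition=shontz --ntasks=2 --nodes=1 --gres=gpu:4 /bin/bash -l

# Inside that shell, steps that need the GPUs request them explicitly:
srun --gres=gpu:4 /path/to/gpu/job &

# mpirun's internal srun must not re-request the job's gres, or it will
# queue behind the step above; SLURM_STEP_GRES=none prevents that:
env SLURM_STEP_GRES=none mpirun /path/to/non-gpu/job
```

Unlike requesting gpu:0 on the job itself, this keeps the 4 GPUs reserved for the whole job, so other batch jobs cannot grab them while the user is working.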
Great. Thank you. That will work. We'll just have to work on conveying the differences between srun and salloc to the users. You may close this ticket. Riley
Closing ticket per your response. --Nate