Ticket 6249 - mpirun within srun interactive job requesting gres
Summary: mpirun within srun interactive job requesting gres
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other (show other tickets)
Version: 18.08.3
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Nate Rini
 
Reported: 2018-12-17 12:00 MST by rjepperson
Modified: 2018-12-18 14:19 MST (History)
0 users

See Also:
Site: KU
Linux Distro: RHEL


Description rjepperson 2018-12-17 12:00:25 MST
Greetings,

I believe the problem we are having is a duplicate of https://bugs.schedmd.com/show_bug.cgi?id=5816, but I am unsure how to solve it for our particular case.

A user is submitting an interactive job requesting 4 gpus on a single node with 2+ tasks:
srun --partition=shontz --ntasks=2 --nodes=1 --gres=gpu:4 --pty /bin/bash -l

They then want to run an MPI job across the 2 CPUs allocated to the job and the 4 GPUs assigned to them, but when they run "mpirun <application name>", it just hangs. I tried the Intel MPI Benchmarks to make sure it wasn't something in their code.

mpirun /panfs/pfs.local/software/install/intel/2017.4/impi/2017.3.196/bin64/IMB-MPI1

This behavior only happens when requesting an interactive job with the "--gres" option. I assume it is because the mpirun command is actually using srun under the hood, and that srun is also trying to request 4 GPUs on the same node; since those GPUs are already allocated to the bash step, the new step can never run. The same thing happens with a simple "srun hostname". "srun --gres=none hostname" works, but that doesn't solve the problem; it just points to the actual cause.
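One quick way to confirm this theory from inside the interactive shell is to list the job's steps and compare a GRES-free step against one that re-requests the GPUs (a sketch; assumes you are inside the srun --pty shell on the cluster):

```shell
# The pty bash runs as step 0 of the job and holds all of the job's
# GRES, so any new step that asks for GPUs will pend behind it.
squeue --steps --jobs "$SLURM_JOB_ID"

# A step that requests no GRES launches immediately...
srun --gres=none hostname

# ...while one that asks for the GPUs again blocks. --immediate makes
# srun fail fast instead of hanging, which makes the deadlock visible.
srun --immediate --gres=gpu:4 hostname
```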

How do I solve this, though? In other bugs, developers suggest using "--gres=gpu:0" for the interactive job and then requesting "--gres=gpu:4" for the actual "srun" step, but how does that work if the user is just using "mpirun"? Also, that would mean another batch job could theoretically use the 4 GPUs on the node where they have their interactive job while they are working on their code.

We set "DefMemPerCPU=2048" on the partition requested as well, so it is not a memory issue.

Thank you,

Riley
Comment 1 Nate Rini 2018-12-18 09:16:26 MST
Riley

(In reply to rjepperson from comment #0)
> A user is submitting an interactive job requesting 4 gpus on a single node
> with 2+ tasks:
> srun --partition=shontz --ntasks=2 --nodes=1 --gres=gpu:4 --pty /bin/bash -l
Are they also calling salloc before calling srun?

--Nate
Comment 3 rjepperson 2018-12-18 09:34:22 MST
No, they are not running salloc. I assumed salloc was just a sort of wrapper for the SallocDefaultCommand, which is srun.

Riley


Comment 4 Nate Rini 2018-12-18 09:52:35 MST
(In reply to rjepperson from comment #3)
> No, they are not running salloc. I assumed salloc was just a sort of wrapper
> for the SallocDefaultCommand, which is srun.
Salloc allows a user to allocate resources for an interactive job and then use job steps (srun) to run their programs.

> How do I solve this though? I read other bugs, where developers suggest
> using a "--gres:gpu=0" for the interactive job and then request
> "--gres:gpu=4" for the actual "srun" step, but how does that work if the
> user is just using "mpirun"? 

Here is how to stop a step from using gres: (from man srun)
>By default, a job step is allocated all of the generic resources that have been allocated to the job. To change the behavior so that each job step is allocated no generic resources, explicitly set the value of --gres to specify zero counts for each generic resource OR set "--gres=none" OR set the SLURM_STEP_GRES environment variable to "none".

You could try using salloc:
> salloc --partition=shontz --ntasks=2 --nodes=1 --gres=gpu:4 /bin/bash -l
> srun --gres=gpu:4 /path/to/gpu/job &
> env SLURM_STEP_GRES=none mpirun /path/to/non-gpu/job
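For the mpirun-only workflow described in the report, the same idea might look like the following end-to-end session (a sketch; the partition, counts, and benchmark path are taken from the report, and exporting SLURM_STEP_GRES instead of prefixing each command is an assumption about what is convenient for the user):

```shell
# Allocate the GPUs at the job level so no other job can take them.
salloc --partition=shontz --ntasks=2 --nodes=1 --gres=gpu:4 /bin/bash -l

# Inside the allocation: keep the implicit srun steps that mpirun
# launches from re-requesting the job's GRES, so they start instead of
# pending behind the interactive step. The job-level allocation still
# reserves the 4 GPUs on the node for this job.
export SLURM_STEP_GRES=none
mpirun /panfs/pfs.local/software/install/intel/2017.4/impi/2017.3.196/bin64/IMB-MPI1
```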

--Nate
Comment 5 rjepperson 2018-12-18 14:18:01 MST
Great. Thank you. That will work. We'll just have to work on conveying the differences between srun and salloc to the users.

You may close this ticket.

Riley
Comment 6 Nate Rini 2018-12-18 14:19:55 MST
Closing ticket per your response.

--Nate