| Summary: | mpirun within srun interactive job requesting gres | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | rjepperson |
| Component: | Other | Assignee: | Nate Rini <nate> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 18.08.3 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | KU | Slinky Site: | --- |
| Linux Distro: | RHEL | Machine Name: | |
|
Description
rjepperson
2018-12-17 12:00:25 MST
Riley

(In reply to rjepperson from comment #0)
> A user is submitting an interactive job requesting 4 gpus on a single node
> with 2+ tasks:
> srun --partition=shontz --ntasks=2 --nodes=1 --gres=gpu:4 --pty /bin/bash -l

Are they also calling salloc before calling srun?

--Nate

No, they are not running salloc. I assumed salloc was just a sort of wrapper for the SallocDefaultCommand, which is srun.

Riley

(In reply to rjepperson from comment #3)
> No, they are not running salloc. I assumed salloc was just a sort of wrapper
> for the SallocDefaultCommand which is srun.

salloc allows a user to allocate resources for an interactive job and then use steps (srun) to run their programs.

> How do I solve this though? I read other bugs, where developers suggest
> using a "--gres:gpu=0" for the interactive job and then request
> "--gres:gpu=4" for the actual "srun" step, but how does that work if the
> user is just using "mpirun"?

Here is how to stop a step from using gres (from man srun):
> By default, a job step is allocated all of the generic resources that have been allocated to the job. To change the behavior so that each job step is allocated no generic resources, explicitly set the value of --gres to specify zero counts for each generic resource OR set "--gres=none" OR set the SLURM_STEP_GRES environment variable to "none".

You could try using salloc:
> salloc --partition=shontz --ntasks=2 --nodes=1 --gres=gpu:4 /bin/bash -l
> srun --gres=gpu:4 /path/to/gpu/job &
> env SLURM_STEP_GRES=none mpirun /path/to/non-gpu/job

--Nate

Great. Thank you. That will work. We'll just have to work on conveying the differences between srun and salloc to the users. You may close this ticket.

Riley

Closing ticket per your response.

--Nate
|
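
For sites documenting this pattern for users, the salloc workflow suggested in the thread might be sketched as a small dry-run helper. This is an illustrative sketch only, not something from the ticket: the function name `step_cmd` and the print-only behavior are hypothetical, while the partition name, GPU count, and placeholder paths are taken from the comments above. It prints the command lines rather than executing them, so it can be read and tried without a Slurm cluster.

```shell
#!/bin/sh
# Sketch: inside an salloc allocation that holds 4 GPUs, launch one GPU step
# and one non-GPU MPI run. Commands are only printed here, never executed.

# Hypothetical helper: print the command line for one kind of step.
#   $1 = "gpu"  -> srun step that explicitly requests the job's GPUs
#   $1 = "none" -> mpirun with SLURM_STEP_GRES=none, so the step holds no gres
#   $2 = path of the program to run
step_cmd() {
    case "$1" in
        gpu)  printf 'srun --gres=gpu:4 %s\n' "$2" ;;
        none) printf 'env SLURM_STEP_GRES=none mpirun %s\n' "$2" ;;
        *)    printf 'unknown step type: %s\n' "$1" >&2; return 1 ;;
    esac
}

# Allocation command from the thread; a user would run this first.
alloc_cmd='salloc --partition=shontz --ntasks=2 --nodes=1 --gres=gpu:4 /bin/bash -l'

printf '%s\n' "$alloc_cmd"
step_cmd gpu  /path/to/gpu/job
step_cmd none /path/to/non-gpu/job
```

The point of the split is that the allocation (salloc) holds the GPUs for the whole job, while each step decides whether it claims them: the srun step asks for them explicitly, and the mpirun launch is told via `SLURM_STEP_GRES=none` to claim none.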