Ticket 7679

Summary: "--no-shell" with salloc gives segmentation fault
Product: Slurm Reporter: Ahmed Essam ElMazaty <ahmed.mazaty>
Component: User Commands Assignee: Dominik Bartkiewicz <bart>
Status: RESOLVED FIXED QA Contact:
Severity: 3 - Medium Impact    
Priority: --- CC: alex, bart, brian, par.lindfors, paran
Version: 19.05.2   
Hardware: Linux   
OS: Linux   
Site: KAUST
Version Fixed: 19.05.4

Description Ahmed Essam ElMazaty 2019-09-03 05:49:14 MDT
Hello,
We upgraded Slurm to 19.05.2. I noticed that the "--no-shell" option with salloc now gives a segmentation fault immediately. That wasn't the behavior before the upgrade.

$ salloc -n 8 -t 1:00:00 --no-shell
Segmentation fault (core dumped)

I can allocate resources without this option

$ salloc -n 8 -t 1:00:00 
salloc: Pending job allocation 6409418
salloc: job 6409418 queued and waiting for resources
salloc: job 6409418 has been allocated resources
salloc: Granted job allocation 6409418
salloc: Waiting for resource configuration
salloc: Nodes cn603-05-l are ready for job

Any explanation?

Best regards,
Ahmed
Comment 1 Dominik Bartkiewicz 2019-09-03 05:54:17 MDT
Hi

I can reproduce this.
I'll let you know when the fix is in the repo.

Dominik
Comment 4 Pär Lindfors 2019-10-04 04:47:05 MDT
We have just upgraded to 19.05.2 and our users have encountered this bug. We would also like to see it fixed. (Site SNIC-UPPMAX)

A work-around to avoid the segmentation fault is to specify a job name.
"salloc --no-shell --job-name=test ..." works as it should.

Pär
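
The workaround above could be scripted so that users do not have to remember it. The following is only a minimal sketch, not part of Slurm: `salloc_args` is a hypothetical helper name, and the fallback name "no-shell" is an arbitrary choice (it happens to match the default that a later comment in this ticket says was added for 20.02). It assumes a POSIX shell and only prints the adjusted argument list, one argument per line, without calling salloc.

```shell
# Hypothetical helper (not shipped with Slurm): print the salloc arguments,
# adding a job name when --no-shell is requested without one.
salloc_args() {
    no_shell=0
    has_name=0
    for arg in "$@"; do
        case "$arg" in
            --no-shell) no_shell=1 ;;
            # -J and --job-name are the salloc job name options
            -J*|--job-name|--job-name=*) has_name=1 ;;
        esac
    done
    if [ "$no_shell" -eq 1 ] && [ "$has_name" -eq 0 ]; then
        # "no-shell" is an assumed placeholder name; any name avoids the crash
        set -- --job-name=no-shell "$@"
    fi
    printf '%s\n' "$@"
}
```

To actually submit the allocation on an affected 19.05.2 installation, the final `printf` line could be replaced with `command salloc "$@"`. Arguments that already carry a job name are passed through unchanged.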
Comment 5 Pär Lindfors 2019-10-04 05:11:17 MDT
I had some issues with the Bugzilla login, so my previous comment might have been posted from the wrong account. Sorry for any confusion.
Comment 8 Pär Lindfors 2019-10-07 04:09:46 MDT
In our user documentation we have documented the use of "salloc --no-shell" for one specific use case, so this is likely to affect more of our users than I initially suspected.

I have therefore raised the importance of the bug to Medium. Feel free to reassign the bug to our support contract if necessary.
Comment 13 Dominik Bartkiewicz 2019-10-08 03:32:52 MDT
Hi

Sorry for taking so long on this.
This issue has been fixed and will be available in 19.05.4:
https://github.com/SchedMD/slurm/commit/d31f97858ec

Dominik
Comment 14 Ahmed Essam ElMazaty 2019-10-08 23:44:37 MDT
(In reply to Pär Lindfors from comment #4)

> 
> A work-around to avoid the segmentation fault is to specify a job name.
> "salloc --no-shell --job-name=test ..." works as it should.
> 
> Pär

This worked!
Thanks Pär
Comment 17 Dominik Bartkiewicz 2019-10-09 08:51:42 MDT
Hi

I'll go ahead and close out the bug then. Please reopen if you have any
other issues related to this.

An additional commit (20.02 only) adds a default job name of "no-shell" for salloc --no-shell:
https://github.com/SchedMD/slurm/commit/0aca6a7f038

Dominik
Comment 18 Pär Lindfors 2019-10-28 10:05:54 MDT
The fix works fine (cherry-picked on top of 19.05.3).
Thanks!