Ticket 9528

Summary: mpiexec does not propagate exit code
Product: Slurm
Reporter: Simon Byrne <simonbyrne>
Component: User Commands
Assignee: Tim Wickberg <tim>
Status: RESOLVED FIXED
QA Contact:
Severity: C - Contributions
Priority: ---
CC: jess, rberger
Version: 21.08.x   
Hardware: Linux   
OS: Linux   
Site: -Other-
Version Fixed: 23.11.0rc1
Attachments: Patch to correctly propagate exit code

Description Simon Byrne 2020-08-06 11:20:05 MDT
$ mpiexec bash -c "exit 1" 

$ echo $? 
0
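A quick way to check for this symptom is to run a command with a known non-zero exit status through the launcher and compare what comes back. The helper below is a minimal sketch, not part of Slurm: `propagates_status` is a hypothetical name, and the expected status of 7 is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): succeeds only if the given
# launcher passes through the exit status of the command it runs.
# Usage: propagates_status LAUNCHER [LAUNCHER_ARGS...]
propagates_status() {
    ps_expected=7          # arbitrary non-zero status to look for
    ps_actual=0

    # Run "sh -c 'exit 7'" through the launcher and record what it returns.
    "$@" sh -c "exit $ps_expected" || ps_actual=$?

    [ "$ps_actual" -eq "$ps_expected" ]
}
```

On a system showing the behaviour reported here, `propagates_status mpiexec` would be expected to fail, because the launcher returns 0 instead of 7.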

I'm not that familiar with Perl, but it looks like the return value of the system() call here:
https://github.com/SchedMD/slurm/blob/b1656169f48a73c1229560d8d19a902e1956aac7/contribs/torque/mpiexec.pl#L165
should be captured (see https://perldoc.perl.org/functions/system.html), so something like:

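# system() returns the wait status; the child's exit code is in the high byte.
my $exit_code = system($command);

# Remove the temporary config file, if one was created, before exiting.
system("rm -f $new_config") if($new_config);

# Shift out the low byte (signal information) to get the exit code.
exit($exit_code >> 8);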
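The same capture, clean up, then exit ordering can be sketched in shell for illustration. This is an analogy only: the actual wrapper is the Perl script linked above, and `run_with_cleanup` and its argument layout are hypothetical. In shell the status in `$?` is already decoded, so no `>> 8` shift is needed, but it still has to be saved before the cleanup command overwrites it.

```shell
#!/bin/sh
# Hypothetical wrapper (not Slurm code): run a command, remove an optional
# temporary config file, and return the command's exit status rather than
# the status of the cleanup step.
# Usage: run_with_cleanup TMP_CONFIG_OR_EMPTY COMMAND [ARGS...]
run_with_cleanup() {
    rwc_tmp_config=$1
    shift

    # Save the command's status immediately; later commands overwrite $?.
    rwc_status=0
    "$@" || rwc_status=$?

    # Clean up only if a temporary file was actually named.
    if [ -n "$rwc_tmp_config" ]; then
        rm -f "$rwc_tmp_config"
    fi

    return "$rwc_status"
}
```

With this sketch, `run_with_cleanup "" bash -c "exit 1"; echo $?` prints 1 rather than 0.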
Comment 3 Simon Byrne 2020-08-07 12:14:32 MDT
Created attachment 15353
Patch to correctly propagate exit code
Comment 4 Richard Berger 2023-06-26 17:41:17 MDT
Are there any plans to fix this issue? We've just replicated that workaround locally, but it would be nice if it got into a future release.
Comment 5 Tim Wickberg 2023-06-27 10:51:43 MDT
Simon -

Thanks for the submission. It's finally upstream, and will be included in the 23.11 release later this fall. Commit details follow for reference.

- Tim

commit 5ab128ca079102305be383a1ae23c7a87543150b
Author:     Simon Byrne <simonbyrne@gmail.com>
AuthorDate: Tue Jun 27 10:43:52 2023 -0600

    torque/mpiexec - Propagate exit code from launched process.
    
    Bug 9528.