Ticket 5764 - nodes drain when SLURM_MPI_TYPE is exported
Summary: nodes drain when SLURM_MPI_TYPE is exported
Status: RESOLVED WONTFIX
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: - Unsupported Older Versions
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Director of Support
Reported: 2018-09-21 08:37 MDT by krd103
Modified: 2018-09-24 01:56 MDT
Site: OCF
OCF Sites: Southampton University

Description krd103 2018-09-21 08:37:59 MDT
Whilst trying to debug an MPI issue, I tried to run a job with the arguments

sbatch --export=SLURM_MPI_TYPE=ibmmpi my_job.sh

Rather than the job failing due to an invalid MPI type, the node drained with the reason "batch job complete failure". This seems to be reproducible for any MPI type not listed by srun --mpi=list.
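
As a possible stopgap, the MPI type could be checked against the plugin list before the job is submitted. The following is only a minimal sketch: the helper name mpi_type_is_listed is hypothetical, and it assumes the list has one plugin name per line, optionally prefixed with "srun: " or whitespace. The exact output format of srun --mpi=list may differ between Slurm versions, so the parsing may need adjusting.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the MPI type given as $1 appears in the
# plugin list read from stdin, and non-zero otherwise.
# Assumption: one plugin name per line, optionally prefixed with "srun: "
# and/or surrounded by whitespace. Header lines simply fail to match.
mpi_type_is_listed() {
    wanted=$1
    # An empty type can never be valid.
    [ -n "$wanted" ] || return 1
    # Strip the optional prefix and surrounding whitespace, then require an
    # exact whole-line match so "mpi" does not match "openmpi".
    sed -e 's/^srun: //' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
        grep -Fxq -- "$wanted"
}

# Possible usage on a cluster (the list may be written to stderr, hence 2>&1):
#   if srun --mpi=list 2>&1 | mpi_type_is_listed "ibmmpi"; then
#       sbatch --export=SLURM_MPI_TYPE=ibmmpi my_job.sh
#   else
#       echo "refusing to submit: MPI type not listed by srun --mpi=list" >&2
#   fi
```

This only guards against submitting an unlisted type from a wrapper script; it does not change how the node reacts if such a job is submitted directly.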
Comment 2 Jason Booth 2018-09-21 14:03:26 MDT
Greetings, 

 I was not able to find you on our list of supported contacts. Unfortunately, these requests need to be filed through David Baker or Chris Hardacre at OCF.

-Jason
Comment 3 krd103 2018-09-24 01:56:18 MDT
Hi David,

Are you able to re-report the following bug?

Regards,

Keith

From: bugs@schedmd.com <bugs@schedmd.com>
Sent: 21 September 2018 21:03
To: Daly K.R. <krd103@soton.ac.uk>
Subject: [Bug 5764] nodes drain when SLURM_MPI_TYPE is exported

Jason Booth <jbooth@schedmd.com> changed bug 5764 <https://bugs.schedmd.com/show_bug.cgi?id=5764>

What        Removed      Added
Status      UNCONFIRMED  RESOLVED
Resolution  ---          WONTFIX
