Ticket 724

Summary: scontrol update reason in prolog gets whacked by slurm
Product: Slurm
Reporter: Stuart Midgley <stuartm>
Component: slurmd
Assignee: Moe Jette <jette>
Status: RESOLVED FIXED
QA Contact:
Severity: 5 - Enhancement
Priority: ---
CC: da
Version: 14.03.0   
Hardware: Linux   
OS: Linux   
Site: DownUnder GeoSolutions
Version Fixed: 14.03.1

Description Stuart Midgley 2014-04-16 14:21:23 MDT
Our prolog looks like this:

#!/bin/bash

# we check the monitoring system to see if we should run

if ! cat /monitoring/var/state/temperature /monitoring/var/state/system | awk -F$'\t' '$3>1 {exit 100}' ; then
    export PATH=/d/sw/slurm/latest/sbin:/d/sw/slurm/latest/bin:$PATH
    scontrol update nodename=$HOSTNAME state=DRAIN reason="Monitoring error: $(cat /monitoring/var/state/temperature /monitoring/var/state/system | awk -F$'\t' '$3>1 {print; exit}')"
    exit 100
fi




So it basically checks with our monitoring system that the node is OK.  If not, it marks the node as draining and sets the reason.
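
For reference, the same check could be written so that the state files are scanned only once and the matching line is reused as the reason text. This is only a sketch, not what we run: the function names first_fault and check_node are made up for illustration, and it assumes the state files are tab-separated with a value greater than 1 in field 3 meaning a fault, as in the awk expression above.

```shell
#!/bin/bash
# Sketch only: first_fault and check_node are hypothetical helper names.
# Assumption: state files are tab-separated and field 3 > 1 means a fault.

# Print the first faulty line (if any) found in the given state files.
first_fault() {
    awk -F'\t' '$3 > 1 { print; exit }' "$@"
}

# Drain this node with the faulty line as the reason; return 100 on a fault.
check_node() {
    local fault
    fault=$(first_fault "$@")
    if [ -n "$fault" ]; then
        scontrol update nodename="$HOSTNAME" state=DRAIN \
            reason="Monitoring error: $fault"
        return 100
    fi
    return 0
}
```

In the prolog this would be called as `check_node /monitoring/var/state/temperature /monitoring/var/state/system || exit $?`, after putting the Slurm bin directories on PATH as before.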

When the prolog finishes, Slurm then sets the reason as well, overwriting ours. Is there any way to make our reason take precedence?