| Summary: | scontrol update reason in prolog gets whacked by slurm | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Stuart Midgley <stuartm> |
| Component: | slurmd | Assignee: | Moe Jette <jette> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 5 - Enhancement | ||
| Priority: | --- | CC: | da |
| Version: | 14.03.0 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | DownUnder GeoSolutions | Slinky Site: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | 14.03.1 |
| Target Release: | --- | DevPrio: | --- |
Our prolog looks like this:

    #!/bin/bash

    # we check the monitoring system to see if we should run
    if ! cat /monitoring/var/state/temperature /monitoring/var/state/system | awk -F$'\t' '$3>1 {exit 100}' ; then
        export PATH=/d/sw/slurm/latest/sbin:/d/sw/slurm/latest/bin:$PATH
        scontrol update nodename=$HOSTNAME state=DRAIN reason="Monitoring error: $(cat /monitoring/var/state/temperature /monitoring/var/state/system | awk -F$'\t' '$3>1 {print; exit}')"
        exit 100
    fi

So it is basically checking with our monitoring system that the node is OK. If not, it marks the node as draining and sets the reason to the first failing monitoring record.

When the prolog finishes with a non-zero exit code, Slurm then updates the reason as well, overwriting ours. Is there any way to get our reason to take precedence?
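For reference, the check in the prolog could be restructured roughly as below so that the monitoring files are read only once and the same record is used both for the decision and for the reason text. This is only a sketch: the function names `first_alert` and `drain_if_alerting` are made up for illustration, and it assumes the monitoring files are tab-separated with a numeric status in the third field, as in the prolog above. It does not by itself change what Slurm does to the reason after the prolog exits.

```shell
#!/bin/bash
# Sketch only: first_alert and drain_if_alerting are hypothetical helper names.

# Print the first tab-separated record whose third field is greater than 1.
# Prints nothing if no record matches. Expects at least one file argument.
first_alert() {
    cat "$@" 2>/dev/null | awk -F'\t' '$3 > 1 { print; exit }'
}

# Drain this node when the given monitoring files report a problem.
# Returns 100 (the exit code used by the original prolog) on an alert,
# and 0 when every record is healthy.
drain_if_alerting() {
    local alert
    alert=$(first_alert "$@")
    if [ -z "$alert" ]; then
        return 0
    fi
    # Same scontrol invocation as the original prolog, with the record
    # captured once above instead of re-reading the files.
    scontrol update nodename="${HOSTNAME:-$(hostname)}" state=DRAIN \
        reason="Monitoring error: $alert"
    return 100
}
```

In the prolog itself this would be called as `drain_if_alerting /monitoring/var/state/temperature /monitoring/var/state/system || exit $?`, after extending `PATH` so that `scontrol` is found.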