Ticket 4027

Summary: KNL-related log spam on non-KNL nodes
Product: Slurm
Reporter: john.blaas
Component: slurmd
Assignee: Tim Wickberg <tim>
Status: RESOLVED FIXED
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
CC: john.blaas
Version: 16.05.10   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=4487
Site: University of Colorado
Machine Name: Summit
Version Fixed: 17.02.4, 17.11.0-pre1

Description john.blaas 2017-07-25 11:58:15 MDT
Our cluster has a heterogeneous mixture of Haswell and KNL nodes. On the non-KNL nodes, slurmd is logging the same error repeatedly, which produces quite a bit of log spam.

# grep /usr/bin/syscfg /var/log/messages
Jul 17 04:08:11 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 04:08:11 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 04:41:33 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 04:41:33 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 05:14:53 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 05:14:53 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 05:48:14 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 05:48:14 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 06:21:40 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 06:21:40 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 06:55:02 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 06:55:02 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 07:28:24 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 07:28:24 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 08:01:49 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 08:01:49 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 08:35:09 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 08:35:09 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 09:08:32 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 09:08:32 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 09:41:52 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 09:41:52 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 10:15:12 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 10:15:12 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 10:48:37 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 10:48:37 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 11:21:57 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 11:21:57 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 11:55:17 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 11:55:17 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 12:28:47 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 12:28:47 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 13:02:26 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 13:02:26 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 13:35:51 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 13:35:51 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 14:09:13 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 14:09:13 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 14:42:34 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 14:42:34 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 15:16:02 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
Jul 17 15:16:02 shas0101 slurmd[34065]: error: _run_script: /usr/bin/syscfg can not be executed: No such file or directory
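
To gauge how widespread the noise is, an excerpt like the one above can be reduced to a per-host count. The following is only a sketch: the function name count_syscfg_errors is made up for illustration, and it assumes traditional syslog lines where the hostname is the fourth whitespace-separated field, as in the excerpt.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): count slurmd "_run_script ...
# syscfg can not be executed" errors per host in a syslog-style file.
count_syscfg_errors() {
    # Default to the log file used in the report; pass another path to override.
    log="${1:-/var/log/messages}"
    # Assumption: field 4 is the hostname ("Jul 17 04:08:11 shas0101 ...").
    awk '/slurmd\[[0-9]+\]: error: _run_script: .*syscfg can not be executed/ { n[$4]++ }
         END { for (h in n) print h, n[h] }' "$log" | sort
}
```

Calling `count_syscfg_errors /var/log/messages` on a log host prints one "hostname count" line per node that logged the error.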

The thing is, these nodes aren't set up with the KNL feature, and the nodes that do have the KNL feature have a knl_generic.conf file set up with the following:

# cat knl_generic.conf 
# Managed by Puppet
SyscfgPath=/opt/dell/toolkit/bin/syscfg
DefaultNUMA=hemi         # NUMA=all2all
AllowNUMA=a2a,snc2,hemi
DefaultMCDRAM=cache     # MCDRAM=cache

So it is unclear how slurmd is even pulling up a path of /usr/bin/syscfg.
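
One way to see which syscfg path a given node's configuration names, and whether that binary exists there, is sketched below. The function name check_syscfg_path is hypothetical, the location of knl_generic.conf must be supplied by the caller, and the match on the SyscfgPath key is case-sensitive, which is a simplification. It does not explain where /usr/bin/syscfg comes from; it only checks what the file says.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): read SyscfgPath from a
# knl_generic.conf-style file and report whether that path is executable.
check_syscfg_path() {
    conf="$1"
    # Take the first SyscfgPath=... line; stop the value at whitespace or '#'.
    path=$(sed -n 's/^[[:space:]]*SyscfgPath[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$conf" | head -n 1)
    if [ -z "$path" ]; then
        echo "no SyscfgPath set in $conf"
        return 2
    fi
    if [ -x "$path" ]; then
        echo "$path is executable"
    else
        echo "$path is missing or not executable"
        return 1
    fi
}
```

It would be run on each node as `check_syscfg_path /path/to/knl_generic.conf`, with the real location of the file substituted.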

Any advice on how to rid us of this log spam would be greatly appreciated.
Comment 1 Tim Wickberg 2017-07-25 15:29:33 MDT
It's safe to ignore this, although it does admittedly generate a lot of noise.

This was fixed in 17.02.4 / 17.11.0-pre1 by commit ea2a0d25d11. If you're able to upgrade to the 17.02 branch at some point there are a lot of other assorted small fixes to the knl_generic plugin that you'll probably want as well.

- Tim
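
For sites tracking whether a node already runs a release containing the fix, a version string can be compared against 17.02.4. This is a rough sketch rather than anything from the Slurm source: the function name has_syscfg_fix is invented here, it assumes a major.minor.micro version with each component below 100, and it strips any suffix such as "-pre1" before comparing.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): succeed (exit 0) when the given
# version string is 17.02.4 or later, the first release listed as fixed.
has_syscfg_fix() {
    # "${1%%-*}" drops a suffix such as "-pre1" before the numeric compare.
    printf '%s\n' "${1%%-*}" | awk -F. '{
        # Assumption: each component is below 100, so this packing is safe.
        v = $1 * 10000 + $2 * 100 + $3
        exit !(v >= 170204)
    }'
}
```

For example, `has_syscfg_fix 16.05.10 || echo "upgrade needed to silence the syscfg errors"`. The version string could be taken from `slurmd -V`, on the assumption that it prints a line of the form "slurm 16.05.10".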
Comment 2 Tim Wickberg 2017-07-25 15:30:00 MDT
*** Ticket 3825 has been marked as a duplicate of this ticket. ***