Ticket 4461

Summary: slurmd: default spool dir used in _read_config()
Product: Slurm Reporter: Jeff Frey <frey>
Component: slurmd Assignee: Jacob Jenson <jacob>
Status: RESOLVED INVALID QA Contact:
Severity: 6 - No support contract    
Priority: --- CC: alex, bart, tim
Version: 17.11.0   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=4487
Site: Yale Slinky Site: ---

Description Jeff Frey 2017-12-01 09:47:12 MST
At src/slurmd/slurmd/slurmd.c:832, the _update_logging() function is called from inside _read_config().

In 17.02 releases, the _update_logging() function handled the file reopening itself (with the required conf fields filled in by _read_config() before _update_logging() was called) and then returned.

In 17.11, the _update_logging() function IN ADDITION attempts to contact any step daemons in existence.  It does this using the value of conf->spooldir.  But _read_config() has not yet filled in conf->spooldir when _update_logging() is called (it does so later, at line 937).  So _update_logging() always tries the default spool directory (e.g. /var/spool/slurmd), producing the following red-herring error message in slurmd.log:

    debug:  Log file re-opened
    error: Domain socket directory /var/spool/slurmd: No such file or directory
    debug2: hwloc_topology_init
    debug2: hwloc_topology_load
    debug:  CPUs:8 Boards:1 Sockets:2 CoresPerSocket:4 ThreadsPerCore:1
    debug4: CPU map[0]=>0 S:C:T 0:0:0
    debug4: CPU map[1]=>1 S:C:T 0:1:0
      :

This does not appear to adversely impact startup of slurmd; it just produces an unnecessary error message.  Solution:  the _read_config() function should be rearranged so that all conf field preconditions for _update_logging() are satisfied before it is called.
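
To make the ordering issue concrete, here is a minimal standalone sketch of the two call orders. It is not the slurmd source: struct demo_conf, the demo_* functions, seen_spooldir, and the log file path are all hypothetical stand-ins, and the only behavior modeled is "the logging update reads conf->spooldir at the moment it is called".

```c
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not the real slurmd code. */
#define DEFAULT_SPOOLDIR "/var/spool/slurmd"

struct demo_conf {
    const char *logfile;
    const char *spooldir;   /* holds the default until the config is applied */
};

/* Records which spool directory the logging update actually used. */
char seen_spooldir[256];

/* Models the 17.11-style behavior: the logging update also looks for
 * step daemons under whatever conf->spooldir holds right now. */
void demo_update_logging(const struct demo_conf *conf)
{
    snprintf(seen_spooldir, sizeof(seen_spooldir), "%s", conf->spooldir);
}

/* Ordering described in this report: logging is updated first,
 * and spooldir is filled in only afterwards. */
void demo_read_config_reported(struct demo_conf *conf, const char *configured)
{
    conf->logfile = "/var/log/slurmd.log";  /* assumed path */
    demo_update_logging(conf);              /* still sees the default */
    conf->spooldir = configured;
}

/* Proposed ordering: fill in every field the logging update depends on,
 * then call it. */
void demo_read_config_reordered(struct demo_conf *conf, const char *configured)
{
    conf->logfile = "/var/log/slurmd.log";  /* assumed path */
    conf->spooldir = configured;
    demo_update_logging(conf);              /* sees the configured value */
}
```

Starting from a conf whose spooldir is DEFAULT_SPOOLDIR and a configured value such as /local/slurmd (an arbitrary example path), the first ordering leaves seen_spooldir at the default, which matches the spurious "Domain socket directory" error above; the second leaves it at the configured path.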