Ticket 4461 - slurmd: default spool dir used in _read_config()
Summary: slurmd: default spool dir used in _read_config()
Status: RESOLVED INVALID
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmd
Version: 17.11.0
Hardware: Linux Linux
Severity: 6 - No support contract
Assignee: Jacob Jenson
Reported: 2017-12-01 09:47 MST by Jeff Frey
Modified: 2017-12-12 05:47 MST
CC List: 3 users

Site: Yale


Description Jeff Frey 2017-12-01 09:47:12 MST
At src/slurmd/slurmd/slurmd.c:832, _read_config() calls _update_logging().

In the 17.02 releases, _update_logging() only reopened the log file and returned. The conf fields it needed had already been filled in by _read_config() before the call.

In 17.11, _update_logging() ALSO attempts to contact any existing step daemons, which it locates through conf->spooldir. But _read_config() has not yet filled in conf->spooldir when _update_logging() is called; it does so later, at line 937. So _update_logging() always tries the default spool directory (e.g. /var/spool/slurmd), producing the following red-herring error message in slurmd.log:

    debug:  Log file re-opened
    error: Domain socket directory /var/spool/slurmd: No such file or directory
    debug2: hwloc_topology_init
    debug2: hwloc_topology_load
    debug:  CPUs:8 Boards:1 Sockets:2 CoresPerSocket:4 ThreadsPerCore:1
    debug4: CPU map[0]=>0 S:C:T 0:0:0
    debug4: CPU map[1]=>1 S:C:T 0:1:0
      :
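
The call-order dependency described above can be modeled in a few lines of C. This is a minimal sketch, not Slurm's code: the struct, the function names, and the assumption that an unset spool directory falls back to a compiled-in default are all hypothetical. The only point it shows is that a field read by the logging update before it is assigned leaves the step-daemon lookup pointed at the default directory rather than the configured one.

```c
#include <stddef.h>

/* Hypothetical compiled-in default; the ticket cites /var/spool/slurmd as an example. */
#define DEFAULT_SPOOLDIR "/var/spool/slurmd"

/* Hypothetical, trimmed-down stand-in for slurmd's conf structure. */
struct conf_model {
    const char *logfile;
    const char *spooldir;   /* NULL until read_config_model() assigns it */
};

/* Directory used by the most recent step-daemon lookup, so the ordering bug is observable. */
const char *last_scanned_dir = NULL;

/* Stand-in for the 17.11 _update_logging(): reopen the log, then look for step daemons. */
void update_logging_model(const struct conf_model *conf)
{
    /* (log reopening elided) */
    /* Assumption: an unset spool dir falls back to the compiled-in default. */
    last_scanned_dir = conf->spooldir ? conf->spooldir : DEFAULT_SPOOLDIR;
}

/* Stand-in for the 17.11 _read_config(): logging is updated before spooldir is set. */
void read_config_model(struct conf_model *conf, const char *cfg_spooldir)
{
    conf->logfile = "/var/log/slurmd.log";  /* hypothetical value */
    update_logging_model(conf);             /* analogous to the call at slurmd.c:832 */
    conf->spooldir = cfg_spooldir;          /* analogous to the later assignment at line 937 */
}
```

Calling read_config_model() with a configured directory such as "/tmp/slurmd" leaves conf.spooldir correct afterwards, yet last_scanned_dir still names the default, which mirrors the misleading "Domain socket directory" error.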

This does not appear to adversely affect slurmd startup; it just produces an unnecessary error message. Proposed solution: rearrange _read_config() so that every conf field _update_logging() depends on is filled in before _update_logging() is called.
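
A sketch of the proposed rearrangement, again on a hypothetical stand-in rather than Slurm's real structures: every field is assigned first, and the logging update runs last. The assert() that documents the precondition is an illustrative addition, not something the ticket proposes; note that it is compiled out when NDEBUG is defined, so it guards development builds only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for slurmd's conf structure (not Slurm's real definition). */
struct conf_fixed {
    const char *logfile;
    const char *spooldir;
};

/* Directory the step-daemon lookup used, recorded for inspection. */
const char *fixed_scanned_dir = NULL;

/* Stand-in for _update_logging() with its precondition made explicit. */
void update_logging_fixed(const struct conf_fixed *conf)
{
    /* Precondition: every conf field this function reads is already filled in. */
    assert(conf->logfile != NULL && conf->spooldir != NULL);
    /* (log reopening elided) */
    fixed_scanned_dir = conf->spooldir;
}

/* Stand-in for a rearranged _read_config(): assign all fields, then update logging. */
void read_config_fixed(struct conf_fixed *conf, const char *cfg_logfile,
                       const char *cfg_spooldir)
{
    conf->logfile = cfg_logfile;
    conf->spooldir = cfg_spooldir;   /* moved ahead of the logging update */
    update_logging_fixed(conf);
}
```

With this ordering the lookup sees the configured directory, and any future field that _update_logging() starts reading will trip the assertion if _read_config() forgets to set it first.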