We were troubleshooting some job issues in Slurm. All Slurm commands were working, but slurmctld.log was empty. Restarting resolved the issue, but we're not sure what caused the log file to become empty.
Did you happen to capture the output of `lsof -p $(pgrep slurmctld)` before restarting?
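For reference, here is a minimal sketch (Linux-only, since it reads /proc) of what to look for in that output. One possibility it can confirm or rule out is that slurmctld.log was deleted or rotated while slurmctld kept writing to the old, now-unlinked file. This is an assumption, not a diagnosis from this thread. In that case the kernel reports the descriptor's target with a "(deleted)" suffix, and lsof shows the same marker. The function name `report_open_files` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list a process's open files whose path matches a
# pattern. Linux-specific, relies on /proc/PID/fd symlinks.
# Assumption: an "empty" log may really be an unlinked file that the daemon
# still holds open; such targets end in " (deleted)".

# report_open_files PID [PATTERN]   (PATTERN defaults to slurmctld.log)
report_open_files() {
    local pid=$1 pattern=${2:-slurmctld.log}
    local fd target
    for fd in /proc/"$pid"/fd/*; do
        # Skip descriptors we cannot read (or a glob that matched nothing).
        target=$(readlink "$fd") || continue
        case $target in
            *"$pattern"*) printf '%s -> %s\n' "${fd##*/}" "$target" ;;
        esac
    done
}

# Usage (as root on the controller):
#   pid=$(pgrep -o -x slurmctld)     # -o: oldest match, so a single PID
#   report_open_files "$pid"
#   lsof -p "$pid" > "lsof.slurmctld.$(date +%s).txt"
```

You can check the configured log path with `scontrol show config | grep SlurmctldLogFile`. If the output shows that path with "(deleted)", the daemon is writing to a file that no longer has a name on disk.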
We did not, but will make sure we run that should it happen in the future. Anything else we should check?
(In reply to Scott Lucas from comment #2)
> We did not, but will make sure we run that should it happen in the future.
> Anything else we should check?

Please take a coredump at the same time (and generate a backtrace). I will need to look at both to figure out what is going on.
So the next time this happens, we should run `lsof -p $(pgrep slurmctld)` and `scontrol abort` (to generate a coredump)?
(In reply to Scott Lucas from comment #4)
> scontrol abort (to generate a coredump)

That would kill the server and potentially cause data loss. Instead, use gcore to grab the core without killing the daemon:

`gcore $(pgrep slurmctld)`
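The lsof output, the gcore core, and the backtrace can be collected together in one pass. What follows is a hedged sketch, not an official Slurm tool. The function name `collect_diag` and the output directory are hypothetical. It assumes lsof and gdb (which provides gcore) are installed on the controller and that it runs as root. gcore briefly stops the process while it writes the core, and the core file can be large, so choose an output directory with enough free space.

```shell
#!/usr/bin/env bash
# Hypothetical helper: capture lsof output, a live core (via gcore) and an
# all-threads backtrace from a running daemon without terminating it.
# Assumes lsof and gdb/gcore are installed; run as root.

# collect_diag NAME OUTDIR
collect_diag() {
    local name=$1 outdir=$2 pid exe stamp
    # -x: exact process-name match; -o: oldest match, so we get one PID.
    pid=$(pgrep -o -x "$name") || { echo "no running $name process" >&2; return 1; }
    exe=$(readlink -f "/proc/$pid/exe")   # binary needed to symbolize the core
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$outdir" || return 1

    # 1. Open files, including any log that was deleted while still open.
    lsof -p "$pid" > "$outdir/lsof.$stamp.txt"

    # 2. Core of the live process. gcore writes <prefix>.<pid> and then
    #    detaches, so the daemon keeps running afterwards.
    gcore -o "$outdir/core.$stamp" "$pid" || return 1

    # 3. Backtrace of every thread from that core, non-interactively.
    gdb -batch -ex 'thread apply all bt full' \
        "$exe" "$outdir/core.$stamp.$pid" > "$outdir/bt.$stamp.txt" 2>&1

    echo "wrote diagnostics for $name (pid $pid) to $outdir"
}

# Usage:
#   collect_diag slurmctld /var/tmp/slurmctld-diag
```

Unlike `scontrol abort`, nothing here signals the daemon to exit. The only disruption is the pause while gdb is attached.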
Will do
Scott, I'm going to mark this issue as timed out. Once the logs are ready, please reply with them and we can continue debugging.

Thanks,
--Nate