Ticket 15172

Summary: NULL node name in slurmctld log
Product: Slurm
Reporter: DavidM <david.magda>
Component: slurmctld
Assignee: Oscar Hernández <oscar.hernandez>
Status: RESOLVED INFOGIVEN
Severity: 4 - Minor Issue
Priority: ---
CC: oscar.hernandez
Version: 22.05.3
Hardware: Linux
OS: Linux
Site: Vector Institute

Description DavidM 2022-10-14 07:07:03 MDT
Hello,

After upgrading slurmctld from 20.11.9 to 22.05.3, we're noticing the following in the slurmctld.log file whenever a job is submitted:

"""
[2022-10-14T08:55:44.391] _find_node_record: passed NULL node name
[2022-10-14T08:55:44.392] error: prolog_complete: can't find node:(null)
"""

It was not present in the logs when running 20.11 (or 19.05 before that).

The slurmd daemons on the compute nodes are still running 20.11.9, as are the CLI utilities, if that matters.
Comment 2 Oscar Hernández 2022-10-18 06:09:32 MDT
Hi David,

I can confirm that the errors you mention are due to the version mismatch: 22.05 expects a variable that 20.11 does not provide.

They are related to changes made to prolog handling in 22.05. The two versions are still compatible, though, as Slurm handles this scenario correctly.

These errors should not cause any harm other than showing up in the log, so you should not worry about them. Let me know if you notice any problem that you think could be related.

I will take a look to see what can be done to silence the errors.

Kind regards,
Oscar
Comment 3 DavidM 2022-10-18 06:35:58 MDT
Thanks for the info. We're planning to fully upgrade to 22.05 shortly, so hopefully the messages will go away.
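
To check whether the messages actually stop once the compute nodes are on 22.05, the log can be scanned for the two lines quoted in the description. The following is only a minimal sketch: the function name `count_null_node_messages` is hypothetical, and the matching relies solely on the message text and timestamp format shown in the excerpt above, not on any Slurm API.

```python
import re
from collections import Counter

# Message fragments copied from the two log lines quoted in this ticket.
PATTERNS = (
    "_find_node_record: passed NULL node name",
    "prolog_complete: can't find node:(null)",
)

# Timestamp prefix as it appears in the excerpt, e.g. [2022-10-14T08:55:44.391]
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2})T[\d:.]+\]")


def count_null_node_messages(lines):
    """Return a Counter mapping each date to its number of matching log lines.

    Hypothetical helper for illustration; `lines` is any iterable of strings,
    such as an open slurmctld.log file object.
    """
    per_day = Counter()
    for line in lines:
        if not any(pattern in line for pattern in PATTERNS):
            continue
        match = TS_RE.match(line)
        if match:
            per_day[match.group(1)] += 1
    return per_day
```

It can be fed an open file handle for whatever path SlurmctldLogFile points to at the site; days after the upgrade should then no longer appear in the result if the version mismatch was the only cause.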