Ticket 11237

Summary: slurmctld segmentation fault when using PartitionName "Nodes=" with brackets and FQDN
Product: Slurm
Reporter: Timothy Mullican <timothy.j.mullican>
Component: slurmctld
Assignee: Jacob Jenson <jacob>
Status: RESOLVED INVALID
Severity: 6 - No support contract
Version: 20.11.2
Hardware: Linux
OS: Linux
Site: -Other-
Linux Distro: RHEL
Attachments: GDB output

Description Timothy Mullican 2021-03-27 23:38:24 MDT
Created attachment 18712 [details]
GDB output

slurmctld appears to segfault whenever the PartitionName Nodes= value contains an entry with a bracketed range followed by any additional special character, such as the period (.) of an FQDN. See the examples listed below.
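For context, the intended behavior of a bracketed Nodes= expression is plain range expansion: `worker[1-2].test.local` should expand to `worker1.test.local` and `worker2.test.local`. A minimal sketch of that expansion (a hypothetical helper handling a single numeric range, not Slurm's actual hostlist code):

```python
import re

def expand_hostlist(expr):
    """Expand a single Slurm-style bracketed range.

    'worker[1-2].test.local' -> ['worker1.test.local', 'worker2.test.local']
    Entries with no bracket are returned unchanged. This is an
    illustrative sketch only; Slurm's hostlist code handles padding,
    lists of ranges, and nesting that this does not.
    """
    m = re.match(r'^(.*)\[(\d+)-(\d+)\](.*)$', expr)
    if not m:
        return [expr]
    prefix, lo, hi, suffix = m.group(1), int(m.group(2)), int(m.group(3)), m.group(4)
    return [f"{prefix}{i}{suffix}" for i in range(lo, hi + 1)]
```

The crash reported below suggests slurmctld mishandles exactly the case where `suffix` is non-empty.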

Contents of /etc/slurm/slurm.conf
--
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
SlurmctldHost=[redacted](172.16.1.10)
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
#ProctrackType=proctrack/cgroup
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
TaskPlugin=task/affinity
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=slurm-test
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
#SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
#
#
# COMPUTE NODES
NodeName=worker1.test.local NodeAddr=172.16.1.11 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=128872 State=UNKNOWN
NodeName=worker2.test.local NodeAddr=172.16.1.12 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=128872 State=UNKNOWN

[PARTITION_COMMAND_BELOW_HERE]
--

Any of the following causes a segmentation fault in slurmctld:
PartitionName=debug Nodes=worker[1-2].test.local Default=YES MaxTime=INFINITE State=Up
PartitionName=debug Nodes=worker[1-2]. Default=YES MaxTime=INFINITE State=Up
PartitionName=debug Nodes=worker[1-2]? Default=YES MaxTime=INFINITE State=Up
PartitionName=debug Nodes=worker[1-2]! Default=YES MaxTime=INFINITE State=Up
PartitionName=debug Nodes=worker[1-2]@ Default=YES MaxTime=INFINITE State=Up
PartitionName=debug Nodes=worker[1-2]$ Default=YES MaxTime=INFINITE State=Up
PartitionName=debug Nodes=worker[1-2]% Default=YES MaxTime=INFINITE State=Up
PartitionName=debug Nodes=worker[1-2]^ Default=YES MaxTime=INFINITE State=Up
PartitionName=debug Nodes=worker[1-2]& Default=YES MaxTime=INFINITE State=Up
PartitionName=debug Nodes=worker[1-2]* Default=YES MaxTime=INFINITE State=Up
PartitionName=debug Nodes=worker[1-2]( Default=YES MaxTime=INFINITE State=Up
PartitionName=debug Nodes=worker[1-2]) Default=YES MaxTime=INFINITE State=Up
PartitionName=debug Nodes=worker[1-2]- Default=YES MaxTime=INFINITE State=Up
PartitionName=debug Nodes=worker[1-2]_ Default=YES MaxTime=INFINITE State=Up
PartitionName=debug Nodes=worker[1-2]+ Default=YES MaxTime=INFINITE State=Up
PartitionName=debug Nodes=worker[1-2]= Default=YES MaxTime=INFINITE State=Up
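Every failing line above shares one shape: a bracketed numeric range with at least one character after the closing bracket, while the suffix-free form `worker[1-2]` does not crash. A quick check for that pattern (a hypothetical classifier written for this report, not Slurm code):

```python
import re

# Matches a Nodes= entry containing a [lo-hi] range followed by at
# least one more character -- the shape shared by every crashing
# example above.
TRAILING_AFTER_RANGE = re.compile(r'\[\d+-\d+\].')

def has_suffix_after_range(nodes_value):
    """Return True if a bracketed range is followed by any character."""
    return bool(TRAILING_AFTER_RANGE.search(nodes_value))
```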