| Summary: | slurmctld segmentation fault when using PartitionName "Nodes=" with brackets and FQDN | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Timothy Mullican <timothy.j.mullican> |
| Component: | slurmctld | Assignee: | Jacob Jenson <jacob> |
| Status: | RESOLVED INVALID | QA Contact: | |
| Severity: | 6 - No support contract | ||
| Priority: | --- | ||
| Version: | 20.11.2 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | -Other- | Slinky Site: | --- |
| Linux Distro: | RHEL | Machine Name: | |
| CLE Version: | | Version Fixed: | |
| Target Release: | --- | DevPrio: | --- |
| Attachments: | GDB output | ||

Created attachment 18712
GDB output

slurmctld seems to segfault whenever the PartitionName Nodes= value contains an entry with brackets and any additional special characters (such as period (.) for an FQDN). See the examples listed below.

Contents of /etc/slurm/slurm.conf

--

    # Put this file on all nodes of your cluster.
    # See the slurm.conf man page for more information.
    #
    SlurmctldHost=[redacted](172.16.1.10)
    #
    #MailProg=/bin/mail
    MpiDefault=none
    #MpiParams=ports=#-#
    #ProctrackType=proctrack/cgroup
    ProctrackType=proctrack/pgid
    ReturnToService=1
    SlurmctldPidFile=/var/run/slurmctld.pid
    #SlurmctldPort=6817
    SlurmdPidFile=/var/run/slurmd.pid
    #SlurmdPort=6818
    SlurmdSpoolDir=/var/spool/slurmd
    SlurmUser=slurm
    #SlurmdUser=root
    StateSaveLocation=/var/spool/slurmctld
    SwitchType=switch/none
    TaskPlugin=task/affinity
    #
    #
    # TIMERS
    #KillWait=30
    #MinJobAge=300
    #SlurmctldTimeout=120
    #SlurmdTimeout=300
    #
    #
    # SCHEDULING
    SchedulerType=sched/backfill
    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core
    #
    #
    # LOGGING AND ACCOUNTING
    AccountingStorageType=accounting_storage/none
    ClusterName=slurm-test
    #JobAcctGatherFrequency=30
    JobAcctGatherType=jobacct_gather/none
    #SlurmctldDebug=info
    SlurmctldLogFile=/var/log/slurmctld.log
    #SlurmdDebug=info
    SlurmdLogFile=/var/log/slurmd.log
    #
    #
    # COMPUTE NODES
    NodeName=worker1.test.local NodeAddr=172.16.1.11 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=128872 State=UNKNOWN
    NodeName=worker2.test.local NodeAddr=172.16.1.12 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=128872 State=UNKNOWN
    [PARTITION_COMMAND_BELOW_HERE]

--

Any of the following seem to cause a segmentation fault in slurmctld:

    PartitionName=debug Nodes=worker[1-2].test.local Default=YES MaxTime=INFINITE State=Up
    PartitionName=debug Nodes=worker[1-2]. Default=YES MaxTime=INFINITE State=Up
    PartitionName=debug Nodes=worker[1-2]? Default=YES MaxTime=INFINITE State=Up
    PartitionName=debug Nodes=worker[1-2]! Default=YES MaxTime=INFINITE State=Up
    PartitionName=debug Nodes=worker[1-2]@ Default=YES MaxTime=INFINITE State=Up
    PartitionName=debug Nodes=worker[1-2]$ Default=YES MaxTime=INFINITE State=Up
    PartitionName=debug Nodes=worker[1-2]% Default=YES MaxTime=INFINITE State=Up
    PartitionName=debug Nodes=worker[1-2]^ Default=YES MaxTime=INFINITE State=Up
    PartitionName=debug Nodes=worker[1-2]& Default=YES MaxTime=INFINITE State=Up
    PartitionName=debug Nodes=worker[1-2]* Default=YES MaxTime=INFINITE State=Up
    PartitionName=debug Nodes=worker[1-2]( Default=YES MaxTime=INFINITE State=Up
    PartitionName=debug Nodes=worker[1-2]) Default=YES MaxTime=INFINITE State=Up
    PartitionName=debug Nodes=worker[1-2]- Default=YES MaxTime=INFINITE State=Up
    PartitionName=debug Nodes=worker[1-2]_ Default=YES MaxTime=INFINITE State=Up
    PartitionName=debug Nodes=worker[1-2]+ Default=YES MaxTime=INFINITE State=Up
    PartitionName=debug Nodes=worker[1-2]= Default=YES MaxTime=INFINITE State=Up
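
For anyone hitting the same crash, one possible way to sidestep the bracket-plus-suffix form is to write each node name out explicitly in Nodes=. The Python sketch below is not Slurm's hostlist parser. `expand_bracketed` and `partition_line` are hypothetical helpers that expand a single simple numeric range such as `worker[1-2].test.local` and build a comma-separated partition line from it. Whether slurmctld 20.11.2 accepts the resulting line without crashing is an assumption; it was not tested in this report.

```python
import re
from typing import List

# Hypothetical helper, not part of Slurm: matches exactly one "[lo-hi]"
# numeric range with an optional prefix and suffix around it.
_RANGE = re.compile(
    r"^(?P<prefix>[^\[\]]*)\[(?P<lo>\d+)-(?P<hi>\d+)\](?P<suffix>[^\[\]]*)$"
)


def expand_bracketed(expr: str) -> List[str]:
    """Expand e.g. 'worker[1-2].test.local' into explicit host names."""
    match = _RANGE.match(expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    lo, hi = match["lo"], match["hi"]
    if int(lo) > int(hi):
        raise ValueError(f"empty range in: {expr!r}")
    width = len(lo)  # keep zero padding, e.g. node[01-03] -> node01..node03
    return [
        f"{match['prefix']}{n:0{width}d}{match['suffix']}"
        for n in range(int(lo), int(hi) + 1)
    ]


def partition_line(name: str, expr: str) -> str:
    """Build a partition line that lists every node explicitly (assumed workaround)."""
    nodes = ",".join(expand_bracketed(expr))
    return f"PartitionName={name} Nodes={nodes} Default=YES MaxTime=INFINITE State=Up"


if __name__ == "__main__":
    print(partition_line("debug", "worker[1-2].test.local"))
```

The generated line uses the same NodeName values already defined in the COMPUTE NODES section, so nothing else in slurm.conf would need to change under this assumption.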