Ticket 15534

Summary: Understanding Group cache
Product: Slurm
Reporter: NASA JSC Aerolab <JSC-DL-AEROLAB-ADMIN>
Component: Configuration
Assignee: Director of Support <support>
Status: RESOLVED INVALID
QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 21.08.5   
Hardware: Linux   
OS: Linux   
Site: Johnson Space Center

Description NASA JSC Aerolab 2022-12-01 12:11:06 MST
Hello,
We were looking through our logs to troubleshoot a "Kill task failed" issue and noticed the following line:

[2022-12-01T09:51:05.724] error: _get_group_members: Could not find configured group europa_ms

We don't currently have a group called europa_ms on our LDAP server. What does this error mean, and how do you suggest we resolve it?
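The error appears to mean that slurmctld tried to resolve a group name taken from its configuration, and the system name service returned nothing for it. A minimal sketch of the equivalent lookup, assuming the controller host resolves groups through NSS (e.g. LDAP behind sssd, which is an assumption about this site). `group_members` is a hypothetical helper for illustration, not Slurm code:

```python
import grp


def group_members(name):
    """Return the explicit member list of group `name`, or None if NSS
    (files, LDAP, ...) cannot resolve the name at all.

    A None result corresponds to the situation in which slurmctld logs
    "Could not find configured group <name>".
    """
    try:
        # grp.getgrnam consults every NSS source configured for "group".
        return list(grp.getgrnam(name).gr_mem)
    except KeyError:
        # getgrnam raises KeyError when no source knows the group.
        return None
```

Given that europa_ms was absent from LDAP, `group_members("europa_ms")` would return None on the controller, just like `getent group europa_ms` would print nothing. The sketch only reports explicit `gr_mem` members and does not attempt to reproduce how Slurm builds its cached member list.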

Below is an excerpt from the slurmctld log file.

Slurmctld log:

[2022-12-01T08:46:15.129] _slurm_rpc_submit_batch_job: JobId=778928 InitPrio=1575 usec=2764
[2022-12-01T08:46:15.597] sched: Allocate JobId=778928 NodeList=r2i3n[10-32] #CPUs=1104 Partition=normal
<cut>
[2022-12-01T09:49:46.779] _slurm_rpc_kill_job: REQUEST_KILL_JOB JobId=778928 uid 33153
[2022-12-01T09:50:11.434] lua: Valid workdir: /nobackup/pjang/orion/docking_ring/11s
[2022-12-01T09:50:11.434] lua: Found 1 valid cpu features
[2022-12-01T09:50:11.434] lua: Adding hipri partition
[2022-12-01T09:50:11.437] _slurm_rpc_submit_batch_job: JobId=778940 InitPrio=1575 usec=2651
[2022-12-01T09:50:12.166] sched: Allocate JobId=778940 NodeList=r2i0n[32-35],r2i1n[0-18] #CPUs=1104 Partition=normal
[2022-12-01T09:51:05.724] error: _get_group_members: Could not find configured group europa_ms
[2022-12-01T09:55:27.893] Resending TERMINATE_JOB request JobId=778928 Nodelist=r2i3n10
[2022-12-01T09:55:49.011] update_node: node r2i3n10 reason set to: Kill task failed
[2022-12-01T09:55:49.011] update_node: node r2i3n10 state set to DRAINING
[2022-12-01T09:55:49.011] error: slurm_msg_sendto: address:port=10.150.1.167:34638 msg_type=8001: No error
[2022-12-01T09:55:57.405] cleanup_completing: JobId=778928 completion process took 371 seconds
[2022-12-01T09:56:05.944] error: _get_group_members: Could not find configured group europa_ms


Thank you.
Patrick
Comment 1 NASA JSC Aerolab 2022-12-01 12:17:36 MST
Please disregard this issue. I noticed that the group europa_ms was referenced in slurm.conf but did not exist in LDAP.
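A mismatch like this can be caught ahead of time by cross-checking every group named in slurm.conf against the name service. The following is a minimal sketch under stated assumptions: it only looks at partition `AllowGroups=` entries (the ticket does not say which parameter referenced europa_ms), it ignores `Include` directives, and `configured_groups` / `unresolvable_groups` are hypothetical helper names:

```python
import grp
import re

# Matches AllowGroups=<comma-separated list>; slurm.conf keys are case-insensitive.
ALLOW_GROUPS = re.compile(r"\bAllowGroups=(\S+)", re.IGNORECASE)


def configured_groups(conf_text):
    """Collect group names listed in AllowGroups= entries of slurm.conf text."""
    groups = set()
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        for match in ALLOW_GROUPS.finditer(line):
            for name in match.group(1).split(","):
                # "ALL" is a keyword, not a real group name.
                if name and name.upper() != "ALL":
                    groups.add(name)
    return groups


def unresolvable_groups(conf_text):
    """Return, sorted, the configured group names that NSS cannot resolve."""
    missing = []
    for name in sorted(configured_groups(conf_text)):
        try:
            grp.getgrnam(name)
        except KeyError:
            missing.append(name)
    return missing
```

Usage, assuming the configuration lives at /etc/slurm/slurm.conf (adjust the path for your installation): `unresolvable_groups(open("/etc/slurm/slurm.conf").read())`. Any name it returns either needs to be created in LDAP or removed from slurm.conf.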

Thank you.

Patrick.
Comment 2 Jason Booth 2022-12-01 13:08:33 MST
Closing out as invalid.