Ticket 15534 - Understanding Group cache
Summary: Understanding Group cache
Status: RESOLVED INVALID
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 21.08.5
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Director of Support
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2022-12-01 12:11 MST by NASA JSC Aerolab
Modified: 2022-12-01 13:08 MST
0 users

See Also:
Site: Johnson Space Center
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments

Description NASA JSC Aerolab 2022-12-01 12:11:06 MST
Hello,
We were looking through our logs to troubleshoot a "Kill task failed" issue and noticed the following line:

[2022-12-01T09:51:05.724] error: _get_group_members: Could not find configured group europa_ms

Currently we don't have a group called europa_ms on our LDAP server. What does the above error mean? What do you suggest to resolve it?

The following is an excerpt from the slurmctld log file.

Slurmctld log:

[2022-12-01T08:46:15.129] _slurm_rpc_submit_batch_job: JobId=778928 InitPrio=1575 usec=2764
[2022-12-01T08:46:15.597] sched: Allocate JobId=778928 NodeList=r2i3n[10-32] #CPUs=1104 Partition=normal
<cut>
[2022-12-01T09:49:46.779] _slurm_rpc_kill_job: REQUEST_KILL_JOB JobId=778928 uid 33153
[2022-12-01T09:50:11.434] lua: Valid workdir: /nobackup/pjang/orion/docking_ring/11s
[2022-12-01T09:50:11.434] lua: Found 1 valid cpu features
[2022-12-01T09:50:11.434] lua: Adding hipri partition
[2022-12-01T09:50:11.437] _slurm_rpc_submit_batch_job: JobId=778940 InitPrio=1575 usec=2651
[2022-12-01T09:50:12.166] sched: Allocate JobId=778940 NodeList=r2i0n[32-35],r2i1n[0-18] #CPUs=1104 Partition=normal
[2022-12-01T09:51:05.724] error: _get_group_members: Could not find configured group europa_ms
[2022-12-01T09:55:27.893] Resending TERMINATE_JOB request JobId=778928 Nodelist=r2i3n10
[2022-12-01T09:55:49.011] update_node: node r2i3n10 reason set to: Kill task failed
[2022-12-01T09:55:49.011] update_node: node r2i3n10 state set to DRAINING
[2022-12-01T09:55:49.011] error: slurm_msg_sendto: address:port=10.150.1.167:34638 msg_type=8001: No error
[2022-12-01T09:55:57.405] cleanup_completing: JobId=778928 completion process took 371 seconds
[2022-12-01T09:56:05.944] error: _get_group_members: Could not find configured group europa_ms


Thank you.
Patrick
Comment 1 NASA JSC Aerolab 2022-12-01 12:17:36 MST
Please disregard this issue. I found that the group europa_ms was defined in slurm.conf but not in LDAP.

Thank you.

Patrick.
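One way to catch this kind of mismatch ahead of time is to check every group named in slurm.conf against the host's group database. The Python sketch below is a rough check under stated assumptions, not a Slurm tool: it assumes the group was referenced through a partition's AllowGroups= parameter (the ticket does not say which setting named europa_ms), it does not follow Include directives, and it uses the standard-library grp.getgrnam, which resolves names through NSS and therefore through LDAP when nsswitch.conf is configured that way. The helper names configured_groups, unresolvable_groups, and report are hypothetical.

```python
import grp
import re


def configured_groups(conf_text):
    """Collect group names from AllowGroups= entries in slurm.conf text.

    Assumption: the groups of interest appear in AllowGroups= (e.g. on
    PartitionName lines). Include directives are not followed.
    """
    groups = set()
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        # Match the key case-insensitively, since slurm.conf parameter
        # names are not case sensitive.
        for match in re.finditer(r"\bAllowGroups=(\S+)", line, flags=re.IGNORECASE):
            for name in match.group(1).split(","):
                name = name.strip()
                if name and name.upper() != "ALL":  # ALL is not a real group
                    groups.add(name)
    return groups


def unresolvable_groups(conf_text, lookup=grp.getgrnam):
    """Return configured group names that the lookup cannot resolve.

    grp.getgrnam raises KeyError for an unknown group; `lookup` can be
    swapped out for testing.
    """
    missing = []
    for name in sorted(configured_groups(conf_text)):
        try:
            lookup(name)
        except KeyError:
            missing.append(name)
    return missing


def report(path):
    """Print each configured group that cannot be resolved on this host."""
    with open(path) as conf_file:
        for name in unresolvable_groups(conf_file.read()):
            print(f"group not resolvable via NSS: {name}")
```

For example, `report("/etc/slurm/slurm.conf")` would list groups like europa_ms that slurm.conf names but LDAP does not define; that path is itself an assumption, since the location of slurm.conf depends on how Slurm was built and installed. A group flagged this way matches the cause identified in Comment 1.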
Comment 2 Jason Booth 2022-12-01 13:08:33 MST
Closing out as invalid.