Hi, after enabling the nss_slurm plugin we are experiencing the following behavior:

* `srun whoami` returns the correct username.
* `sbatch --wrap="whoami"` returns `nobody`.
* `sbatch --wrap="srun whoami"` returns the correct username.

Our nsswitch.conf looks like:

```
...
passwd: slurm files sss
group:  slurm files sss
shadow: files sss
...
```

Additional details and logs:

Submitting a job with sbatch from a local user (ec2-user) looks good:

```shell
[ec2-user@ip-192-168-39-241 ~]$ sbatch --wrap="id"
Submitted batch job 1
[ec2-user@ip-192-168-39-241 ~]$ cat slurm-1.out
uid=1000(ec2-user) gid=1000(ec2-user) groups=1000(ec2-user),4(adm),10(wheel),190(systemd-journal)
```

Submitting a job with sbatch from a domain user (PclusterUser1) returns `nobody`:

```shell
[PclusterUser1@ip-192-168-39-241 ~]$ sbatch --wrap="id"
Submitted batch job 2
[PclusterUser1@ip-192-168-39-241 ~]$ cat slurm-2.out
uid=1896801142(nobody) gid=1896800513(Domain Users) groups=1896800513(Domain Users)
```

Submitting a job with sbatch+srun from a domain user (PclusterUser1) looks good:

```shell
[PclusterUser1@ip-192-168-39-241 ~]$ sbatch --wrap="srun id"
Submitted batch job 3
[PclusterUser1@ip-192-168-39-241 ~]$ cat slurm-3.out
uid=1896801142(PclusterUser1) gid=1896800513(Domain Users) groups=1896800513(Domain Users)
```

Submitting a job with srun from a domain user (PclusterUser1) looks good:

```shell
[PclusterUser1@ip-192-168-39-241 ~]$ srun id
uid=1896801142(PclusterUser1) gid=1896800513(Domain Users) groups=1896800513(Domain Users)
```

slurmd log on the compute node:

```
[2022-02-02T14:12:07.154] error: Node configuration differs from hardware: CPUs=4:4(hw) Boards=1:1(hw) SocketsPerBoard=4:1(hw) CoresPerSocket=1:2(hw) ThreadsPerCore=1:2(hw)
[2022-02-02T14:12:07.160] CPU frequency setting not configured for this node
[2022-02-02T14:12:07.165] slurmd version 21.08.5 started
[2022-02-02T14:12:07.170] slurmd started on Wed, 02 Feb 2022 14:12:07 +0000
[2022-02-02T14:12:07.170] CPUs=4 Boards=1 Sockets=4 Cores=1 Threads=1 Memory=7623 TmpDisk=35827 Uptime=70 CPUSpecList=(null) FeaturesAvail=(null) FeaturesActive=(null)
[2022-02-02T14:40:34.888] task/affinity: task_p_slurmd_batch_request: task_p_slurmd_batch_request: 1
[2022-02-02T14:40:34.888] task/affinity: batch_bind: job 1 CPU input mask for node: 0x1
[2022-02-02T14:40:34.888] task/affinity: batch_bind: job 1 CPU final HW mask for node: 0x1
[2022-02-02T14:40:34.888] Launching batch job 1 for UID 1000
[2022-02-02T14:40:34.979] [1.batch] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status:0
[2022-02-02T14:40:34.981] [1.batch] done with job
[2022-02-02T14:42:20.026] task/affinity: task_p_slurmd_batch_request: task_p_slurmd_batch_request: 2
[2022-02-02T14:42:20.026] task/affinity: batch_bind: job 2 CPU input mask for node: 0x1
[2022-02-02T14:42:20.026] task/affinity: batch_bind: job 2 CPU final HW mask for node: 0x1
[2022-02-02T14:42:20.027] Launching batch job 2 for UID 1896801142
[2022-02-02T14:42:20.065] [2.batch] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status:0
[2022-02-02T14:42:20.067] [2.batch] done with job
[2022-02-02T14:43:06.094] task/affinity: task_p_slurmd_batch_request: task_p_slurmd_batch_request: 3
[2022-02-02T14:43:06.094] task/affinity: batch_bind: job 3 CPU input mask for node: 0x1
[2022-02-02T14:43:06.094] task/affinity: batch_bind: job 3 CPU final HW mask for node: 0x1
[2022-02-02T14:43:06.094] Launching batch job 3 for UID 1896801142
[2022-02-02T14:43:06.735] launch task StepId=3.0 request from UID:1896801142 GID:1896800513 HOST:192.168.102.117 PORT:45550
[2022-02-02T14:43:06.736] task/affinity: lllp_distribution: JobId=3 implicit auto binding: sockets,one_thread, dist 1
[2022-02-02T14:43:06.736] task/affinity: _task_layout_lllp_cyclic: _task_layout_lllp_cyclic
[2022-02-02T14:43:06.736] task/affinity: _lllp_generate_cpu_bind: _lllp_generate_cpu_bind jobid [3]: mask_cpu,one_thread, 0x1
[2022-02-02T14:43:06.753] [3.0] done with job
[2022-02-02T14:43:06.760] [3.batch] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status:0
[2022-02-02T14:43:06.762] [3.batch] done with job
[2022-02-02T14:55:00.947] launch task StepId=4.0 request from UID:1896801142 GID:1896800513 HOST:192.168.39.241 PORT:45796
[2022-02-02T14:55:00.947] task/affinity: lllp_distribution: JobId=4 implicit auto binding: sockets,one_thread, dist 8192
[2022-02-02T14:55:00.947] task/affinity: _task_layout_lllp_cyclic: _task_layout_lllp_cyclic
[2022-02-02T14:55:00.947] task/affinity: _lllp_generate_cpu_bind: _lllp_generate_cpu_bind jobid [4]: mask_cpu,one_thread, 0x1
[2022-02-02T14:55:00.987] [4.0] done with job
```
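The pattern above is consistent with the `slurm` NSS source only answering lookups inside job steps launched via srun, while the batch script itself falls through its configured sources unresolved. A rough illustration of that source-ordered lookup (plain Python sketch; the tables and the `nobody` fallback are hypothetical stand-ins, not Slurm's actual code):

```python
# Sketch of an NSS-style lookup walking sources in order,
# like the "passwd: slurm files sss" line in nsswitch.conf.
def lookup(uid, sources):
    """Return the first username any configured source knows for uid."""
    for name, table in sources:
        if uid in table:
            return table[uid]
    return "nobody"  # an unresolved uid is displayed as "nobody"

# Inside an srun-launched step, nss_slurm knows the user from the launch request:
step_sources = [("slurm", {1896801142: "PclusterUser1"}),
                ("files", {1000: "ec2-user"})]

# Inside the batch script (the buggy case), the slurm source answers nothing
# and the domain user is in neither local files nor the responding sources:
batch_sources = [("slurm", {}),
                 ("files", {1000: "ec2-user"})]

print(lookup(1896801142, step_sources))   # PclusterUser1
print(lookup(1896801142, batch_sources))  # nobody
```

This mirrors the observed outputs: `srun id` resolves the domain user, while `sbatch --wrap="id"` prints `nobody` for the same uid.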
Hi Francesco,

We've landed a patch, available starting in 21.08.6, that fixes this issue: https://github.com/SchedMD/slurm/commit/d567b0c

I'll resolve this ticket for now, but if you find that the problem persists, please let us know!

Thanks!
--Tim