We get errors in srun job submission:

[vlogin001 ~]$ srun -N 1 -n 4 -p devel -w e8002 --pty bash
srun: error: e8002: task 1: Exited with exit code 1

e8002:/var/log/messages shows:

Dec 16 11:31:46 e8002 slurmd[44149]: launch task 314.0 request from UID:56207 GID:56207 HOST:172.21.100.1 PORT:33976
Dec 16 11:31:46 e8002 slurmd[44149]: lllp_distribution jobid [314] implicit auto binding: cores,one_thread, dist 8192
Dec 16 11:31:46 e8002 slurmd[44149]: _task_layout_lllp_cyclic
Dec 16 11:31:46 e8002 slurmd[44149]: _lllp_generate_cpu_bind jobid [314]: mask_cpu,one_thread, 0x00000000000001,0x00000000000002,0x00000000000004,0x00000000000008
Dec 16 11:31:46 e8002 slurmd[44149]: _run_prolog: run job script took usec=159
Dec 16 11:31:46 e8002 slurmd[44149]: _run_prolog: prolog with lock for job 314 ran for 0 seconds
Dec 16 11:31:46 e8002 slurmstepd[16033]: in _window_manager
Dec 16 11:31:46 e8002 slurmstepd[16041]: task_p_pre_launch: Using sched_affinity for tasks
Dec 16 11:31:46 e8002 slurmstepd[16042]: task_p_pre_launch: Using sched_affinity for tasks
Dec 16 11:31:46 e8002 slurmstepd[16040]: task_p_pre_launch: Using sched_affinity for tasks
Dec 16 11:31:46 e8002 slurmstepd[16039]: task_p_pre_launch: Using sched_affinity for tasks
Dec 16 11:31:46 e8002 slurmstepd[16040]: error: Failed to invoke task plugins: task_p_pre_launch error

Restarting slurmd makes the problem go away for a few minutes, then the problem reappears. We are using slurm-20.02.5-1.el7 on both the job submission node and the compute node.

Thanks!
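For anyone reading the log above, the mask_cpu values on the _lllp_generate_cpu_bind line can be decoded into CPU indices to check which CPU each task was meant to be bound to. The following is a minimal sketch only: the helper name cpus_in_mask is hypothetical, and it assumes the masks are listed in task order on the node and that bit N of a mask stands for CPU N as Slurm numbers it, which may differ from the kernel's numbering.

```python
def cpus_in_mask(mask_hex: str) -> list[int]:
    """Return the bit positions set in a hex CPU mask (hypothetical helper)."""
    mask = int(mask_hex, 16)  # int() accepts the leading "0x" when base is 16
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Mask list copied from the _lllp_generate_cpu_bind line in the log.
masks = "0x00000000000001,0x00000000000002,0x00000000000004,0x00000000000008"

# Assumption: the Nth mask in the list belongs to the Nth task on the node.
bindings = {task: cpus_in_mask(m) for task, m in enumerate(masks.split(","))}

for task, cpus in bindings.items():
    print(f"task {task} -> CPUs {cpus}")
```

Under those assumptions, the four tasks map to CPUs 0 through 3, one CPU each, so the requested binding itself looks ordinary; the failure is reported later, in task_p_pre_launch.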
Can you attach your slurm.conf? What Linux distro and kernel are you running on? Could you also set SlurmctldDebug=debug2, restart Slurm, reproduce the problem, and then attach the relevant portions of your slurmd.log and slurmctld.log? I can't be sure without more logs, but we recently fixed a similar cgroup-related error, so I would recommend upgrading to 20.02.6 to see if that solves the issue. Thanks -Michael
Thanks Michael, My teammate also submitted a case for this problem: 10460. You may merge these two cases. - Kevin Ying
Merging this with bug 10460 *** This ticket has been marked as a duplicate of ticket 10460 ***