This user would like to cancel his array job 796729.

[root@slurm5 ~]# squeue -u luev6784
     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    915122      smem mdd_ldms luev6784  R 1-00:54:01      1 smem0401
    915121      smem mdd_ldms luev6784  R 1-00:54:13      1 smem0201
    915117      smem      gad luev6784  R 1-01:03:46      1 smem0101
    915116      smem      gad luev6784  R 1-01:04:46      1 smem0501
  796729_2      smem       LD luev6784 RH       0:00      1 (JobHoldMaxRequeue)
  796729_4      smem       LD luev6784 RH       0:00      1 (JobHoldMaxRequeue)
  796729_5      smem       LD luev6784 RH       0:00      1 (JobHoldMaxRequeue)
  796729_6      smem       LD luev6784 RH       0:00      1 (JobHoldMaxRequeue)
  796729_7      smem       LD luev6784 RH       0:00      1 (JobHoldMaxRequeue)
 796729_12      smem       LD luev6784 RH       0:00      1 (JobHoldMaxRequeue)

But scancel fails, whether it is run as the user or as an admin.

[root@slurm5 ~]# scancel 796729
[2018-05-14T11:15:47.377] _slurm_rpc_kill_job: REQUEST_KILL_JOB job 796729 uid 0
[2018-05-14T11:15:47.377] job_str_signal(3): invalid job id 796729
[2018-05-14T11:15:47.377] _slurm_rpc_kill_job: job_str_signal() job 796729 sig 9 returned Invalid job id specified

[root@slurm5 ~]# scancel 796729_2
[2018-05-14T11:15:50.516] _slurm_rpc_kill_job: REQUEST_KILL_JOB job 796729_2 uid 0
[2018-05-14T11:15:50.517] job_str_signal(5): invalid job id 796729_2
[2018-05-14T11:15:50.517] _slurm_rpc_kill_job: job_str_signal() job 796729_2 sig 9 returned Invalid job id specified

Please advise.

Hi Jonathon,

This bug has been fixed in 17.11.4. For an explanation of the problem and its solution, you can look at bug 4833.

I am going to close this bug as a duplicate, but should the fix in 17.11.4 not solve your problem, please comment here/reopen this ticket.

Regards,
Isaac

*** This ticket has been marked as a duplicate of ticket 4833 ***
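
Since the fix depends on running 17.11.4 or later, it may help to confirm the installed version before retrying scancel. The following is a minimal sketch, not something taken from this ticket: the helper name version_ge is hypothetical, it assumes GNU sort with -V support, and it assumes that `scontrol --version` prints a line of the form "slurm 17.11.4". It only reports the version of the tools on the host where it runs, so it is best run on the controller.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: `scontrol --version` prints something like "slurm 17.11.4".
# The check is skipped entirely when scontrol is not on the PATH.
if command -v scontrol >/dev/null 2>&1; then
    installed=$(scontrol --version | awk '{print $2}')
    if version_ge "$installed" "17.11.4"; then
        echo "Slurm $installed is at or above 17.11.4"
    else
        echo "Slurm $installed predates 17.11.4; upgrade before retrying scancel"
    fi
fi
```

If the reported version is already 17.11.4 or later and scancel still returns "Invalid job id specified", that is the case in which the ticket should be reopened.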