| Summary: | unable to cancel array job | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Jonathon Anderson <jonathon.anderson> |
| Component: | slurmctld | Assignee: | Director of Support <support> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | ||
| Version: | 17.11.1 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | University of Colorado | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | |
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||

Hi Jonathon,

This bug has been fixed in 17.11.4. For an explanation of the problem and its solution, see bug 4833. I am going to close this bug as a duplicate, but if the fix in 17.11.4 does not solve your problem, please comment here or reopen this ticket.

Regards,
Isaac

*** This ticket has been marked as a duplicate of ticket 4833 ***

This user would like to cancel his array job 796729.

```
[root@slurm5 ~]# squeue -u luev6784
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            915122      smem mdd_ldms luev6784  R 1-00:54:01      1 smem0401
            915121      smem mdd_ldms luev6784  R 1-00:54:13      1 smem0201
            915117      smem      gad luev6784  R 1-01:03:46      1 smem0101
            915116      smem      gad luev6784  R 1-01:04:46      1 smem0501
          796729_2      smem       LD luev6784 RH       0:00      1 (JobHoldMaxRequeue)
          796729_4      smem       LD luev6784 RH       0:00      1 (JobHoldMaxRequeue)
          796729_5      smem       LD luev6784 RH       0:00      1 (JobHoldMaxRequeue)
          796729_6      smem       LD luev6784 RH       0:00      1 (JobHoldMaxRequeue)
          796729_7      smem       LD luev6784 RH       0:00      1 (JobHoldMaxRequeue)
         796729_12      smem       LD luev6784 RH       0:00      1 (JobHoldMaxRequeue)
```

However, scancel fails both as the user and as an admin:

```
[root@slurm5 ~]# scancel 796729
[2018-05-14T11:15:47.377] _slurm_rpc_kill_job: REQUEST_KILL_JOB job 796729 uid 0
[2018-05-14T11:15:47.377] job_str_signal(3): invalid job id 796729
[2018-05-14T11:15:47.377] _slurm_rpc_kill_job: job_str_signal() job 796729 sig 9 returned Invalid job id specified

[root@slurm5 ~]# scancel 796729_2
[2018-05-14T11:15:50.516] _slurm_rpc_kill_job: REQUEST_KILL_JOB job 796729_2 uid 0
[2018-05-14T11:15:50.517] job_str_signal(5): invalid job id 796729_2
[2018-05-14T11:15:50.517] _slurm_rpc_kill_job: job_str_signal() job 796729_2 sig 9 returned Invalid job id specified
```

Please advise.
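To make the array job ID format in the squeue output concrete, here is a minimal Python sketch that picks out the held array tasks (state `RH`, i.e. REQUEUE_HOLD) and groups them under their parent array job. It assumes the default squeue column layout shown in this report and that no field (including NAME) contains spaces; the helper names `held_array_tasks` and `compress` are hypothetical. The bracketed string it prints mirrors the `JOBID_[range]` form squeue uses for ranges of pending array tasks. It only illustrates the IDs involved and is not a workaround: the ticket resolution states that the "Invalid job id specified" failure was a slurmctld bug fixed in 17.11.4.

```python
import re
from collections import defaultdict

# Sketch only: assumes default `squeue` output as pasted in this report, i.e.
# whitespace-separated columns JOBID PARTITION NAME USER ST TIME NODES
# NODELIST(REASON), with no spaces inside any field (including NAME).

# Array tasks appear as "<array_job_id>_<task_id>"; ordinary jobs have no "_".
ARRAY_TASK = re.compile(r"^(\d+)_(\d+)$")


def held_array_tasks(squeue_text: str) -> dict[str, list[int]]:
    """Map each array job ID to the sorted task IDs in state RH (REQUEUE_HOLD)."""
    tasks: dict[str, list[int]] = defaultdict(list)
    for line in squeue_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything that isn't a full row.
        if len(fields) < 8 or fields[0] == "JOBID":
            continue
        jobid, state = fields[0], fields[4]
        match = ARRAY_TASK.match(jobid)
        if match and state == "RH":
            tasks[match.group(1)].append(int(match.group(2)))
    return {job: sorted(ids) for job, ids in tasks.items()}


def compress(ids: list[int]) -> str:
    """Collapse sorted task IDs into a range list, e.g. [2, 4, 5, 6] -> "2,4-6"."""
    parts: list[str] = []
    start = prev = ids[0]
    for i in ids[1:]:
        if i == prev + 1:
            prev = i  # still inside a consecutive run
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = i
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


# The rows from this report, single-spaced (split() ignores alignment anyway).
SAMPLE = """\
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
915122 smem mdd_ldms luev6784 R 1-00:54:01 1 smem0401
915121 smem mdd_ldms luev6784 R 1-00:54:13 1 smem0201
915117 smem gad luev6784 R 1-01:03:46 1 smem0101
915116 smem gad luev6784 R 1-01:04:46 1 smem0501
796729_2 smem LD luev6784 RH 0:00 1 (JobHoldMaxRequeue)
796729_4 smem LD luev6784 RH 0:00 1 (JobHoldMaxRequeue)
796729_5 smem LD luev6784 RH 0:00 1 (JobHoldMaxRequeue)
796729_6 smem LD luev6784 RH 0:00 1 (JobHoldMaxRequeue)
796729_7 smem LD luev6784 RH 0:00 1 (JobHoldMaxRequeue)
796729_12 smem LD luev6784 RH 0:00 1 (JobHoldMaxRequeue)
"""

if __name__ == "__main__":
    for job, ids in held_array_tasks(SAMPLE).items():
        print(f"{job}_[{compress(ids)}]")  # prints 796729_[2,4-7,12]
```

The running jobs (915116 to 915122) are skipped because they carry no `_<task_id>` suffix and are not in state `RH`.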