Ticket 695 - Can not release jobs
Summary: Can not release jobs
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 14.03.0
Hardware: Linux Linux
Severity: 3 - Medium Impact
Assignee: David Bigagli
Reported: 2014-04-13 05:26 MDT by Stuart Midgley
Modified: 2014-04-13 10:14 MDT
Site: DownUnder GeoSolutions

Attachments
attachment-8337-0.html (2.54 KB, text/html)
2014-04-13 10:14 MDT, David Bigagli

Description Stuart Midgley 2014-04-13 05:26:36 MDT
Created attachment 745
attachment-8337-0.html

20140414012355 bud30:~> squeue -t SE
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
        566710_211    teambm a_harFM_    yanaz SE       0:00      1 (JobHeldUser)
        566711_216    teambm a_harFM_    yanaz SE       0:00      1 (JobHeldUser)
        566712_233    teambm a_harFM_    yanaz SE       0:00      1 (JobHeldUser)
        566714_266    teambm a_harFM_    yanaz SE       0:00      1 (JobHeldUser)
        566715_270    teambm a_harFM_    yanaz SE       0:00      1 (JobHeldUser)
        566716_277    teambm a_harFM_    yanaz SE       0:00      1 (JobHeldUser)
          567248_1    teambm a_600_Fi michaeld SE       0:00      1 (JobHeldUser)
          567313_2    teambm a_600_Fi michaeld SE       0:00      1 (JobHeldUser)
       622761_1000    teambm  dc_5_35   bjornm SE       0:00      1 (JobHeldUser)
       622772_1010    teambm  dc_5_35   bjornm SE       0:00      1 (JobHeldUser)
       622783_1020    teambm  dc_5_35   bjornm SE       0:00      1 (JobHeldUser)
       622794_1030    teambm  dc_5_35   bjornm SE       0:00      1 (JobHeldUser)
       622805_1040    teambm  dc_5_35   bjornm SE       0:00      1 (JobHeldUser)
       622816_1050    teambm  dc_5_35   bjornm SE       0:00      1 (JobHeldUser)
       622827_1060    teambm  dc_5_35   bjornm SE       0:00      1 (JobHeldUser)
       622838_1070    teambm  dc_5_35   bjornm SE       0:00      1 (JobHeldUser)
       622849_1080    teambm  dc_5_35   bjornm SE       0:00      1 (JobHeldUser)
       622860_1090    teambm  dc_5_35   bjornm SE       0:00      1 (JobHeldUser)
       622871_1100    teambm  dc_5_35   bjornm SE       0:00      1 (JobHeldUser)
       622882_1110    teambm  dc_5_35   bjornm SE       0:00      1 (JobHeldUser)
       623025_1240    teambm  dc_5_35   bjornm SE       0:00      1 (JobHeldUser)
       623080_1008    teambm  dp_5_35   bjornm SE       0:00      1 (JobHeldUser)
 623146_[1030-1039    teambm  dp_5_35   bjornm SE       0:00      1 (JobHeldUser)
 623168_[1040-1049    teambm  dp_5_35   bjornm SE       0:00      1 (JobHeldUser)
 623190_[1050-1059    teambm  dp_5_35   bjornm SE       0:00      1 (JobHeldUser)
        567249_107 teamswanM m_600_Fi michaeld SE       0:00      1 (JobHeldUser)
20140414012404 bud30:~> scontrol show jobid=567249_107
JobId=567291 ArrayJobId=567249 ArrayTaskId=107 Name=m_600_FinalMig_2_EvenIL_OddCL
   UserId=michaeld(1260) GroupId=teambm(2102)
   Priority=0 Nice=1014 Account=(null) QOS=normal
   JobState=SPECIAL_EXIT Reason=JobHeldUser Dependency=(null)
   Requeue=1 Restarts=1 BatchFlag=1 ExitCode=100:0
   RunTime=00:00:00 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2014-04-13T22:43:11 EligibleTime=2014-04-13T22:43:12
   StartTime=Unknown EndTime=2014-04-14T01:01:48
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=teamswanMig AllocNode:Sid=bud30:8489
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=clus274
   BatchHost=clus274
   NumNodes=1 NumCPUs=32 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=0
   MinCPUsNode=1 MinMemoryNode=15661M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=/p3/cue/zeebriesPr_002/imaging/600_FinalMig/040migOut/jobs/rj.m_600_FinalMig_2_EvenIL_OddCL.UGutjW
   WorkDir=/p3/cue/zeebriesPr_002/imaging/600_FinalMig/040migOut/jobs
   Comment=sge job id 2057393 
   StdErr=/p3/cue/zeebriesPr_002/imaging/600_FinalMig/040migOut/jobs/logs/m_600_FinalMig_2_EvenIL_OddCL/michaeld/m_600_FinalMig_2_EvenIL_
   StdIn=/dev/null
   StdOut=/p3/cue/zeebriesPr_002/imaging/600_FinalMig/040migOut/jobs/logs/m_600_FinalMig_2_EvenIL_OddCL/michaeld/m_600_FinalMig_2_EvenIL_

20140414012445 bud30:~> sudo -s
[sudo] password for stuartm: 
140414012457 bud30:stuartm# export PATH=/d/sw/slurm/latest/sbin:/d/sw/slurm/latest/bin:$PATH
140414012510 bud30:stuartm# scontrol release job=567249_107
Invalid job id specified (job=567249_107)
slurm_suspend error: No error
140414012529 bud30:stuartm# scontrol release job=567291_107
Invalid job id specified (job=567291_107)
slurm_suspend error: No error
Comment 1 Stuart Midgley 2014-04-13 05:50:12 MDT
URGH... I'm an idiot... that's what happens when you're doing sysadmin stuff at 1am.

    scontrol release <jobid>_<taskid>

does the trick.
Comment 2 David Bigagli 2014-04-13 10:14:41 MDT
There are no stupid questions. :-) The input format is indeed inconsistent across commands.
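
To illustrate the inconsistency with the commands from this ticket: `scontrol show` accepted `jobid=567249_107`, while `scontrol release` rejected `job=567249_107` and, per Comment 1, takes the bare `<jobid>_<taskid>` form. Below is a minimal sketch of a wrapper that strips the keyword prefix before building the release command. `normalize_job_spec` is a hypothetical helper, not part of Slurm, and the command is only printed here, not executed.

```shell
# Hypothetical helper (not part of Slurm): reduce a job spec such as
# "job=567249_107" or "jobid=567249_107" to the bare "<jobid>_<taskid>" form.
normalize_job_spec() {
    spec=$1
    # Strip a leading "jobid=" or "job=" keyword if present.
    spec=${spec#jobid=}
    spec=${spec#job=}
    # Reject empty input or anything other than digits and underscores.
    case $spec in
        ''|*[!0-9_]*)
            echo "unrecognised job spec: $1" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$spec"
}

# Dry run: print the command rather than running it.
echo "scontrol release $(normalize_job_spec job=567249_107)"
# prints: scontrol release 567249_107
```

Dropping the `echo` would run the release for real; that assumes the caller has the privileges used in the transcript above.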
