Ticket 478 - test requeue functionality
Summary: test requeue functionality
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Test Suite
Version: 14.03.x
Hardware: Linux Linux
Severity: 5 - Enhancement
Assignee: Nathan Yee
Reported: 2013-10-22 10:48 MDT by David Bigagli
Modified: 2014-02-24 08:52 MST
Site: SchedMD
Version Fixed: 14.03.0pre7


Attachments
Test for scontrol requeue and requeuehold (13.60 KB, patch)
2013-11-14 19:20 MST, Nathan Yee
Test for scontrol requeue and requeuehold (13.62 KB, patch)
2013-11-14 19:28 MST, Nathan Yee

Description David Bigagli 2013-10-22 10:48:31 MDT
We have developed a new feature which introduces these behaviours:

1) A job can be requeued to the pending state from the completed or failed state.

After the job terminates and before it gets cleaned from the slurmctld
memory, it can be requeued using 'scontrol requeue job_id'.
The job goes back to the pending state and then runs again. This can be repeated any number of times.

A test case should be developed to verify that a job can indeed be requeued
from the completed or failed state and that it runs again afterwards.
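
The flow described above (wait for the job to finish, requeue it, confirm it pends and then runs again) can be sketched as a small polling helper. This is only an illustration in Python, not the Expect code used by the Slurm test suite. The names wait_for_state and check_requeue are hypothetical, and get_state and requeue are caller-supplied callables that a real test would back with 'scontrol show job' and 'scontrol requeue job_id'.

```python
import time


def wait_for_state(get_state, wanted, timeout=60.0, interval=1.0,
                   sleep=time.sleep):
    """Poll get_state() until it returns a state in `wanted`.

    Returns the matching state, or None if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in wanted:
            return state
        if time.monotonic() >= deadline:
            return None
        sleep(interval)


def check_requeue(get_state, requeue, timeout=60.0, interval=1.0,
                  sleep=time.sleep):
    """Hypothetical test flow: finished job -> requeue -> PENDING -> RUNNING."""
    # 1. The job must first reach a terminal state.
    if wait_for_state(get_state, {"COMPLETED", "FAILED"},
                      timeout, interval, sleep) is None:
        return False
    # 2. Ask the controller to requeue it (caller-supplied action).
    requeue()
    # 3. It should go back to PENDING ...
    if wait_for_state(get_state, {"PENDING"},
                      timeout, interval, sleep) is None:
        return False
    # 4. ... and then run again.
    return wait_for_state(get_state, {"RUNNING"},
                          timeout, interval, sleep) is not None
```

A real test may need to tolerate fast transitions, for example a short job that completes again between two polls.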

2) A job can be requeued to the pending state on hold. A new scontrol keyword
was introduced for this: 'scontrol requeuehold job_id'. The job goes back to the
pending state and is held. The job can be released using the usual
'scontrol release job_id', upon which it should start to run again.
This can be repeated any number of times.

A test case should be developed to verify that a job can be requeued in hold state and that it will run again upon being released.

2.1) A job can be requeued and held in a special JOB_SPECIAL_STATE using
'scontrol requeuehold State=SpecialExit job_id'.
The job will be held and its state shown as SPECIAL_EXIT by
'scontrol show job'. The job can be released using the usual 'scontrol release job_id', upon which it should start to run again.
This can be repeated any number of times.

A test case should be developed to verify that a job can be requeued in hold state, that its state is indeed SPECIAL_EXIT and that it will run again upon being released.
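
The three behaviours above can be summarised as a toy state model, which may help when deciding what each test case has to assert. This is a sketch in Python and not Slurm's implementation. The class ModelJob and its methods are hypothetical, and the model assumes that a held job is represented by a priority of 0 and that releasing a job returns it to PENDING.

```python
from dataclasses import dataclass


@dataclass
class ModelJob:
    """Toy model of a job; not Slurm's real data structure."""
    state: str = "PENDING"
    priority: int = 1  # assumption: 0 means held, non-zero means eligible

    def is_held(self):
        return self.priority == 0

    def start(self):
        """Model the scheduler: only a pending, non-held job can start."""
        if self.state == "PENDING" and not self.is_held():
            self.state = "RUNNING"
            return True
        return False

    def finish(self, ok=True):
        self.state = "COMPLETED" if ok else "FAILED"

    def requeue(self):
        """Models 'scontrol requeue job_id'."""
        self.state = "PENDING"
        self.priority = 1

    def requeuehold(self, special_exit=False):
        """Models 'scontrol requeuehold [State=SpecialExit] job_id'."""
        self.state = "SPECIAL_EXIT" if special_exit else "PENDING"
        self.priority = 0

    def release(self):
        """Models 'scontrol release job_id'."""
        self.state = "PENDING"
        self.priority = 1
```

Each test case then corresponds to one path through the model: finish then requeue then start; finish then requeuehold then release then start; and the same with special_exit=True, checking for SPECIAL_EXIT before the release.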

Thanks,
       David
Comment 1 Nathan Yee 2013-11-12 15:50:34 MST
For testing the scontrol requeue option, how would I hold the job? Would I start a job with a --begin time in the future and then do an scontrol hold on that job to get a priority of 0? I read in the documentation that I could put a job on hold by setting --begin to a time in the future, but when I do this, how would I check that the job is in fact on hold? Also, when I set --begin to a future time (assuming this does hold the job), the job state is still pending, so when I do the scontrol requeuehold the state never changes. Maybe I am missing something: is there a different way to put a job on hold where the state is something other than pending?
Comment 2 Nathan Yee 2013-11-12 16:09:01 MST
PLEASE DISREGARD MY EARLIER MESSAGE!!!!

New Question:

I just want to make sure that I am understanding #2 correctly. Once a job is completed, I can use "scontrol requeuehold" to requeue the completed job back to a pending state, and this job will also be on hold. Is this what this option is supposed to do?

One other thing: what determines whether a job is on hold? Does it just mean that the priority of the job is set to 0?

Thanks in advance
Comment 3 David Bigagli 2013-11-13 03:33:57 MST
'scontrol requeuehold' requeues a job or a job array from *any* state to
state PENDING and puts it on hold.
'scontrol release' releases the job from hold so that it is ready to
run again.
'scontrol requeue' requeues a job or job array from *any* state to
PENDING, ready to be run again.

Yes, a job is put on hold by setting its priority to 0.
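
Given that a hold is a priority of 0, a test could check for it by reading the Priority field from 'scontrol show job' output. The following is a minimal sketch in Python, not the test suite's Expect code. The function names are hypothetical, and the sample text is a hand-written, heavily abbreviated stand-in for real output, which contains many more Key=Value fields.

```python
def parse_job_fields(text):
    """Collect whitespace-separated Key=Value tokens into a dict.

    The first occurrence of a key wins. Values containing spaces are not
    handled; this is only a sketch.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields.setdefault(key, value)
    return fields


def is_held(text):
    """Return True if the job's Priority field is 0, i.e. the job is held."""
    return parse_job_fields(text).get("Priority") == "0"


# Hand-written, abbreviated sample (not captured from a real cluster).
SAMPLE = """JobId=42 JobName=requeue_test
   Priority=0
   JobState=PENDING
"""

if __name__ == "__main__":
    print(is_held(SAMPLE))
```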

Let me know if this answers your questions.

--

Thanks,
      /David/Bigagli

www.schedmd.com
Comment 4 Nathan Yee 2013-11-14 19:20:19 MST
Created attachment 508
Test for scontrol requeue and requeuehold

Let me know if it needs any changes.
Comment 5 Nathan Yee 2013-11-14 19:28:47 MST
Created attachment 509
Test for scontrol requeue and requeuehold

Made a few changes to the patch. Let me know if any changes need to be made.
Comment 6 Moe Jette 2014-02-24 08:52:44 MST
One note: when a job submit fails and the job ID is zero, you can't just print an error message and continue, as all of the later references to job_id will be bad. Just exit. The commit is here:

https://github.com/SchedMD/slurm/commit/7582ccdd203822815cc07948dcad488efd7c57e9
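
The point about a zero job ID can be illustrated with a short fail-fast sketch. This is Python for illustration only and is not taken from the commit above. The function names are hypothetical, and the only output format assumed is sbatch's usual "Submitted batch job N" line.

```python
import re
import sys


def parse_job_id(sbatch_output):
    """Return the job ID from 'Submitted batch job N', or 0 if not found."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return int(match.group(1)) if match else 0


def require_job_id(sbatch_output):
    """Exit immediately if the submit failed, instead of continuing with 0."""
    job_id = parse_job_id(sbatch_output)
    if job_id == 0:
        # Every later step would reference a bad job_id, so stop here.
        sys.exit("FAILURE: job submit failed, aborting test")
    return job_id
```

Exiting at the submit step keeps later requeue, hold, and release checks from reporting misleading failures against a job that never existed.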