Ticket 2148

Summary: Suspend/Resume/Release/Hold
Product: Slurm Reporter: Paul Edmon <pedmon>
Component: Other Assignee: Unassigned Developer <dev-unassigned>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 5 - Enhancement    
Priority: ---    
Version: 15.08.4   
Hardware: Linux   
OS: Linux   
Site: Harvard University Slinky Site: ---
Alineos Sites: --- Atos/Eviden Sites: ---
Confidential Site: --- Coreweave sites: ---
Cray Sites: --- DS9 clusters: ---
Google sites: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- NoveTech Sites: ---
Nvidia HWinf-CS Sites: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Tzag Elita Sites: ---
Linux Distro: --- Machine Name:
CLE Version: Version Fixed:
Target Release: --- DevPrio: ---
Emory-Cloud Sites: ---

Description Paul Edmon 2015-11-17 02:03:35 MST
First of many feature requests.  We've been compiling a running list for months and I've had some time to sit down and enter things.  So buckle in.

We would like a way to suspend/resume/release/hold multiple jobs at once or even all the jobs in a partition or for a specific user in one shot, or perhaps all the jobs in the cluster.  We've written some hacky scripts to do this by combining squeue and scontrol but it would be nice to have some utility like scancel or even give scancel the ability to issue suspend/resume/release/hold.  This would make our lives easier from a management end.

Of course if there is a way to do this that we missed we would be all ears, but as far as I know this functionality does not exist.

Thanks.

-Paul Edmon-
Comment 1 Moe Jette 2016-10-26 09:53:41 MDT
(In reply to Paul Edmon from comment #0)
> We would like a way to suspend/resume/release/hold multiple jobs at once or
> even all the jobs in a partition or for a specific user in one shot, or
> perhaps all the jobs in the cluster.

Those commands all accept space-delimited lists of job IDs:
scontrol hold 123 456 789

There isn't any filtering by user, partition, etc. in the scontrol command.
What I do in those cases is use the squeue command to do the filtering and build a script, then execute that script. For example:

$ squeue -u adam -h -o "scontrol hold %i"
scontrol hold 72
scontrol hold 74
scontrol hold 73
scontrol hold 75
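
The same squeue-then-scontrol approach can be wrapped in a small shell function so that one call covers a user, a partition, or the whole cluster. This is only a sketch under a few assumptions: bulk_scontrol is a hypothetical helper name (it is not part of Slurm), it relies on the space-delimited job ID lists described above, and it assumes the squeue options -h, -o "%i", -u and -p behave as in the example. Check it against your Slurm version before relying on it.

```shell
#!/bin/sh
# Sketch: apply one scontrol action to every job that squeue selects.
# bulk_scontrol is a hypothetical helper, not a Slurm command.
#
# Usage: bulk_scontrol ACTION [squeue filter options...]

# The body is a subshell "( ... )", so "set -f" and "exit" stay local to it.
bulk_scontrol() (
    action=$1
    shift

    # Only allow the four actions discussed in this ticket.
    case "$action" in
        hold|release|suspend|resume) ;;
        *) echo "unsupported action: $action" >&2; exit 2 ;;
    esac

    # -h drops the header line; -o "%i" prints one job ID per line.
    # Any remaining arguments (e.g. -u adam, -p general) are passed to squeue.
    ids=$(squeue -h -o "%i" "$@") || exit 1

    # No matching jobs: do not call scontrol with an empty list.
    [ -n "$ids" ] || exit 0

    # Disable globbing so IDs such as 123_[1-10] are passed through literally,
    # then let word splitting turn the lines into separate arguments.
    set -f
    # shellcheck disable=SC2086
    scontrol "$action" $ids
)
```

For example, "bulk_scontrol hold -u adam" would issue a single "scontrol hold 72 74 73 75" for the jobs listed above, and "bulk_scontrol release -p general" would do the same for a partition (the partition name here is made up). Calling it with no filter options would act on every job squeue reports, so it is worth running the squeue part alone first.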

I have added a FAQ about this:
https://github.com/SchedMD/slurm/commit/7b56d4264eb04033d511b961ec79ffbc87cdb3d9

I'm going to close this bug based on the workaround above.