Native SLURM: Suspend/Resume - slurm_suspend error: Requested operation is presently disabled when we set SchedulerTimeSlice=30 and PreemptMode=GANG in the /etc/opt/slurm/slurm.conf file.

System: snake-p3
SLURM version: 14.03.0

tchoi@snake-p3:/home/users/tchoi> srun --version
slurm 14.03.0
tchoi@snake-p3:/home/users/tchoi> more /etc/opt/slurm/slurm.conf | grep -i Slice
SchedulerTimeSlice=30
tchoi@snake-p3:/home/users/tchoi> more /etc/opt/slurm/slurm.conf | grep -i gang
PreemptMode=GANG
tchoi@snake-p3:/home/users/tchoi> more /etc/opt/slurm/slurm.conf | grep -i suspend

# Launch first application:
tchoi@snake-p3:/home/users/tchoi> srun --nodes=1 --ntasks=2 --cpus-per-task=3 -w nid000[24-25] cpu_mp 12000

# Verify it is running on nid000[24-25]:
tchoi@snake-p3:/home/users/tchoi> squeue -l
Tue Feb 11 18:36:02 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  235     workq cpu_mp tchoi  RUNNING  7:55   1:00:00     2 nid000[24-25]

# Launch second application:
tchoi@snake-p3:/home/users/tchoi> srun --nodes=1 --ntasks=2 --cpus-per-task=4 -w nid000[24-25] /cray/css/ostest/binaries/xt/dev.aries.cray-sp3/xtcnl/ostest/ROOT.latest/tests/alps/ring
srun: job 237 queued and waiting for resources

# Verify the first app (JobID 235) is running and the second app (JobID 237) is pending:
tchoi@snake-p3:/home/users/tchoi> squeue -l
Tue Feb 11 18:37:32 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  235     workq cpu_mp tchoi  RUNNING  9:25   1:00:00     2 nid000[24-25]
  237     workq   ring tchoi  PENDING  0:00   1:00:00     2 (Resources)

tchoi@snake-p3:/home/users/tchoi> su root

# Suspend first app:
snake-p3:/home/users/tchoi # scontrol suspend 235

# Both applications are suspended. I do not know whether this is a SLURM bug or expected behavior.
# Question 1: Please tell me whether both applications being suspended is expected behavior or a bug.
# This occurs only the first time suspending:
tchoi@snake-p3:/home/users/tchoi> squeue -l
Tue Feb 11 18:38:02 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  235     workq cpu_mp tchoi SUSPENDE  9:48   1:00:00     2 nid000[24-25]
  237     workq   ring tchoi SUSPENDE  0:00   1:00:00     2 nid000[24-25]

# A few seconds later, the second application is running:
Tue Feb 11 18:38:12 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  235     workq cpu_mp tchoi SUSPENDE  9:48   1:00:00     2 nid000[24-25]
  237     workq   ring tchoi  RUNNING  0:26   1:00:00     2 nid000[24-25]

# Suspend the second application. Now I get the following slurm_suspend error: Requested operation is presently disabled.
# Question 2: Is this due to setting SchedulerTimeSlice=30 and PreemptMode=GANG in the /etc/opt/slurm/slurm.conf file?
snake-p3:/home/users/tchoi # scontrol suspend 237
Requested operation is presently disabled for array job_id 237
slurm_suspend error: Requested operation is presently disabled

# Every 30 seconds the two applications alternate on the oversubscribed nodes, without any scontrol suspend/resume [JobID]:
tchoi@snake-p3:/etc/opt/slurm> squeue -l -i30
Wed Feb 12 17:06:41 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  261     workq   ring tchoi SUSPENDE  4:50   1:00:00     2 nid000[24-25]
  260     workq cpu_mp tchoi  RUNNING  6:30   1:00:00     2 nid000[24-25]
Wed Feb 12 17:07:11 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  260     workq cpu_mp tchoi SUSPENDE  6:36   1:00:00     2 nid000[24-25]
  261     workq   ring tchoi  RUNNING  5:13   1:00:00     2 nid000[24-25]
Wed Feb 12 17:07:41 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  261     workq   ring tchoi SUSPENDE  5:19   1:00:00     2 nid000[24-25]
  260     workq cpu_mp tchoi  RUNNING  7:00   1:00:00     2 nid000[24-25]
Wed Feb 12 17:08:11 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  260     workq cpu_mp tchoi SUSPENDE  7:05   1:00:00     2 nid000[24-25]
  261     workq   ring tchoi  RUNNING  5:44   1:00:00     2 nid000[24-25]
Wed Feb 12 17:08:41 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  261     workq   ring tchoi SUSPENDE  5:48   1:00:00     2 nid000[24-25]
  260     workq cpu_mp tchoi  RUNNING  7:31   1:00:00     2 nid000[24-25]
Wed Feb 12 17:09:11 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  260     workq cpu_mp tchoi SUSPENDE  7:34   1:00:00     2 nid000[24-25]
  261     workq   ring tchoi  RUNNING  6:15   1:00:00     2 nid000[24-25]
Is this due to setting SchedulerTimeSlice=30 and PreemptMode=GANG in the /etc/opt/slurm/slurm.conf file? If yes, please let us know the correct slurm.conf settings for suspending and resuming jobs.
Changing severity per the descriptions in the contract as shown below.

Severity Levels
1. Severity 1 – Major Impact: A Severity 1 issue occurs when there is a continued system outage that affects a large set of end users. The system is down and nonfunctional due to Slurm problem(s) and no procedural workaround exists.
2. Severity 2 – High Impact: A Severity 2 issue is a high-impact problem that is causing sporadic outages or is consistently encountered by end users with adverse impact to end-user interaction with the system.
3. Severity 3 – Medium Impact: A Severity 3 issue is a medium-to-low-impact problem that includes partial noncritical loss of system access or that impairs some operations on the system but allows the end user to continue to function on the system with workarounds.
4. Severity 4 – Minor Issues: A Severity 4 issue is a minor issue with limited or no loss in functionality within the customer environment. Severity 4 issues may also be used for recommendations for future product enhancements or modifications.
It sounds like your system is configured to oversubscribe resources and time-slice jobs. Suspending one job does not trigger an immediate resumption of the other job; the resumption waits for the 30-second time slice to be reached. I suggest that you review the gang scheduling documentation here: http://slurm.schedmd.com/gang_scheduling.html Regarding suspending a job array, I would guess that some portion of it is in a state which cannot be suspended, e.g. pending.
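For reference, a gang-scheduling configuration along the lines of that documentation could look like the sketch below. This is an illustrative slurm.conf fragment only, not the reporter's actual configuration; the partition and node names are placeholders borrowed from the transcripts above, and the Shared limit is an assumption.

```
# Illustrative slurm.conf fragment for gang scheduling (SLURM 14.03 era).
# Partition/node lines are hypothetical placeholders, not this system's config.
PreemptMode=GANG              # gang scheduler rotates jobs automatically
SchedulerTimeSlice=30         # seconds each job runs per rotation
SelectType=select/cons_res    # consumable-resource selection
SelectTypeParameters=CR_Core
# Shared=FORCE:<n> lets up to <n> jobs oversubscribe each resource.
PartitionName=workq Nodes=nid000[24-25] Shared=FORCE:2 Default=YES
```

With this mode active, the gang scheduler itself drives the suspend/resume cycle for time-slicing, which is consistent with manual scontrol suspend being disabled for those jobs.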
The "array job_id" portion of the error message was wrong; it has been corrected to just "job_id". Everything else appears to be working as expected for a system with gang scheduling enabled.