Native SLURM: Suspend/Resume - slurm_suspend error: Requested operation is presently disabled when we set SchedulerTimeSlice=30 and PreemptMode=GANG in the /etc/opt/slurm/slurm.conf file.

System: snake-p3
SLURM version: 14.03.0

tchoi@snake-p3:/home/users/tchoi> srun --version
slurm 14.03.0
tchoi@snake-p3:/home/users/tchoi> more /etc/opt/slurm/slurm.conf | grep -i Slice
SchedulerTimeSlice=30
tchoi@snake-p3:/home/users/tchoi> more /etc/opt/slurm/slurm.conf | grep -i gang
PreemptMode=GANG
tchoi@snake-p3:/home/users/tchoi> more /etc/opt/slurm/slurm.conf | grep -i suspend

# Launch first application:
tchoi@snake-p3:/home/users/tchoi> srun --nodes=1 --ntasks=2 --cpus-per-task=3 -w nid000[24-25] cpu_mp 12000

# Verify it is running on nid000[24-25]:
tchoi@snake-p3:/home/users/tchoi> squeue -l
Tue Feb 11 18:36:02 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  235     workq cpu_mp tchoi  RUNNING  7:55   1:00:00     2 nid000[24-25]

# Launch second application:
tchoi@snake-p3:/home/users/tchoi> srun --nodes=1 --ntasks=2 --cpus-per-task=4 -w nid000[24-25] /cray/css/ostest/binaries/xt/dev.aries.cray-sp3/xtcnl/ostest/ROOT.latest/tests/alps/ring
srun: job 237 queued and waiting for resources

# Verify the first app (JobID 235) is running and the second app (JobID 237) is pending:
tchoi@snake-p3:/home/users/tchoi> squeue -l
Tue Feb 11 18:37:32 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  235     workq cpu_mp tchoi  RUNNING  9:25   1:00:00     2 nid000[24-25]
  237     workq   ring tchoi  PENDING  0:00   1:00:00     2 (Resources)

tchoi@snake-p3:/home/users/tchoi> su root

# Suspend first app:
snake-p3:/home/users/tchoi # scontrol suspend 235

# Both applications are suspended. I do not know whether this is a SLURM bug or expected behavior.
# Question 1: Please tell me whether both applications being suspended is expected behavior or a bug.
# This occurs only the first time suspending:
tchoi@snake-p3:/home/users/tchoi> squeue -l
Tue Feb 11 18:38:02 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  235     workq cpu_mp tchoi SUSPENDE  9:48   1:00:00     2 nid000[24-25]
  237     workq   ring tchoi SUSPENDE  0:00   1:00:00     2 nid000[24-25]

# A few seconds later, the second application is running:
Tue Feb 11 18:38:12 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  235     workq cpu_mp tchoi SUSPENDE  9:48   1:00:00     2 nid000[24-25]
  237     workq   ring tchoi  RUNNING  0:26   1:00:00     2 nid000[24-25]

# Suspend the second application. Now I get the following slurm_suspend error: Requested operation is presently disabled.
# Question 2: Is this due to setting SchedulerTimeSlice=30 and PreemptMode=GANG in the /etc/opt/slurm/slurm.conf file?
snake-p3:/home/users/tchoi # scontrol suspend 237
Requested operation is presently disabled for array job_id 237
slurm_suspend error: Requested operation is presently disabled

# Every 30 seconds the two applications alternate on the oversubscribed nodes, without any scontrol suspend/resume [JobID]:
tchoi@snake-p3:/etc/opt/slurm> squeue -l -i30
Wed Feb 12 17:06:41 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  261     workq   ring tchoi SUSPENDE  4:50   1:00:00     2 nid000[24-25]
  260     workq cpu_mp tchoi  RUNNING  6:30   1:00:00     2 nid000[24-25]
Wed Feb 12 17:07:11 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  260     workq cpu_mp tchoi SUSPENDE  6:36   1:00:00     2 nid000[24-25]
  261     workq   ring tchoi  RUNNING  5:13   1:00:00     2 nid000[24-25]
Wed Feb 12 17:07:41 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  261     workq   ring tchoi SUSPENDE  5:19   1:00:00     2 nid000[24-25]
  260     workq cpu_mp tchoi  RUNNING  7:00   1:00:00     2 nid000[24-25]
Wed Feb 12 17:08:11 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  260     workq cpu_mp tchoi SUSPENDE  7:05   1:00:00     2 nid000[24-25]
  261     workq   ring tchoi  RUNNING  5:44   1:00:00     2 nid000[24-25]
Wed Feb 12 17:08:41 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  261     workq   ring tchoi SUSPENDE  5:48   1:00:00     2 nid000[24-25]
  260     workq cpu_mp tchoi  RUNNING  7:31   1:00:00     2 nid000[24-25]
Wed Feb 12 17:09:11 2014
JOBID PARTITION   NAME  USER    STATE  TIME TIMELIMIT NODES NODELIST(REASON)
  260     workq cpu_mp tchoi SUSPENDE  7:34   1:00:00     2 nid000[24-25]
  261     workq   ring tchoi  RUNNING  6:15   1:00:00     2 nid000[24-25]
Is this due to setting SchedulerTimeSlice=30 and PreemptMode=GANG in the /etc/opt/slurm/slurm.conf file? If yes, please let us know the correct slurm.conf settings for suspending and resuming jobs.
Changing severity per the descriptions in the contract as shown below.

Severity Levels
1. Severity 1 – Major Impact: A Severity 1 issue occurs when there is a continued system outage that affects a large set of end users. The system is down and nonfunctional due to Slurm problem(s) and no procedural workaround exists.
2. Severity 2 – High Impact: A Severity 2 issue is a high-impact problem that is causing sporadic outages or is consistently encountered by end users with adverse impact to end-user interaction with the system.
3. Severity 3 – Medium Impact: A Severity 3 issue is a medium-to-low-impact problem that includes partial noncritical loss of system access or that impairs some operations on the system but allows the end user to continue to function on the system with workarounds.
4. Severity 4 – Minor Issues: A Severity 4 issue is a minor issue with limited or no loss in functionality within the customer environment. Severity 4 issues may also be used for recommendations for future product enhancements or modifications.
It sounds like your system is configured to oversubscribe resources and time-slice jobs. Suspending one job does not trigger an immediate resumption of the other job; the resumption waits for the 30-second time slice to be reached. I suggest that you review the gang scheduling documentation here: http://slurm.schedmd.com/gang_scheduling.html Regarding suspending a job array, I would guess that some portion of it is in a state which cannot be suspended, e.g. pending.
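For reference, a gang-scheduling configuration along the lines of that documentation could look like the sketch below. This is an illustrative slurm.conf fragment only, not the reporter's actual configuration; the partition and node names are placeholders borrowed from the transcripts above, and the Shared limit is an assumption.

```
# Illustrative slurm.conf fragment for gang scheduling (SLURM 14.03 era).
# Partition/node lines are hypothetical placeholders, not this system's config.
PreemptMode=GANG              # gang scheduler rotates jobs automatically
SchedulerTimeSlice=30         # seconds each job runs per rotation
SelectType=select/cons_res    # consumable-resource selection
SelectTypeParameters=CR_Core
# Shared=FORCE:<n> lets up to <n> jobs oversubscribe each resource.
PartitionName=workq Nodes=nid000[24-25] Shared=FORCE:2 Default=YES
```

With this mode active, the gang scheduler itself drives the suspend/resume cycle for time-slicing, which is consistent with manual scontrol suspend being disabled for those jobs.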
The "array job_id" portion of the error message was wrong; it has been corrected to just "job_id". Everything else appears to be working as expected for a system with gang scheduling enabled.