Ticket 16329

Summary: Slurm taking too much time to schedule jobs
Product: Slurm
Reporter: Openfive Support <it_support>
Component: Scheduling
Assignee: Director of Support <support>
Status: RESOLVED DUPLICATE
QA Contact:
Severity: 3 - Medium Impact
Priority: ---
Version: - Unsupported Older Versions
Hardware: Linux
OS: Linux
Site: Alphawave
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---

Description Openfive Support 2023-03-21 01:19:38 MDT
Hi Team,
No jobs are executing and it is impacting our production. The slurmctld log shows repeated warnings about very large RPC processing times:

[2023-03-21T12:12:26.784] Warning: Note very large processing time from _slurm_rpc_dump_jobs: usec=5828060 began=12:12:20.956
[2023-03-21T12:12:27.745] Warning: Note very large processing time from _slurm_rpc_allocate_resources: usec=6770181 began=12:12:20.975
[2023-03-21T12:12:27.745] sched: _slurm_rpc_allocate_resources JobId=1261189 NodeList=(null) usec=6770181
[2023-03-21T12:12:27.857] Warning: Note very large processing time from _slurmctld_background: usec=6858277 began=12:12:20.999
[2023-03-21T12:12:27.857] job_signal: 9 of pending JobId=1261148 successful
[2023-03-21T12:12:28.191] Warning: Note very large processing time from dump_all_job_state: usec=4190409 began=12:12:24.001
[2023-03-21T12:12:29.925] sched: _slurm_rpc_allocate_resources JobId=1261190 NodeList=(null) usec=30493
[2023-03-21T12:12:30.224] sched: _slurm_rpc_allocate_resources JobId=1261191 NodeList=(null) usec=123222
[2023-03-21T12:12:34.593] _job_complete: JobId=1261088 WTERMSIG 126
[2023-03-21T12:12:34.593] _job_complete: JobId=1261088 cancelled by interactive user
[2023-03-21T12:12:34.594] _job_complete: JobId=1261088 done
[2023-03-21T12:12:34.594] _slurm_rpc_complete_job_allocation: JobId=1261088 error Job/step already completing or completed
[2023-03-21T12:12:35.441] _slurm_rpc_complete_job_allocation: JobId=1261088 error Job/step already completing or completed
[2023-03-21T12:12:35.466] _slurm_rpc_complete_job_allocation: JobId=1261088 error Job/step already completing or completed
[2023-03-21T12:12:36.193] _job_complete: JobId=1261025 WEXITSTATUS 0
[2023-03-21T12:12:36.193] _job_complete: JobId=1261025 done
[2023-03-21T12:12:36.238] sched: _slurm_rpc_allocate_resources JobId=1261192 NodeList=(null) usec=26488
[2023-03-21T12:12:36.373] sched: _slurm_rpc_allocate_resources JobId=1261193 NodeList=(null) usec=26556
[2023-03-21T12:12:36.611] Time limit exhausted for JobId=1174490
[2023-03-21T12:12:36.826] _slurm_rpc_complete_job_allocation: JobId=1174490 error Job/step already completing or completed
[2023-03-21T12:12:39.894] _job_complete: JobId=1261028 WTERMSIG 126
[2023-03-21T12:12:39.894] _job_complete: JobId=1261028 cancelled by interactive user
[2023-03-21T12:12:39.894] _job_complete: JobId=1261028 done
[2023-03-21T12:12:39.894] _slurm_rpc_complete_job_allocation: JobId=1261028 error Job/step already completing or completed
[2023-03-21T12:12:41.323] _job_complete: JobId=1261023 WEXITSTATUS 0
[2023-03-21T12:12:41.323] _job_complete: JobId=1261023 done
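For triage, the `usec=` values in these "very large processing time" warnings can be extracted programmatically to see which slurmctld operations are stalling. A minimal sketch (the parser and the one-second threshold are illustrative, not part of the ticket; the sample lines are copied from the log above):

```python
import re

# Sample slurmctld log lines from the ticket (assumption: this line format
# is representative of the full log).
LOG = """\
[2023-03-21T12:12:26.784] Warning: Note very large processing time from _slurm_rpc_dump_jobs: usec=5828060 began=12:12:20.956
[2023-03-21T12:12:27.745] Warning: Note very large processing time from _slurm_rpc_allocate_resources: usec=6770181 began=12:12:20.975
[2023-03-21T12:12:28.191] Warning: Note very large processing time from dump_all_job_state: usec=4190409 began=12:12:24.001
"""

# Match the operation name and its elapsed microseconds.
PAT = re.compile(r"very large processing time from (\S+): usec=(\d+)")

def slow_ops(log_text, threshold_sec=1.0):
    """Return (operation, seconds) pairs that exceed the threshold."""
    results = []
    for m in PAT.finditer(log_text):
        op, usec = m.group(1), int(m.group(2))
        seconds = usec / 1_000_000
        if seconds >= threshold_sec:
            results.append((op, round(seconds, 2)))
    return results

for op, sec in slow_ops(LOG):
    print(f"{op}: {sec}s")
```

Running this over the excerpt flags `_slurm_rpc_dump_jobs`, `_slurm_rpc_allocate_resources`, and `dump_all_job_state` at roughly 4-7 seconds each, which is consistent with the reporter's observation that scheduling has stalled.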
Comment 2 Jason Booth 2023-03-21 11:22:39 MDT
I am closing this out as a duplicate of bug #16219. Nate will follow up on that bug with some recommendations based on our call today.

*** This ticket has been marked as a duplicate of ticket 16219 ***