Ticket 16329 - Slurm Taking too much time to schedule jobs
Summary: Slurm Taking too much time to schedule jobs
Status: RESOLVED DUPLICATE of ticket 16219
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: - Unsupported Older Versions
Hardware: Linux
Severity: 3 - Medium Impact
Assignee: Director of Support
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2023-03-21 01:19 MDT by Openfive Support
Modified: 2023-03-21 11:22 MDT

See Also:
Site: Alphawave
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description Openfive Support 2023-03-21 01:19:38 MDT
Hi Team,
No jobs are executing and it's impacting our production.

[2023-03-21T12:12:26.784] Warning: Note very large processing time from _slurm_rpc_dump_jobs: usec=5828060 began=12:12:20.956
[2023-03-21T12:12:27.745] Warning: Note very large processing time from _slurm_rpc_allocate_resources: usec=6770181 began=12:12:20.975
[2023-03-21T12:12:27.745] sched: _slurm_rpc_allocate_resources JobId=1261189 NodeList=(null) usec=6770181
[2023-03-21T12:12:27.857] Warning: Note very large processing time from _slurmctld_background: usec=6858277 began=12:12:20.999
[2023-03-21T12:12:27.857] job_signal: 9 of pending JobId=1261148 successful
[2023-03-21T12:12:28.191] Warning: Note very large processing time from dump_all_job_state: usec=4190409 began=12:12:24.001
[2023-03-21T12:12:29.925] sched: _slurm_rpc_allocate_resources JobId=1261190 NodeList=(null) usec=30493
[2023-03-21T12:12:30.224] sched: _slurm_rpc_allocate_resources JobId=1261191 NodeList=(null) usec=123222
[2023-03-21T12:12:34.593] _job_complete: JobId=1261088 WTERMSIG 126
[2023-03-21T12:12:34.593] _job_complete: JobId=1261088 cancelled by interactive user
[2023-03-21T12:12:34.594] _job_complete: JobId=1261088 done
[2023-03-21T12:12:34.594] _slurm_rpc_complete_job_allocation: JobId=1261088 error Job/step already completing or completed
[2023-03-21T12:12:35.441] _slurm_rpc_complete_job_allocation: JobId=1261088 error Job/step already completing or completed
[2023-03-21T12:12:35.466] _slurm_rpc_complete_job_allocation: JobId=1261088 error Job/step already completing or completed
[2023-03-21T12:12:36.193] _job_complete: JobId=1261025 WEXITSTATUS 0
[2023-03-21T12:12:36.193] _job_complete: JobId=1261025 done
[2023-03-21T12:12:36.238] sched: _slurm_rpc_allocate_resources JobId=1261192 NodeList=(null) usec=26488
[2023-03-21T12:12:36.373] sched: _slurm_rpc_allocate_resources JobId=1261193 NodeList=(null) usec=26556
[2023-03-21T12:12:36.611] Time limit exhausted for JobId=1174490
[2023-03-21T12:12:36.826] _slurm_rpc_complete_job_allocation: JobId=1174490 error Job/step already completing or completed
[2023-03-21T12:12:39.894] _job_complete: JobId=1261028 WTERMSIG 126
[2023-03-21T12:12:39.894] _job_complete: JobId=1261028 cancelled by interactive user
[2023-03-21T12:12:39.894] _job_complete: JobId=1261028 done
[2023-03-21T12:12:39.894] _slurm_rpc_complete_job_allocation: JobId=1261028 error Job/step already completing or completed
[2023-03-21T12:12:41.323] _job_complete: JobId=1261023 WEXITSTATUS 0
[2023-03-21T12:12:41.323] _job_complete: JobId=1261023 done
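For triage, the "very large processing time" warnings above can be summarized per RPC to see which handlers are stalling. A minimal sketch (the log format is taken from the excerpt in this ticket; reading from a file path such as slurmctld.log would be an assumption):

```python
import re

# Matches slurmctld warnings of the form seen in this ticket, e.g.:
#   Warning: Note very large processing time from _slurm_rpc_dump_jobs: usec=5828060 began=12:12:20.956
PATTERN = re.compile(r"very large processing time from (\w+): usec=(\d+)")

def summarize_latencies(lines):
    """Return {rpc_name: worst_usec} over all slow-RPC warnings seen."""
    worst = {}
    for line in lines:
        m = PATTERN.search(line)
        if m:
            name, usec = m.group(1), int(m.group(2))
            worst[name] = max(worst.get(name, 0), usec)
    return worst

# Sample lines copied from the log excerpt above.
sample = [
    "[2023-03-21T12:12:26.784] Warning: Note very large processing time from _slurm_rpc_dump_jobs: usec=5828060 began=12:12:20.956",
    "[2023-03-21T12:12:27.745] Warning: Note very large processing time from _slurm_rpc_allocate_resources: usec=6770181 began=12:12:20.975",
    "[2023-03-21T12:12:27.857] Warning: Note very large processing time from _slurmctld_background: usec=6858277 began=12:12:20.999",
]
print(summarize_latencies(sample))
```

Running this over the excerpt shows both the scheduler RPC path and the background loop spending 5-7 seconds per call, which is consistent with jobs appearing stuck at allocation time.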
Comment 2 Jason Booth 2023-03-21 11:22:39 MDT
I am closing this out as a duplicate of bug#16219. Nate will follow up on that bug with some recommendations based on our call today.

*** This ticket has been marked as a duplicate of ticket 16219 ***