Ticket 6732

Summary: Improve slurmctld threads/agents tracking
Product: Slurm
Reporter: Alejandro Sanchez <alex>
Component: slurmctld
Assignee: Alejandro Sanchez <alex>
Status: OPEN
QA Contact:
Severity: 5 - Enhancement
Priority: ---
CC: bart, csamuel, nate
Version: 19.05.x
Hardware: Linux
OS: Linux
See Also: https://bugs.schedmd.com/show_bug.cgi?id=6849
https://bugs.schedmd.com/show_bug.cgi?id=7501
https://bugs.schedmd.com/show_bug.cgi?id=7928
Site: SchedMD

Description Alejandro Sanchez 2019-03-20 11:13:52 MDT
slurmctld maintains a separate count of incoming message traffic in slurmctld_config.server_thread_count and outgoing message traffic in agent_cnt / agent_thread_cnt. 

slurmctld then defers certain operations unless the former stays below certain thresholds, or unless the latter stays below others. In some places, however, it might be desirable to check whether the sum of incoming and outgoing traffic stays below a reasonable limit.
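
To make the idea concrete, here is a minimal sketch in C of the difference between checking each counter on its own and checking their sum. It is not slurmctld code: the struct, the function names and the limit parameters are hypothetical, and locking around the counters is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the two counters named in the description.
 * In slurmctld they are kept in separate places; this sketch ignores
 * locking and simply copies both values into one struct. */
struct msg_load {
	int server_thread_count; /* incoming: active RPC handler threads */
	int agent_thread_cnt;    /* outgoing: active agent threads */
};

/* Per-direction check: each counter is compared with its own limit.
 * max_in and max_out are illustrative parameters, not Slurm options. */
static bool per_direction_ok(const struct msg_load *l, int max_in, int max_out)
{
	return (l->server_thread_count < max_in) &&
	       (l->agent_thread_cnt < max_out);
}

/* Combined check: the sum of incoming and outgoing activity is compared
 * with a single limit (max_total is also an illustrative parameter). */
static bool combined_ok(const struct msg_load *l, int max_total)
{
	assert(max_total > 0);
	return (l->server_thread_count + l->agent_thread_cnt) < max_total;
}
```

With illustrative values of 200 incoming and 100 outgoing threads, per-direction limits of 256 and 128 are both satisfied, yet a combined limit of 256 is exceeded. That is the case where a check on the sum would defer work that the separate checks let through.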

Background discussion:

https://bugs.schedmd.com/show_bug.cgi?id=5443#c69

and

https://bugs.schedmd.com/show_bug.cgi?id=5443#c170

and

https://bugs.schedmd.com/show_bug.cgi?id=6189#c10
Comment 4 Nate Rini 2019-08-28 18:18:24 MDT
*** Ticket 7501 has been marked as a duplicate of this ticket. ***
Comment 5 Alejandro Sanchez 2020-08-27 03:50:18 MDT
After letting sites breathe for a while with commits 2b25aa4e555 and 67a9c2f786e8, it looks like things got better, so the rest of the proposed ideas can be reclassified as sev-5.