Ticket 5225

Summary: srun ddos of slurmctld from single core array jobs
Product: Slurm Reporter: Doug Jacobsen <dmjacobsen>
Component: slurmctld    Assignee: Tim Wickberg <tim>
Status: RESOLVED DUPLICATE QA Contact:
Severity: 5 - Enhancement    
Priority: --- CC: cinek, kilian, sts
Version: 17.11.6   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=12703
Site: NERSC Slinky Site: ---
Alineos Sites: --- Atos/Eviden Sites: ---
Confidential Site: --- Coreweave sites: ---
Cray Sites: --- DS9 clusters: ---
Google sites: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- NoveTech Sites: ---
Nvidia HWinf-CS Sites: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Tzag Elita Sites: ---
Linux Distro: --- Machine Name: cori
CLE Version: Version Fixed:
Target Release: --- DevPrio: ---
Emory-Cloud Sites: ---
Attachments: patch to add rpc_max_cnt_lb

Description Doug Jacobsen 2018-05-30 02:27:57 MDT
Created attachment 6951 [details]
patch to add rpc_max_cnt_lb

Hello,

We've been hitting an issue intermittently, and heavily today, wherein backfill scheduling is deeply impacted by a flood of user RPCs and their interaction with max_rpc_cnt.

Specifically, we have one user running srun from a series of array jobs in our shared-node partition. We have asked the user to stop and change behavior, but this is a perennial issue that comes up semi-frequently.

I'm attaching a stop-gap patch I'm using that adds an explicit lower bound to the max_rpc_cnt parameter. Instead of the default behavior of resuming scheduling only once the pending RPC count falls to 10% of max_rpc_cnt, this allows me to specify a range.

In my case I moved cori from max_rpc_cnt=150 to max_rpc_cnt=200,rpc_max_cnt_lb=125
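With the attached patch applied, the corresponding slurm.conf change would look roughly like this (a sketch; rpc_max_cnt_lb comes from the attached patch and is not a stock Slurm option):

```
# slurm.conf on cori: raise the RPC cap, but pin the resume
# threshold well above the default 10% of max_rpc_cnt.
# rpc_max_cnt_lb is added by the attached patch, not stock Slurm.
SchedulerParameters=max_rpc_cnt=200,rpc_max_cnt_lb=125
```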

In practice this has tipped the balance sufficiently that slurmctld can now review the entire queue within bf_maxtime, instead of just a few hundred jobs (i.e., the narrower gap between max_rpc_cnt and rpc_max_cnt_lb probabilistically allows more scheduling to occur than the more stringent defaults).
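The hysteresis described above can be sketched as follows (assumed semantics reconstructed from this ticket, not the actual slurmctld code; the function and parameter names are illustrative):

```python
def should_defer_scheduling(pending_rpcs, max_rpc_cnt,
                            rpc_max_cnt_lb=None,
                            currently_deferred=False):
    """Sketch of the max_rpc_cnt hysteresis (assumed semantics).

    Scheduling defers once pending RPCs exceed max_rpc_cnt, and
    resumes only after they drop below a lower bound: by default
    10% of max_rpc_cnt, or the explicit rpc_max_cnt_lb added by
    the attached patch.
    """
    if rpc_max_cnt_lb is not None:
        lower = rpc_max_cnt_lb          # explicit lower bound (patch)
    else:
        lower = max_rpc_cnt // 10       # stock 10% resume threshold

    if currently_deferred:
        # Stay deferred until pending RPCs fall below the lower bound.
        return pending_rpcs >= lower
    # Start deferring only once pending RPCs exceed the cap.
    return pending_rpcs > max_rpc_cnt
```

With max_rpc_cnt=200,rpc_max_cnt_lb=125, scheduling resumes as soon as pending RPCs dip below 125, rather than waiting for them to drain all the way down to 10% of the cap.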

It would be even better to evaluate user RPCs in some kind of weighted priority queue, using an RPC fairshare so that frequent callers could not unfairly saturate the system. My guess, though, is that this is significant work, well outside the current implementation, for relatively low gain -- unless I start seeing this situation much more frequently.

Thanks,
Doug
Comment 1 Tim Wickberg 2018-05-31 10:57:33 MDT
Moving this down to an enhancement for now.

I've been looking into other mechanisms to rate-limit users individually, which I think would have a similar end-result, but I don't have a patch ready for that just yet.
Comment 2 Tim Wickberg 2021-10-21 12:15:50 MDT
*** Ticket 12703 has been marked as a duplicate of this ticket. ***
Comment 5 Tim Wickberg 2023-07-14 09:28:18 MDT
Marking closed as a duplicate of bug 14493 (private) which introduced rate limiting into Slurm 23.02.

See rl_enable in slurm.conf for details on enabling this mechanism. 
(https://slurm.schedmd.com/slurm.conf.html#OPT_rl_enable)
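For reference, enabling the 23.02+ mechanism is a SchedulerParameters change along these lines (a sketch; the token-bucket values shown are illustrative, see the linked slurm.conf page for actual defaults and tuning):

```
# slurm.conf: enable per-user RPC rate limiting (Slurm 23.02+).
# Bucket size and refill rate below are illustrative values.
SchedulerParameters=rl_enable,rl_bucket_size=30,rl_refill_rate=2
```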

*** This ticket has been marked as a duplicate of ticket 14493 ***