| Summary: | srun ddos of slurmctld from single core array jobs | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Doug Jacobsen <dmjacobsen> |
| Component: | slurmctld | Assignee: | Tim Wickberg <tim> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | 5 - Enhancement | | |
| Priority: | --- | CC: | cinek, kilian, sts |
| Version: | 17.11.6 | | |
| Hardware: | Linux | OS: | Linux |
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=12703 | | |
| Site: | NERSC | Machine Name: | cori |
| Attachments: | patch to add rpc_max_cnt_lb | | |
Moving this down to an enhancement for now. I've been looking into other mechanisms to rate-limit users individually, which I think would have a similar end result, but I don't have a patch ready for that just yet.

*** Ticket 12703 has been marked as a duplicate of this ticket. ***

Marking closed as a duplicate of bug 14493 (private), which introduced rate limiting into Slurm 23.02. See rl_enable in slurm.conf for details on enabling this mechanism. (https://slurm.schedmd.com/slurm.conf.html#OPT_rl_enable)

*** This ticket has been marked as a duplicate of ticket 14493 ***
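The rl_enable mechanism referenced above is described in the linked documentation as a per-user rate limiter; the general idea behind such limiters is a token bucket keyed by UID. Below is a minimal sketch of that idea only — the class and parameter names are illustrative, not Slurm's actual implementation:

```python
class UserRpcLimiter:
    """Illustrative per-user token bucket for incoming RPCs.

    Each UID gets a bucket of `bucket_size` tokens, refilled at
    `refill_rate` tokens per second. An RPC is admitted only if a
    token is available; otherwise the caller would be told to retry,
    so one chatty user cannot starve everyone else.
    """

    def __init__(self, bucket_size=30.0, refill_rate=2.0):
        self.bucket_size = bucket_size
        self.refill_rate = refill_rate
        self.state = {}  # uid -> (tokens_remaining, last_timestamp)

    def allow(self, uid, now):
        # New users start with a full bucket.
        tokens, last = self.state.get(uid, (self.bucket_size, now))
        # Refill based on elapsed time, capped at the bucket size.
        tokens = min(self.bucket_size, tokens + (now - last) * self.refill_rate)
        if tokens >= 1.0:
            self.state[uid] = (tokens - 1.0, now)
            return True
        self.state[uid] = (tokens, now)
        return False
```

Because each UID has its own bucket, a user sruning from hundreds of array tasks exhausts only their own tokens, while other users' RPCs continue to be admitted.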
Created attachment 6951 [details]: patch to add rpc_max_cnt_lb

Hello,

We've been having an issue, intermittently and heavily today, wherein backfill scheduling is deeply impacted by a flood of user RPCs and their interaction with max_rpc_cnt. Specifically, we have one user running srun from a series of array jobs in our shared-node partition. We have asked the user to stop and change their behavior, but this is a perennial issue that comes up semi-frequently.

I'm attaching a stop-gap patch I'm using that adds an explicit lower bound to the max_rpc_cnt parameter. Instead of the default behavior of restarting scheduling when only 10% of the max_rpc_cnt RPCs are left pending, this allows me to specify a range. In my case I moved cori from max_rpc_cnt=150 to max_rpc_cnt=200,max_rpc_cnt_lb=125.

In practice this has tipped the balance sufficiently that within bf_maxtime slurmctld can now review the entire queue, instead of just a few hundred jobs (i.e., the reduced gap between max_rpc_cnt and max_rpc_cnt_lb probabilistically allows more scheduling to occur than the more stringent defaults).

It would be even better to evaluate user RPCs in some kind of weighted priority queue using some kind of RPC fairshare, so that frequent users could not saturate the system unfairly, but my guess is this is significant work, well outside the current implementation, for relatively low gain -- unless I see this situation much more frequently.

Thanks,
Doug
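The behavior the patch targets can be sketched as a hysteresis gate: scheduling defers once pending RPCs reach max_rpc_cnt and resumes only after the backlog drains below a lower threshold. Per the report above, stock behavior pins that threshold at 10% of max_rpc_cnt, while the patch makes it an independent knob. A minimal sketch of that logic, with an illustrative function name rather than the actual slurmctld internals:

```python
def scheduler_gate(pending_rpcs, max_rpc_cnt, max_rpc_cnt_lb, deferred):
    """Return whether scheduling should be deferred.

    Hysteresis: once pending RPCs reach max_rpc_cnt, scheduling pauses
    and stays paused until the backlog drops below max_rpc_cnt_lb.
    Stock behavior corresponds to max_rpc_cnt_lb = max_rpc_cnt // 10;
    the patch lets the two be set independently (e.g. 200 and 125).
    """
    if not deferred and pending_rpcs >= max_rpc_cnt:
        return True   # backlog too deep: pause scheduling
    if deferred and pending_rpcs < max_rpc_cnt_lb:
        return False  # backlog drained below the lower bound: resume
    return deferred   # otherwise keep the current state
```

Raising the resume threshold from the default 10% (20 of 200) to 125 means scheduling restarts much sooner after a burst, which is why the patched configuration lets the backfill loop cover the whole queue within bf_maxtime instead of only a few hundred jobs.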