Ticket 5056

Summary: Mitigate excessive CPU usage during "squeue -u" with privatedata
Product: Slurm
Reporter: Thomas HAMEL <hmlth>
Component: slurmctld
Assignee: Tim Wickberg <tim>
Status: RESOLVED FIXED
Severity: 4 - Minor Issue    
Priority: ---    
Version: 17.11.2   
Hardware: Linux   
OS: Linux   
Site: EDF - Electricite de France
Version Fixed: 17.11.6, 18.08.0-pre2
Attachments: Change check order do the fast one before

Description Thomas HAMEL 2018-04-11 10:28:59 MDT
Created attachment 6598
Change check order do the fast one before

We have activated PrivateData on our cluster, and some users have (bad) scripts that generate dozens of squeue queries per second. If there are thousands of jobs belonging to other users, slurmctld becomes swamped and spends all its time checking whether the user is a manager or coordinator for the account.

This is obviously a problem that should be addressed by modifying the user scripts, and we are doing this. Meanwhile, we noticed that when a job is packed, the UID filter (if present) is applied only after the very costly _hide_job check.

https://github.com/SchedMD/slurm/blob/0aa82e91794e76dcc5aecb2818ff3da23bd8fb45/src/slurmctld/job_mgr.c#L9517

The attached patch reverses this order and avoids the very expensive processing when the job does not have the right UID in the first place.
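
To illustrate the idea, here is a minimal, self-contained sketch of the reordering. It is not the actual patch and does not use Slurm's real data structures: struct job, NO_UID_FILTER, hide_job, uid_matches and count_packed_jobs are all hypothetical names made up for this example, and hide_job is only a stand-in for the costly check that _hide_job performs. The counter shows how many expensive checks are run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker meaning "the request carries no UID filter". */
#define NO_UID_FILTER ((uint32_t) -1)

/* Hypothetical, simplified job record; not Slurm's real job record. */
struct job {
	uint32_t job_id;
	uint32_t user_id;
};

/* Counts how often the expensive check runs. */
static unsigned long expensive_checks;

/*
 * Stand-in for the costly privacy check (_hide_job in the ticket).
 * Assumption for this sketch: a job is hidden from everyone but its owner.
 */
static bool hide_job(const struct job *job, uint32_t requester_uid)
{
	expensive_checks++;
	return job->user_id != requester_uid;
}

/* Cheap test: does the job match the UID filter of the request? */
static bool uid_matches(const struct job *job, uint32_t filter_uid)
{
	return filter_uid == NO_UID_FILTER || job->user_id == filter_uid;
}

/* Return how many jobs would be packed into the reply. */
static size_t count_packed_jobs(const struct job *jobs, size_t n,
				uint32_t requester_uid, uint32_t filter_uid)
{
	size_t packed = 0;

	assert(jobs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Cheap rejection first: skip jobs of other users early. */
		if (!uid_matches(&jobs[i], filter_uid))
			continue;
		/* Only the remaining jobs pay for the expensive check. */
		if (hide_job(&jobs[i], requester_uid))
			continue;
		packed++;
	}
	return packed;
}
```

With a UID filter in the request, the expensive check runs only for jobs that pass the filter instead of once per job in the queue. Without a filter, the behaviour and cost are unchanged.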

Here is an extract of a "perf report" output from a test cluster where we replicated the issue:

----
    80.02%    11.85%      srvcn  libslurmfull.so     [.] slurm_list_next
                |
                --- slurm_list_next
                   |
                   |--49.89%--
                   |          validate_operator.part.7
                   |          _hide_job
                   |          _pack_job
                   |          _foreach_pack_job_ptr
                   |          slurm_list_for_each
                   |          pack_all_jobs
                   |          _slurm_rpc_dump_jobs
                   |          slurmctld_req
                   |          _service_connection
                   |          start_thread
                   |          __clone
                   |
                   |--49.51%-- assoc_mgr_is_user_acct_coord
                   |          _hide_job
                   |          _pack_job
                   |          _foreach_pack_job_ptr
                   |          slurm_list_for_each
                   |          pack_all_jobs
                   |          _slurm_rpc_dump_jobs
                   |          slurmctld_req
                   |          _service_connection
                   |          start_thread
                   |          __clone
                   |
                    --0.60%-- slurmdb_qos_str
                              pack_job
----

To reproduce:

* activate privatedata for jobs (PrivateData=jobs)
* submit 2000 jobs with user A
* repeatedly launch ~100 "squeue -u B" in parallel with user B

Ideally, _hide_job itself could be made less costly by not calling assoc_mgr_is_user_acct_coord and validate_operator with the same parameters over and over again (for each job, for each query).
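
One possible way to avoid the repeated calls would be a small cache that lives for the duration of a single query, so that each (user, account) pair is looked up only once. The sketch below only illustrates that idea under stated assumptions: struct coord_cache, coord_cache_init, cached_is_coord and slow_is_coord are hypothetical names, slow_is_coord is a made-up stand-in for an expensive lookup such as assoc_mgr_is_user_acct_coord, and none of this reflects how slurmctld is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 16	/* arbitrary size chosen for this sketch */

/* Counts how often the expensive lookup really runs. */
static unsigned long slow_lookups;

/*
 * Stand-in for an expensive coordinator lookup. The rule is invented
 * for the example: uid 42 coordinates the account "physics".
 */
static bool slow_is_coord(uint32_t uid, const char *account)
{
	slow_lookups++;
	return uid == 42 && strcmp(account, "physics") == 0;
}

/* Per-query cache of coordinator answers for one requesting user. */
struct coord_cache {
	uint32_t uid;
	size_t used;
	const char *account[CACHE_SLOTS];	/* borrowed pointers */
	bool is_coord[CACHE_SLOTS];
};

static void coord_cache_init(struct coord_cache *c, uint32_t uid)
{
	c->uid = uid;
	c->used = 0;
}

/*
 * Return the cached answer for this account if there is one, otherwise
 * do the slow lookup once and remember it. The account strings must
 * stay valid for as long as the cache is in use.
 */
static bool cached_is_coord(struct coord_cache *c, const char *account)
{
	assert(c != NULL && account != NULL);
	for (size_t i = 0; i < c->used; i++) {
		if (strcmp(c->account[i], account) == 0)
			return c->is_coord[i];
	}

	bool answer = slow_is_coord(c->uid, account);

	/* When the cache is full, fall back to uncached lookups. */
	if (c->used < CACHE_SLOTS) {
		c->account[c->used] = account;
		c->is_coord[c->used] = answer;
		c->used++;
	}
	return answer;
}
```

The cache would be created at the start of a query and discarded at the end, so a stale answer could never outlive the request. Whether this fits the locking and data layout in slurmctld is an open question that this sketch does not address.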

Comment 1 Tim Wickberg 2018-04-16 11:59:20 MDT
Thanks Thomas. Committed now with some minor changes to the commit message and an added NEWS entry. This will be in 17.11.6 / 18.08.0-pre2 when released:

commit e4b531c23ef97782f49fb21d7877688480f860e6
Author: Thomas HAMEL <thomas-externe.hamel@edf.fr>
Date:   Tue Apr 10 18:00:08 2018 +0200

    slurmctld: check UID in pack_job before hiding
    
    Improve performance of 'squeue -u' when PrivateData=jobs is
    enabled by moving the UID filter code ahead of the more expensive
    PrivateData=job checks.
    
    Bug 5056.