Created attachment 6598: Change check order do the fast one before

We have activated PrivateData on our cluster, and some users have (bad) scripts that generate dozens of squeue queries per second. If there are thousands of jobs belonging to other users, slurmctld becomes swamped and spends all its time determining whether the user is a manager or a coordinator for the account.

This is obviously a problem that should be addressed by modifying the user scripts, and we are doing this. Meanwhile, we realized that when a job is packed, the UID filter (if present) is applied only after the very costly _hide_job:
https://github.com/SchedMD/slurm/blob/0aa82e91794e76dcc5aecb2818ff3da23bd8fb45/src/slurmctld/job_mgr.c#L9517

The attached patch reverses this order and avoids the very complex processing when the job does not have the right UID in the first place.

For example, here is an extract of "perf report" output on a test cluster where we replicated the issue:

----
 80.02%  11.85%  srvcn  libslurmfull.so  [.] slurm_list_next
         |
         --- slurm_list_next
            |
            |--49.89%-- validate_operator.part.7
            |          _hide_job
            |          _pack_job
            |          _foreach_pack_job_ptr
            |          slurm_list_for_each
            |          pack_all_jobs
            |          _slurm_rpc_dump_jobs
            |          slurmctld_req
            |          _service_connection
            |          start_thread
            |          __clone
            |
            |--49.51%-- assoc_mgr_is_user_acct_coord
            |          _hide_job
            |          _pack_job
            |          _foreach_pack_job_ptr
            |          slurm_list_for_each
            |          pack_all_jobs
            |          _slurm_rpc_dump_jobs
            |          slurmctld_req
            |          _service_connection
            |          start_thread
            |          __clone
            |
             --0.60%-- slurmdb_qos_str
                       pack_job
----

To reproduce:
* activate PrivateData for jobs
* submit 2000 jobs as user A
* repeatedly launch ~100 "squeue -u B" in parallel as user B

Ideally, _hide_job itself could be made less costly by not calling assoc_mgr_is_user_acct_coord and validate_operator with the same parameters over and over again (for each job, for each query).
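The reordering in the patch, together with the closing suggestion about not repeating identical privilege lookups, can be sketched in C roughly as follows. This is a simplified, hypothetical model, not slurmctld code: job_sketch_t, lookup_privilege, priv_memo_t, pack_job_sketch and the NO_UID_FILTER sentinel are all invented for illustration, and lookup_privilege merely stands in for the validate_operator / assoc_mgr_is_user_acct_coord calls made by _hide_job. The per-request cache assumes the privilege answer cannot change while a single request is being packed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sentinel meaning "no -u filter was given". */
#define NO_UID_FILTER UINT32_MAX

/* Hypothetical, heavily simplified stand-in for a job record. */
typedef struct {
	uint32_t user_id;
	const char *account;
} job_sketch_t;

/* How many times the expensive privilege lookup actually ran. */
static unsigned long expensive_calls = 0;

/* Hypothetical stand-in for the costly lookups made by _hide_job()
 * (validate_operator() and assoc_mgr_is_user_acct_coord() in slurmctld).
 * It only counts calls and never grants privilege. */
static bool lookup_privilege(uint32_t req_uid, const char *account)
{
	(void) req_uid;
	(void) account;
	expensive_calls++;
	return false;
}

/* Per-request cache of lookup results. Within one request req_uid is fixed,
 * so the key is just the account name. Assumes the answer cannot change
 * while the request is being packed. */
#define MEMO_SLOTS 16
typedef struct {
	const char *account[MEMO_SLOTS];
	bool privileged[MEMO_SLOTS];
	int used;
} priv_memo_t;

static bool is_privileged(priv_memo_t *memo, uint32_t req_uid,
			  const char *account)
{
	for (int i = 0; i < memo->used; i++)
		if (strcmp(memo->account[i], account) == 0)
			return memo->privileged[i];

	bool priv = lookup_privilege(req_uid, account);
	if (memo->used < MEMO_SLOTS) {	/* when full, just skip caching */
		memo->account[memo->used] = account;
		memo->privileged[memo->used] = priv;
		memo->used++;
	}
	return priv;
}

/* PrivateData=jobs rule: hide the job unless the requester owns it or is
 * privileged for its account. */
static bool hide_job(priv_memo_t *memo, const job_sketch_t *job,
		     uint32_t req_uid)
{
	if (job->user_id == req_uid)
		return false;
	return !is_privileged(memo, req_uid, job->account);
}

/* Patched order: the cheap UID comparison runs first, so the privacy check
 * only runs for jobs that pass the filter. Returns true if the job is packed. */
static bool pack_job_sketch(priv_memo_t *memo, const job_sketch_t *job,
			    uint32_t req_uid, uint32_t filter_uid)
{
	if (filter_uid != NO_UID_FILTER && job->user_id != filter_uid)
		return false;
	return !hide_job(memo, job, req_uid);
}
```

Under these assumptions, with 2000 jobs owned by user A spread over two accounts, a "squeue -u B" style request from user B never reaches the lookup at all, and an unfiltered request from B performs one lookup per distinct account rather than one per job.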
Thanks Thomas. Committed now with some minor changes to the commit message and an added NEWS entry. This will be in 17.11.6 / 18.08.0-pre2 when released:

commit e4b531c23ef97782f49fb21d7877688480f860e6
Author: Thomas HAMEL <thomas-externe.hamel@edf.fr>
Date:   Tue Apr 10 18:00:08 2018 +0200

    slurmctld: check UID in pack_job before hiding

    Improve performance of 'squeue -u' when PrivateData=jobs is enabled
    by moving the UID filter code ahead of the more expensive
    PrivateData=job checks.

    Bug 5056.