| Summary: | DB job query limit | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Wei Feinstein <wfeinstein> |
| Component: | Database | Assignee: | Ben Roberts <ben> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 22.05.6 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | LBNL - Lawrence Berkeley National Laboratory | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Wei Feinstein
2023-01-23 00:59:58 MST
Hi Wei,

This is a legitimate concern. Calls to sacct go to slurmdbd, not slurmctld, so on the plus side you're not going to stress your scheduler by querying these jobs. Since the traffic doesn't go to slurmctld, the max_rpc_cnt setting has no effect here. The limit I would worry about hitting is the maximum number of simultaneous connections that MySQL allows, which defaults to 151:

https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_max_connections

You can raise that, but I would be cautious not to make it too high either. Even if you allow a large number of connections, hardware limits will still slow things down: the CPU time and memory available, and the I/O limits of the drive the database sits on.

I think you can make your queries more parallel, but I would not spread things out too much. If you break a query covering a period that would return 250,000 jobs into something like 4 smaller queries, you can see how that performs compared to the single large query. It's hard to say exactly how to optimize something like this because there are many variables, but once you have a few data points with the broken-down queries, you can keep subdividing them and look for the point at which you get results the quickest. You would also want to watch the slurmdbd logs to make sure there aren't errors about connections to the database being refused.

Backing up a little: if you're trying to get information about old jobs, you could also move the job data from your active database to an archive database and query things there without worrying about what the active system is doing. This may not be an option if you're trying to get information on current jobs.

I'm sorry I don't have a more definite answer, but hopefully this helps.
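For reference, the MySQL variable mentioned above can be inspected and adjusted from the mysql client; the value shown is purely illustrative, not a recommendation, and should be sized to the database host's hardware:

```sql
-- Check the current connection limit (the documented default is 151):
SHOW VARIABLES LIKE 'max_connections';

-- Raise it at runtime (illustrative value only):
SET GLOBal max_connections = 300;
```

To make the change persistent across restarts, the same setting goes under the `[mysqld]` section of the server configuration file (e.g. `max_connections = 300`).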
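A minimal sketch of how the suggested breakdown might look: split the reporting window into equal sub-windows and issue one sacct query per window. The cluster name, date range, and output fields are placeholders; `-S`/`-E`/`-M` are standard sacct options, but the window count and degree of parallelism are exactly the knobs to tune experimentally as described above.

```python
from datetime import datetime

def split_window(start, end, n):
    """Split the interval [start, end) into n contiguous sub-windows."""
    step = (end - start) / n
    return [(start + i * step, start + (i + 1) * step) for i in range(n)]

def sacct_cmd(cluster, start, end):
    """Build one sacct command for a sub-window (-S/-E bound the
    query by time; -X keeps only job allocations, not steps)."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    return ["sacct", "-a", "-X", "--parsable2",
            "-M", cluster,
            "-S", start.strftime(fmt),
            "-E", end.strftime(fmt)]

if __name__ == "__main__":
    # Placeholder range: one week split into 4 queries.
    for s, e in split_window(datetime(2023, 1, 1), datetime(2023, 1, 9), 4):
        print(" ".join(sacct_cmd("mycluster", s, e)))
```

The generated commands can then be run concurrently (e.g. via `subprocess` in a small thread pool), keeping the worker count well below the MySQL connection limit and comparing wall-clock time against the single large query.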
Thanks,
Ben

Hi Ben,

Thanks for the timely reply and informative suggestions. It is good to know that the maximum number of connections to MySQL needs more attention than the number of RPC calls to slurmctld. I will do some test runs to find the sweet spot where parallelization speeds up the queries without affecting slurmdbd performance.

Many thanks,
Wei

That sounds like a plan. Let me know if there's anything else I can do to help here. If not, I'll plan on closing this ticket.

Thanks,
Ben

Hi Wei,

I haven't heard any follow-up questions, so I assume things are working for you. I'll close the ticket, but feel free to let me know if something related to this comes up.

Thanks,
Ben