As tracked in bug 5632, overly aggressive queries (e.g. running sacct against large job/step tables and retrieving a huge number of records) can cause slurmdbd memory usage to grow to ~80GiB, potentially triggering the oom-killer depending on the node's available memory and system limits. While MaxQueryTimeRange and/or REASONABLE_BUF_SIZE (currently 3GiB) can cap the size of the response sent back to the requesting client, the preceding step, where slurmdbd queries the underlying storage database for huge amounts of data, can still trigger the memory usage peak. This bug is intended to explore ideas / architectural changes that avoid growing slurmdbd memory usage in this scenario.

Suggestions so far:

1. The mysql_query() API function doesn't accept a maximum return size as a parameter. Discarded.

2. A LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. We could do some math (I've seen this in another bug) to estimate the size of a job/step row, then divide 3GiB by that size to set LIMIT and append it to the query used for sacct, although that would impact whatever calls jobacct_storage_p_get_jobs_cond.

3. Instead of receiving all the data and only then enforcing MaxQueryTimeRange/3GiB, check those limits as soon as rows become available in slurmdbd and start packing them in a producer/consumer fashion. Not sure if this is feasible at all.

4. Create a stored procedure and then use CALL to retrieve jobs in chunks (as separate result sets).

5. Use a loop to seek over the result set with mysql_data_seek(), and perhaps combine this with option 3, since mysql_fetch_row() might be too slow. Note that mysql_data_seek() only works on results buffered client-side with mysql_store_result(), so on its own it wouldn't reduce the peak; mysql_use_result() would be needed to actually stream rows.

6. Perhaps there's some benefit from partitioning [1] the job/step tables? LIST or RANGE partitioning seems feasible, and then a PARTITION clause could be used when SELECT'ing jobs/steps.

7. Other ideas?

[1] https://dev.mysql.com/doc/refman/8.0/en/partitioning.html
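To make suggestion 2 concrete: assuming an average job row of roughly 4KiB (a made-up figure; a real estimate would have to come from the schema), 3GiB / 4KiB ≈ 786432 rows per chunk. Table and column names below are illustrative, not necessarily the actual slurmdbd schema:

```sql
-- Hypothetical chunked retrieval for suggestion 2.
-- 3GiB / ~4KiB per row ~= 786432 rows per chunk (both figures assumed).
SELECT job_db_inx, id_job, time_start, time_end
  FROM cluster_job_table
 WHERE time_start BETWEEN 1546300800 AND 1548979200
 ORDER BY job_db_inx
 LIMIT 786432;

-- Next chunk: seek past the last job_db_inx seen, rather than using
-- OFFSET, so the server doesn't re-scan the rows already returned.
SELECT job_db_inx, id_job, time_start, time_end
  FROM cluster_job_table
 WHERE time_start BETWEEN 1546300800 AND 1548979200
   AND job_db_inx > 786432  -- last key from the previous chunk
 ORDER BY job_db_inx
 LIMIT 786432;
```

Keyset pagination on the primary key (as above) keeps each chunk bounded without the quadratic re-scanning cost that a growing OFFSET would incur.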
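A rough sketch of what the stored procedure in suggestion 4 could look like (names are illustrative, not the actual slurmdbd schema). Each SELECT inside the loop is sent to the client as its own result set, which the caller drains one at a time via mysql_next_result(), so only one chunk needs to be held in memory at once:

```sql
-- Hypothetical sketch of suggestion 4: emit jobs in fixed-size chunks,
-- one result set per loop iteration.
DELIMITER //
CREATE PROCEDURE get_jobs_chunked(IN t_start INT, IN t_end INT,
                                  IN chunk_size INT)
BEGIN
  DECLARE last_inx BIGINT DEFAULT 0;
  DECLARE next_inx BIGINT;
  chunk_loop: LOOP
    -- Upper key of the next chunk (NULL once the range is exhausted).
    SET next_inx = (SELECT MAX(job_db_inx) FROM
                      (SELECT job_db_inx FROM cluster_job_table
                        WHERE time_start BETWEEN t_start AND t_end
                          AND job_db_inx > last_inx
                        ORDER BY job_db_inx
                        LIMIT chunk_size) bounds);
    IF next_inx IS NULL THEN
      LEAVE chunk_loop;
    END IF;
    -- Emit the chunk as a separate result set.
    SELECT job_db_inx, id_job, time_start, time_end
      FROM cluster_job_table
     WHERE time_start BETWEEN t_start AND t_end
       AND job_db_inx > last_inx AND job_db_inx <= next_inx;
    SET last_inx = next_inx;
  END LOOP;
END //
DELIMITER ;
```

The bounds subquery does mean each range is computed twice; a sketch only, not a tuned implementation.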
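For suggestion 6, RANGE partitioning on something like time_start might look as follows (illustrative DDL, not tested against the real schema). One caveat worth checking up front: MySQL requires every unique key on a partitioned table to include the partitioning column, which could be a blocker depending on the existing indexes:

```sql
-- Hypothetical RANGE partitioning of a job table by start time
-- (suggestion 6). Column/table names are illustrative.
ALTER TABLE cluster_job_table
PARTITION BY RANGE (time_start) (
  PARTITION p2018q4 VALUES LESS THAN (UNIX_TIMESTAMP('2019-01-01')),
  PARTITION p2019q1 VALUES LESS THAN (UNIX_TIMESTAMP('2019-04-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- Explicit partition selection when SELECT'ing, so only the named
-- partition is scanned.
SELECT job_db_inx, id_job
  FROM cluster_job_table PARTITION (p2019q1)
 WHERE time_start >= UNIX_TIMESTAMP('2019-01-01');
```

Note that with a WHERE clause on the partitioning column, the optimizer should already prune partitions automatically; the explicit PARTITION clause just makes the restriction unconditional.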