Ticket 2331

Summary: synchronous mode for sacctmgr?
Product: Slurm Reporter: Doug Jacobsen <dmjacobsen>
Component: slurmdbd    Assignee: Unassigned Developer <dev-unassigned>
Status: OPEN    QA Contact: ---
Severity: 5 - Enhancement    
Priority: --- CC: matthews
Version: 15.08.6   
Hardware: Linux   
OS: Linux   
Site: NERSC

Description Doug Jacobsen 2016-01-12 01:28:27 MST
Hello,

I need to keep the users, accounts, and associations slaved to an institutional database we have (NIM - NERSC Information Management), which stores data similar to the SLURM accounting database but is focused on our business logic.

Anyway, I have a python script that manages adjusting the slurm database to match the NIM database periodically.

The problem is that sometimes the changes can number in the hundreds or thousands (like today -- a new allocation year for us!).

If a slurmctld is running, then every command issued with sacctmgr seems to generate an asynchronous update of the slurmctld cache.  It also seems that a thread is launched, and -- I'm guessing here -- that the thread obtains exclusive access to some lock.  I say all this because when we get a large number of updates, I rapidly find myself in a situation where slurmctld is completely maxed out at 256 threads, and I just have to kill the update process and wait for everything to unjam.

It would be very helpful if I could put sacctmgr in a synchronous or serialized mode in which it would not return or move on to the next command until all the slurmctld daemons attached to slurmdbd have acknowledged completing the previous update.

Thanks,
Doug
Comment 1 Tim Wickberg 2016-02-24 06:18:47 MST
It's an interesting idea, and I've heard at least one other place that may benefit from something similar.

Are you doing this as independent sacctmgr commands, or as a series of commands fed to sacctmgr on stdin?

Have you looked at using the load commands to batch a series of updates? That might help somewhat, although it looks like it may still generate separate updates, which could still lead to thread exhaustion.
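For reference, the dump/load round trip being suggested looks roughly like the following; the cluster name and file name are placeholders, so substitute your own site's values:

```
# Dump the current cluster definition (accounts, users, associations)
# to a flat file, edit it, then load it back in a single operation.
sacctmgr dump mycluster file=mycluster.cfg
sacctmgr -i load file=mycluster.cfg
```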
Comment 2 Doug Jacobsen 2016-02-24 06:48:46 MST
Hi Tim,

I pipe a list of commands into sacctmgr and run them (or did when this was filed).  Now I issue each one individually as a separate sacctmgr call (so I can get a definitive exit status for that one command), and my controller script performs a configurable number of updates (typically 10) at a time, with sleeps in between, to minimize the risk of flooding slurmctld with asynchronous updates.
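The batching approach described above can be sketched roughly as follows.  This is illustrative only: the batch size, pause length, and the use of `sacctmgr -i` (immediate mode, to skip the confirmation prompt) mirror what the ticket describes, and the `runner` hook is a hypothetical seam for testing without a live cluster.

```python
import subprocess
import time

def run_in_batches(commands, batch_size=10, pause=15, runner=None):
    """Run sacctmgr commands in small batches with a pause between
    batches, returning the commands that exited with non-zero status."""
    if runner is None:
        # Each command is its own sacctmgr invocation so we get a
        # per-command exit status; -i skips the interactive prompt.
        runner = lambda cmd: subprocess.call(["sacctmgr", "-i"] + cmd.split())
    failed = []
    for i in range(0, len(commands), batch_size):
        for cmd in commands[i:i + batch_size]:
            if runner(cmd) != 0:
                failed.append(cmd)
        if i + batch_size < len(commands):
            time.sleep(pause)  # let slurmctld drain its update threads
    return failed
```

The per-command exit status addresses the objection to `load` below: a failed command is identified immediately rather than buried in a bulk operation.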

The load capability is of interest, but is perhaps less useful since I can't easily determine which command in a batch of commands failed.

-Doug
Comment 3 Doug Jacobsen 2016-03-15 10:20:56 MDT
Hello,

This morning we had an outage on cori, I believe owing to something related to this issue.  We reached the maximum thread count, and from then on slurmctld only logged updates from the aeld and bb maintenance threads.  I have logic to prevent "too many" simultaneous updates via sacctmgr (do 10 updates, sleep 15s, repeat), but it also seems that the accounting update RPCs can deadlock.

There are a few issues in my mind around this:
1) Given the do-10-updates-sleep-15 logic, there should not have been enough concurrent updates to exhaust the thread pool; however, slurmdbd seemed to be pushing through a lot more, and many, many associations were being updated according to slurmctld.log.

2) Once the maximum thread count was hit, it stayed there and never drained.  I wonder if there is a deadlock somewhere in the accounting data update path.  Unfortunately we didn't get a core dump or a stack trace of the threads to verify what they were.  I'm writing up a procedure so that others can collect core dumps I can later analyze in situations like this.

Anyway, this issue is growing in importance for me: either increase the atomicity of the sacctmgr call, or reduce locking contention around the slurmctld accounting update threads (or both).

I saw that there is a max RPCs setting, but I have been hesitant to set it.  Setting it to some value lower than 256 (max threads) but high enough to allow sufficient throughput of parallel operations seems like it would be a good thing.  Are there recommendations for what that value should be for a system of our scale and overall busyness?
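For what it's worth, the throttle being described here appears to correspond to the `max_rpc_cnt` option of `SchedulerParameters` in slurm.conf; the value below is purely illustrative, not a tuning recommendation:

```
# slurm.conf -- illustrative value only; tune for your site
SchedulerParameters=max_rpc_cnt=150
```

When the number of active slurmctld threads reaches this value, the scheduler defers some of its work so the RPC backlog can drain rather than consuming threads up to the hard limit.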

Thanks,
Doug
Comment 4 Doug Jacobsen 2017-10-01 16:02:04 MDT
Hello,

Tim and I discussed this a bit at SLUG 2017.  Instead of a synchronous mode, we discussed the possibility of disabling updates to slurmctld for a short period while queueing up a number of updates.  The basic idea was a flag to sacctmgr that would suppress slurmctld updates for, say, 10 seconds; each successive sacctmgr call would refresh ("keep alive") that delay until the entire campaign of updates had been presented to sacctmgr, allowing a single large update to be transmitted to slurmctld.

Hopefully some of these ideas enable further progress on this issue.  It is of increasing concern because we expect the number of daily updates via sacctmgr to grow tenfold as we introduce finer-grained limits in the coming months.

Thanks,
Doug