We suspect that our daily utilization reports kicked off a query more than 4 GB in size. I'll try to determine what the query was doing, but this ticket is to ask whether this error condition can be handled more gracefully. Our config file is attached.

Notes:

slurmdbd version is 15.08.13.10

Currently, slurmdbd logs about 79K lines, or 10 MB of errors, per second, continuously, until all space on /var/log/slurmdbd.log's filesystem is consumed.

Here are some statistics on the messages logged for a 1 second period:

drdslurm0001:log$ sed -e 's/(.*>/( LARGENUMBER >/' /tmp/slurmdbd.log | sort | uniq -c
   1267 Feb 20 09:52:20 drdslurm0001.en.desres.deshaw.com slurmdbd[13469]: error: pack16: Buffer size limit exceeded ( LARGENUMBER > 4294901760)
  46211 Feb 20 09:52:20 drdslurm0001.en.desres.deshaw.com slurmdbd[13469]: error: pack32: Buffer size limit exceeded ( LARGENUMBER > 4294901760)
   3798 Feb 20 09:52:20 drdslurm0001.en.desres.deshaw.com slurmdbd[13469]: error: pack64: Buffer size limit exceeded ( LARGENUMBER > 4294901760)
  12660 Feb 20 09:52:20 drdslurm0001.en.desres.deshaw.com slurmdbd[13469]: error: packdouble: Buffer size limit exceeded ( LARGENUMBER > 4294901760)
  11395 Feb 20 09:52:20 drdslurm0001.en.desres.deshaw.com slurmdbd[13469]: error: packmem: Buffer size limit exceeded ( LARGENUMBER > 4294901760)
   3798 Feb 20 09:52:20 drdslurm0001.en.desres.deshaw.com slurmdbd[13469]: error: pack_time: Buffer size limit exceeded ( LARGENUMBER > 4294901760)
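As a rough cross-check of the figures above, here is a minimal Python sketch of the same per-function tally that the sed/sort/uniq pipeline produces. The function name `tally_pack_errors` and the regular expression are illustrative assumptions based only on the message shape shown in the excerpt; the counts and the limit value are copied from it.

```python
import re
from collections import Counter

# Limit printed in every message of the excerpt; it equals 2**32 - 2**16.
LIMIT = 4294901760

# Per-second counts copied from the uniq -c output in the ticket.
REPORTED_COUNTS = {
    "pack16": 1267,
    "pack32": 46211,
    "pack64": 3798,
    "packdouble": 12660,
    "packmem": 11395,
    "pack_time": 3798,
}

# Assumed message shape, inferred from the excerpt. Accepts either a raw
# number or the LARGENUMBER placeholder that the sed expression substitutes.
LINE_RE = re.compile(
    r"error: (?P<func>\w+): Buffer size limit exceeded "
    r"\(\s*(?:\d+|LARGENUMBER)\s*>\s*(?P<limit>\d+)\)"
)


def tally_pack_errors(lines):
    """Count 'Buffer size limit exceeded' errors per pack function."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group("func")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py /tmp/slurmdbd.log
    with open(sys.argv[1], errors="replace") as log_file:
        for func, count in sorted(tally_pack_errors(log_file).items()):
            print(f"{count:8d} {func}")
```

Summing `REPORTED_COUNTS` gives 79,129 messages for the sampled second, which matches the "about 79K lines" estimate.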
Created attachment 4080
slurmdbd.conf
I'm assuming you meant "15.08.13"? That maintenance release wasn't in the list previously; that's been corrected now. Is this query being constantly re-run, or does the problem recur after restarting slurmdbd?
The problem has not recurred since restarting slurmdbd; however, I have not yet confirmed that our daily utilization query kicked it off. Looking at old slurmdbd log files, it seems to have happened 1 week and 1 hour prior to yesterday's event, so I have a pretty good chance of finding the script responsible. I'll update here once I do.
Okay, I just wanted to make sure this wasn't blocking slurmdbd from normal service. There's an open enhancement request, bug 2346, that covers adding some configuration options to help prevent these from triggering, although we haven't made any commitment to addressing it just yet.
Goran - I'm marking this closed as a duplicate of 3624. We'll try to get some mitigation in place to keep the log spam to a minimum. As mentioned, bug 2346 discusses longer-term plans to mitigate this issue with some configuration options to limit the queries directly.

- Tim

*** This ticket has been marked as a duplicate of ticket 3624 ***
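For illustration only, one common way to keep this kind of repeated error from flooding a log is to emit each distinct message at most once per time window and report how many repeats were dropped. The sketch below is a generic Python example of that idea, not the mitigation implemented in slurmdbd or in ticket 3624; the class name `RepeatSuppressor` and its parameters are hypothetical.

```python
import time


class RepeatSuppressor:
    """Emit a message at most once per `interval` seconds per key.

    Repeats inside the window are counted instead of written, and the
    count is reported when the next window for that key opens.
    """

    def __init__(self, emit, interval=1.0, clock=time.monotonic):
        self._emit = emit          # callable that writes one log line
        self._interval = interval  # window length in seconds
        self._clock = clock        # injectable so the logic can be tested
        self._state = {}           # key -> [window_start, suppressed_count]

    def log(self, key, message):
        now = self._clock()
        state = self._state.get(key)
        if state is None or now - state[0] >= self._interval:
            # A new window starts: summarize what was dropped, then emit.
            if state is not None and state[1]:
                self._emit(f"{key}: suppressed {state[1]} similar messages")
            self._state[key] = [now, 0]
            self._emit(message)
        else:
            state[1] += 1


if __name__ == "__main__":
    suppressor = RepeatSuppressor(print, interval=1.0)
    for _ in range(50000):
        suppressor.log("pack32", "error: pack32: Buffer size limit exceeded")
```

With a one-second window keyed on the function name, the six message types in the excerpt above would produce on the order of a dozen lines per second (one message plus one summary per type) rather than roughly 79K.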