Ticket 3260 - slurmdbd out of memory, dumping core
Summary: slurmdbd out of memory, dumping core
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmdbd
Version: 15.08.12
Hardware: Linux
Severity: 3 - Medium Impact
Assignee: Alejandro Sanchez
 
Reported: 2016-11-15 13:39 MST by Chris
Modified: 2016-12-28 07:06 MST

See Also:
Site: D E Shaw Research


Attachments
journalctl -u slurmcl15dbd --since today > /var/tmp/slurmcl15dbd.log (424.59 KB, application/x-gzip)
2016-11-15 13:39 MST, Chris
Details

Description Chris 2016-11-15 13:39:51 MST
Created attachment 3714 [details]
journalctl -u slurmcl15dbd --since today > /var/tmp/slurmcl15dbd.log

Twice in the last two days I've had an alert that slurmdbd was down. Upon investigation it looks like it ran out of memory and tried to dump core, which, rather unfortunately, I didn't have enough space for in the filesystem. Any advice?


Here is a little bit of the log, attaching the rest from today.
Nov 15 14:08:07 drdslurm0001.en.desres.deshaw.com slurmdbd[1976]: error: mysql_query failed: 2008 MySQL client ran out of memory
                                                                  select t1.id_step, t1.time_start, t1.time_end, t1.time_suspended, t1.step_name, t1.nodelist, t1.node_inx, t1.state, t1.kill_requid, t1.exit_code, t1.nodes_alloc, t1.task_cnt, t1.task_dist, t1.user_sec, t1.user_usec, t1.sys_sec, t1.sys_usec, t1.max_disk_read, t1.max_disk_read_task, t1.max_disk_read_node, t1.ave_disk_read, t1.max_disk_write, t1.max_disk_write_task, t1.max_disk_write_node, t1.ave_disk_write, t1.max_vsize, t1.max_vsize_t
Nov 15 14:08:07 drdslurm0001.en.desres.deshaw.com slurmdbd[1976]: error: mysql_query failed: 2008 MySQL client ran out of memory
                                                                  select t1.id_step, t1.time_start, t1.time_end, t1.time_suspended, t1.step_name, t1.nodelist, t1.node_inx, t1.state, t1.kill_requid, t1.exit_code, t1.nodes_alloc, t1.task_cnt, t1.task_dist, t1.user_sec, t1.user_usec, t1.sys_sec, t1.sys_usec, t1.max_disk_read, t1.max_disk_read_task, t1.max_disk_read_node, t1.ave_disk_read, t1.max_disk_write, t1.max_disk_write_task, t1.max_disk_write_node, t1.ave_disk_write, t1.max_vsize, t1.max_vsize_t
Nov 15 14:08:07 drdslurm0001.en.desres.deshaw.com slurmdbd[1976]: error: mysql_query failed: 2008 MySQL client ran out of memory
                                                                  select time_start, time_end from "cl_suspend_table" where (time_start < 1479204000 && (time_end >= 1477972800 || time_end = 0)) && job_db_inx=40961239 order by time_start
Nov 15 14:08:07 drdslurm0001.en.desres.deshaw.com slurmdbd[1976]: error: mysql_query failed: 2008 MySQL client ran out of memory
                                                                  select t1.id_step, t1.time_start, t1.time_end, t1.time_suspended, t1.step_name, t1.nodelist, t1.node_inx, t1.state, t1.kill_requid, t1.exit_code, t1.nodes_alloc, t1.task_cnt, t1.task_dist, t1.user_sec, t1.user_usec, t1.sys_sec, t1.sys_usec, t1.max_disk_read, t1.max_disk_read_task, t1.max_disk_read_node, t1.ave_disk_read, t1.max_disk_write, t1.max_disk_write_task, t1.max_disk_write_node, t1.ave_disk_write, t1.max_vsize, t1.max_vsize_t
Nov 15 14:08:07 drdslurm0001.en.desres.deshaw.com slurmdbd[1976]: out of memory [1976]
Nov 15 14:15:05 drdslurm0001.en.desres.deshaw.com systemd[1]: slurmcl15dbd.service: main process exited, code=killed, status=11/SEGV
Nov 15 14:15:05 drdslurm0001.en.desres.deshaw.com systemd[1]: Unit slurmcl15dbd.service entered failed state.
Nov 15 14:15:05 drdslurm0001.en.desres.deshaw.com systemd[1]: slurmcl15dbd.service failed.
Nov 15 14:22:49 drdslurm0001.en.desres.deshaw.com systemd[1]: Starting Slurm DBD accounting daemon...
Nov 15 14:22:49 drdslurm0001.en.desres.deshaw.com slurmdbd[32218]: debug:  auth plugin for Munge (http://code.google.com/p/munge/) loaded
Nov 15 14:22:49 drdslurm0001.en.desres.deshaw.com slurmdbd[32218]: error: mysql_query failed: 1021 Disk full (/var/tmp/#sql_1256_0); waiting for someone to free some space...
                                                                   show columns from cluster_table
Nov 15 14:22:49 drdslurm0001.en.desres.deshaw.com slurmdbd[32218]: Accounting storage MYSQL plugin failed
Nov 15 14:22:49 drdslurm0001.en.desres.deshaw.com slurmdbd[32218]: error: Couldn't load specified plugin name for accounting_storage/mysql: Plugin init() callback failed
Nov 15 14:22:49 drdslurm0001.en.desres.deshaw.com systemd[1]: slurmcl15dbd.service: control process exited, code=exited status=1
Nov 15 14:22:49 drdslurm0001.en.desres.deshaw.com systemd[1]: Failed to start Slurm DBD accounting daemon.
Nov 15 14:22:49 drdslurm0001.en.desres.deshaw.com systemd[1]: Unit slurmcl15dbd.service entered failed state.
Nov 15 14:22:49 drdslurm0001.en.desres.deshaw.com systemd[1]: slurmcl15dbd.service failed.
Nov 15 14:24:25 drdslurm0001.en.desres.deshaw.com systemd[1]: Starting Slurm DBD accounting daemon...
Nov 15 14:24:25 drdslurm0001.en.desres.deshaw.com slurmdbd[32258]: debug:  auth plugin for Munge (http://code.google.com/p/munge/) loaded
Nov 15 14:24:25 drdslurm0001.en.desres.deshaw.com slurmdbd[32258]: Accounting storage MYSQL plugin loaded
Nov 15 14:24:25 drdslurm0001.en.desres.deshaw.com systemd[1]: PID file /var/spool/slurmcl15/slurmcl15dbd.pid not readable (yet?) after start.
Nov 15 14:24:25 drdslurm0001.en.desres.deshaw.com systemd[1]: Started Slurm DBD accounting daemon.
Nov 15 14:24:25 drdslurm0001.en.desres.deshaw.com slurmdbd[32261]: chdir to /var/tmp
Nov 15 14:24:25 drdslurm0001.en.desres.deshaw.com slurmdbd[32261]: slurmdbd version 15.08.12 started


The box, which runs only slurmdbd and mariadb, has 64GB of memory. Do I just need to figure out a way to get a box with more? Any intermediate steps I could take?

# What was going on about this time? There were a lot of short jobs. Could that have caused an issue? Anything I should adjust or pay special attention to? Would CommitDelay help here or just expose more?

drdlogin0044:slurmadmin$ ./slurmcl15_scheduled_in_last_hr
slurmcl2 21248 started in the hour of 2016-11-15T14
for a listing see grep 2016-11-15T14 /scratch/harwell/tmp.z3kX4s0Ptw



drdlogin0044:slurmadmin$ grep 2016-11-15T14 /scratch/harwell/tmp.z3kX4s0Ptw | awk '/COMPLETED/ { print $8 }'  | grep -v -w '0:0' | /u/nyc/harwell/p4ws/desrad/user/desrad_harwell/hr:min:sec-to-sec.py - | garden with -m bitly/data_hacks/20101209-01A/bin histogram.py -
# NumSamples = 19673; Min = 0.00; Max = 1068.00
# Mean = 37.787984; Variance = 2378.087464; SD = 48.765638; Median 22.000000
# each * represents a count of 247
    0.0000 -   106.8000 [ 18566]: ***************************************************************************
  106.8000 -   213.6000 [   999]: ****
  213.6000 -   320.4000 [    19]: 
  320.4000 -   427.2000 [     6]: 
  427.2000 -   534.0000 [     6]: 
  534.0000 -   640.8000 [    73]: 
  640.8000 -   747.6000 [     3]: 
  747.6000 -   854.4000 [     0]: 
  854.4000 -   961.2000 [     0]: 
  961.2000 -  1068.0000 [     1]: 

drdlogin0044:slurmadmin$ grep 2016-11-15T14 /scratch/harwell/tmp.z3kX4s0Ptw | awk '/COMPLETED/ { print $8 }'  | grep -v -w '0:0' | /u/nyc/harwell/p4ws/desrad/user/desrad_harwell/hr:min:sec-to-sec.py - | garden with -m bitly/data_hacks/20101209-01A/bin histogram.py --max=100 -
# NumSamples = 19673; Min = 0.00; Max = 100.00
# 1457 values outside of min/max
# Mean = 37.787984; Variance = 2378.087464; SD = 48.765638; Median 22.000000
# each * represents a count of 81
    0.0000 -    10.0000 [  2987]: ************************************
   10.0000 -    20.0000 [  6148]: ***************************************************************************
   20.0000 -    30.0000 [  3271]: ****************************************
   30.0000 -    40.0000 [  1808]: **********************
   40.0000 -    50.0000 [  1064]: *************
   50.0000 -    60.0000 [   505]: ******
   60.0000 -    70.0000 [   500]: ******
   70.0000 -    80.0000 [   656]: ********
   80.0000 -    90.0000 [   605]: *******
   90.0000 -   100.0000 [   672]: ********


[root@drdslurm0001 tmp]# free -m
              total        used        free      shared  buff/cache   available
Mem:          64000       16769       45748          15        1483       46856
Swap:         65535        6094       59441
[root@drdslurm0001 tmp]# uname -a
Linux drdslurm0001.en.desres.deshaw.com 3.10.0-327.10.1.el7.x86_64 #1 SMP Tue Feb 16 17:03:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
drdws0109:~$ smcl show config
Configuration data as of 2016-11-15T14:59:52
AccountingStorageBackupHost  = (null)
AccountingStorageHost  = slurmcl2db.en.desres.deshaw.com
AccountingStorageLoc   = N/A
AccountingStoragePass  = (null)
AccountingStoragePort  = 6819
AccountingStorageType  = accounting_storage/slurmdbd
AccountingStorageUser  = N/A
AuthType               = auth/munge
MessageTimeout         = 100 sec
PluginDir              = /opt/slurmcl15/lib64/slurm
PrivateData            = none
SlurmUserId            = slurmcl2(406)
SLURM_CONF             = /opt/slurmcl15/etc/slurm.conf
SLURM_VERSION          = 15.08.12
TrackWCKey             = 0

SlurmDBD configuration:
ArchiveDir             = /var/spool/slurmcl15/
ArchiveEvents          = Yes
ArchiveJobs            = Yes
ArchiveResvs           = Yes
ArchiveScript          = (null)
ArchiveSteps           = No
ArchiveSuspend         = No
AuthInfo               = (null)
AuthType               = auth/munge
BOOT_TIME              = 2016-11-15T14:24:25
CommitDelay            = No
DbdAddr                = localhost
DbdBackupHost          = (null)
DbdHost                = localhost
DbdPort                = 6819
DebugFlags             = DB_Archive
DebugLevel             = debug
DefaultQOS             = debug,normal,low,standby,high,sim-nice
LogFile                = (null)
MessageTimeout         = 30 secs
PidFile                = /var/spool/slurmcl15/slurmcl15dbd.pid
PluginDir              = /opt/slurmcl15/lib64/slurm
PrivateData            = accounts
PurgeEventAfter        = 1440 hours*
PurgeJobAfter          = 1440 hours*
PurgeResvAfter         = 12 months*
PurgeStepAfter         = 720 hours
PurgeSuspendAfter      = 24 hours
SLURMDBD_CONF          = /opt/slurmcl15/etc/slurmdbd.conf
SLURMDBD_VERSION       = 15.08.12
SlurmUser              = slurmcl2(406)
StorageBackupHost      = (null)
StorageHost            = localhost
StorageLoc             = slurmcl2db
StoragePort            = 3306
StorageType            = accounting_storage/mysql
StorageUser            = slurmcl2
TrackWCKey             = No
TrackSlurmctldDown     = No
Comment 2 Alejandro Sanchez 2016-11-16 05:30:23 MST
(In reply to Chris from comment #0)
> Created attachment 3714 [details]
> journalctl -u slurmcl15dbd --since today > /var/tmp/slurmcl15dbd.log
> 
> Twice in the last two days I've had an alert that slurmdbd was down. Upon
> investigation it looks that it ran out of memory and dumped core which,
> rather unfortunately, I didn't have enough space for in the file system. Any
> advice?

I see that your ArchiveDir is /var/spool/slurmcl15/ and you are archiving Events, Jobs and Resvs. I also see this error:

Nov 15 14:22:49 drdslurm0001.en.desres.deshaw.com slurmdbd[32218]: error: mysql_query failed: 1021 Disk full (/var/tmp/#sql_1256_0); waiting for someone to free some space...

I'm wondering if the archive files (or other logs) filled the /var filesystem, leaving MySQL unable to use it for temporary copies of tables. A full disk would also explain why there wasn't enough space for the core file to be generated.

- Can you check the space on /var?
- Can you check the core file size limit reported by ulimit -a, and change it to unlimited if it's limited?
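One caveat worth noting (an assumption about the setup, since the log shows slurmdbd being started by systemd): a daemon launched by systemd does not inherit login-shell ulimits, so the core limit may need to be raised in the unit itself rather than via limits.conf. A sketch, using the slurmcl15dbd.service name from the log:

```
# /etc/systemd/system/slurmcl15dbd.service.d/core.conf (drop-in override)
[Service]
LimitCORE=infinity
```

After adding the drop-in, run systemctl daemon-reload and restart the service; /proc/<pid>/limits for the slurmdbd process should then show "Max core file size" as unlimited.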

> The box which only has slurmdbd and mariadb has 64GB of memory. Do I just
> need to figure out a way to get a box with more?  Any intermediate steps I
> could take? 

I believe 64GB of memory should suffice for slurmdbd to work properly.

> 
> # what was going on about this time? there were a lot of short jobs. could
> that have caused an issue? Anything I should adjust or pay special attention
> to? Would commit delay help here or just expose more?

The CommitDelay documentation recommends setting it when running a very high throughput of jobs, which seems to be the case here since you mention there were a lot of short jobs.
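For reference, CommitDelay takes a delay in seconds in slurmdbd.conf; a minimal sketch (1 second is the value the documentation suggests as sufficient for high throughput):

```
# /opt/slurmcl15/etc/slurmdbd.conf (fragment)
# Batch database commits together instead of committing every record
# individually; recommended for very high job throughput.
CommitDelay=1
```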
Comment 3 Alejandro Sanchez 2016-11-16 06:02:29 MST
Chris - a few more questions and notes:

- How many jobs / job steps are you running per day?
- Can you run:
  
mysql> select count(*) from <cluster>_job_table where time_start > unix_timestamp('2016-11-15');

- How large is the raw mysql database at this point?
- Are you running any sacct query asking for records from a long time ago? Heavy queries like that could exceed the response buffer (4GB):

error: packdouble: Buffer size limit exceeded (4294901818 > 4294901760)

- I believe increasing the Purge frequency might help mitigate the issue too.
- innodb_buffer_pool_size can have a huge impact - we'd recommend setting this as high as half the RAM available on the slurmdbd server. You can check the current setting in MySQL like so:

mysql> SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

- Setting innodb_lock_wait_timeout and innodb_log_file_size to larger values than the default is also recommended.
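Taken together, those server-side settings would go in the MariaDB config under [mysqld]. A sketch with illustrative values (roughly half of a 64GB box for the buffer pool; the timeout and log size are examples, not prescriptions):

```
[mysqld]
# Cache InnoDB data and indexes in RAM; ~half of system memory is the
# usual guideline for a dedicated database host.
innodb_buffer_pool_size = 32G
# Give long-running slurmdbd transactions (e.g. rollups) time to finish.
innodb_lock_wait_timeout = 900
# Larger redo logs reduce checkpoint pressure under heavy insert load.
innodb_log_file_size = 64M
```

A mariadb restart is required for these to take effect on servers of this vintage.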
Comment 4 Chris 2016-11-16 06:27:00 MST
Thanks! I will get back to you with that info.

By purge frequency increase do you mean shorter job and step retention times?

Comment 5 Alejandro Sanchez 2016-11-16 06:36:27 MST
(In reply to Chris from comment #4)
> Thanks! I will get back to you with that info.
> 
> By purge frequency increase do you mean shorter job and step retention times?

Exactly.
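In slurmdbd.conf terms that means lowering the Purge* windows, e.g. (illustrative values only; the configuration shown above currently keeps jobs and events for 1440 hours):

```
# /opt/slurmcl15/etc/slurmdbd.conf (fragment)
# Shorter retention keeps the job/step tables small, which speeds up
# rollups and bounds the size of each archive/purge transaction.
PurgeEventAfter=720hours
PurgeJobAfter=720hours
PurgeStepAfter=720hours
```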
Comment 6 Chris 2016-11-16 06:55:24 MST
MariaDB [slurmcl2db]> SELECT table_schema table_schema, table_name table_name, engine, ROUND(data_length/1024/1024,2) total_size_mb, table_rows FROM information_schema.tables WHERE table_schema = "slurmcl2db" ORDER BY total_size_mb DESC LIMIT 10;
+--------------+--------------------------+--------+---------------+------------+
| table_schema | table_name               | engine | total_size_mb | table_rows |
+--------------+--------------------------+--------+---------------+------------+
| slurmcl2db   | cl_job_table             | InnoDB |       5766.02 |   13298060 |
| slurmcl2db   | cl_step_table            | InnoDB |       5733.02 |   10528556 |
| slurmcl2db   | qm_step_table            | InnoDB |        153.02 |     319390 |
| slurmcl2db   | swdev_job_table          | InnoDB |        149.17 |     360027 |
| slurmcl2db   | qm_job_table             | InnoDB |        113.16 |     274370 |
| slurmcl2db   | swdev_step_table         | InnoDB |         89.13 |     235229 |
| slurmcl2db   | cl_event_table           | InnoDB |         62.23 |     136641 |
| slurmcl2db   | a_assoc_usage_hour_table | InnoDB |         47.59 |     841910 |
| slurmcl2db   | a_step_table             | InnoDB |         21.52 |      12352 |
| slurmcl2db   | a_job_table              | InnoDB |         13.02 |       1572 |
+--------------+--------------------------+--------+---------------+------------+
10 rows in set (0.04 sec)

MariaDB [slurmcl2db]> select count(job_db_inx) from cl_job_table;
+-------------------+
| count(job_db_inx) |
+-------------------+
|          14292402 |
+-------------------+
1 row in set (1.73 sec)

MariaDB [slurmcl2db]> select count(job_db_inx) from cl_step_table;
+-------------------+
| count(job_db_inx) |
+-------------------+
|          10904677 |
+-------------------+
1 row in set (3.08 sec)

MariaDB [slurmcl2db]> select count(*) from cl_job_table where time_start > unix_timestamp('2016-11-15') and time_start < unix_timestamp('2016-11-16');
+----------+
| count(*) |
+----------+
|   445486 |
+----------+
1 row in set (2.06 sec)

MariaDB [slurmcl2db]> select count(*) from cl_job_table where time_start > unix_timestamp('2016-11-14') and time_start < unix_timestamp('2016-11-15');
+----------+
| count(*) |
+----------+
|   740696 |
+----------+
1 row in set (2.00 sec)

MariaDB [slurmcl2db]> select count(*) from cl_job_table where time_start > unix_timestamp('2016-11-13') and time_start < unix_timestamp('2016-11-14');
+----------+
| count(*) |
+----------+
|   649306 |
+----------+
1 row in set (2.00 sec)

MariaDB [slurmcl2db]> SHOW VARIABLES LIKE 'innodb%timeout';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| innodb_lock_wait_timeout   | 900   |
| innodb_rollback_on_timeout | OFF   |
+----------------------------+-------+
2 rows in set (0.00 sec)

MariaDB [slurmcl2db]> SHOW VARIABLES LIKE 'innodb%size';
+---------------------------------+-------------+
| Variable_name                   | Value       |
+---------------------------------+-------------+
| innodb_additional_mem_pool_size | 8388608     |
| innodb_buffer_pool_size         | 17179869184 |
| innodb_ibuf_max_size            | 8589918208  |
| innodb_log_block_size           | 512         |
| innodb_log_buffer_size          | 8388608     |
| innodb_log_file_size            | 536870912   |
| innodb_max_bitmap_file_size     | 104857600   |
| innodb_merge_sort_block_size    | 1048576     |
| innodb_page_size                | 16384       |
| innodb_purge_batch_size         | 20          |
+---------------------------------+-------------+
10 rows in set (0.00 sec)
Comment 7 Alejandro Sanchez 2016-11-16 09:08:22 MST
Chris - can you please set innodb_buffer_pool_size to 50GB and innodb_buffer_pool_instances to 50, and see if this helps mitigate the issue? I think these values would better fit your 64GB box, based on the MariaDB docs [1][2]. Also, regarding the disk space:

- What's the total/free/used space in /var? Was /var full when this happened?
- What's the output of `ulimit -a` for the user running slurmdbd (I think slurmcl2)?

Around 500K jobs go into the database daily. As I said, increasing the purge frequency would likely help too. Please let me know how it goes.

[1] https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#innodb_buffer_pool_size
[2] https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#innodb_buffer_pool_instances
Comment 8 Chris 2016-11-17 13:10:02 MST
Could you explain a little more about the 4GB buffer size and what those "Buffer size limit exceeded" error messages mean? Is that a compiled-in slurmdbd limit, and if the data returned from SQL exceeds it do we start to leak memory?



[harwell@drdslurm0001 SPECS]$ sudo journalctl -u slurmcl15dbd --since today | grep -v DBD_INIT | grep -v Conn | grep error
Nov 17 10:27:28 drdslurm0001.en.desres.deshaw.com slurmdbd[27641]: error: pack32: Buffer size limit exceeded (4294901818 > 4294901760)
Nov 17 10:27:28 drdslurm0001.en.desres.deshaw.com slurmdbd[27641]: error: pack32: Buffer size limit exceeded (4294901818 > 4294901760)
Nov 17 11:00:00 drdslurm0001.en.desres.deshaw.com slurmdbd[24513]: error: We have more allocated time than is possible (6865200 > 288000) for cluster a(80) from 2016-11-17T10:00:00 - 2016-11-17T11:00:00 tres 1
Nov 17 11:00:00 drdslurm0001.en.desres.deshaw.com slurmdbd[24513]: error: We have more time than is possible (288000+288000+0)(576000) > 288000 for cluster a(80) from 2016-11-17T10:00:00 - 2016-11-17T11:00:00 tres 1
Nov 17 11:00:00 drdslurm0001.en.desres.deshaw.com slurmdbd[24513]: error: id_assoc 1071 doesn't have any tres
Nov 17 12:00:00 drdslurm0001.en.desres.deshaw.com slurmdbd[24513]: error: We have more allocated time than is possible (6865200 > 288000) for cluster a(80) from 2016-11-17T11:00:00 - 2016-11-17T12:00:00 tres 1
Nov 17 12:00:00 drdslurm0001.en.desres.deshaw.com slurmdbd[24513]: error: We have more time than is possible (288000+288000+0)(576000) > 288000 for cluster a(80) from 2016-11-17T11:00:00 - 2016-11-17T12:00:00 tres 1
Nov 17 12:00:00 drdslurm0001.en.desres.deshaw.com slurmdbd[24513]: error: id_assoc 1071 doesn't have any tres
Nov 17 13:00:00 drdslurm0001.en.desres.deshaw.com slurmdbd[24513]: error: We have more allocated time than is possible (6865200 > 288000) for cluster a(80) from 2016-11-17T12:00:00 - 2016-11-17T13:00:00 tres 1
Nov 17 13:00:00 drdslurm0001.en.desres.deshaw.com slurmdbd[24513]: error: We have more time than is possible (288000+288000+0)(576000) > 288000 for cluster a(80) from 2016-11-17T12:00:00 - 2016-11-17T13:00:00 tres 1
Nov 17 13:00:00 drdslurm0001.en.desres.deshaw.com slurmdbd[24513]: error: id_assoc 1071 doesn't have any tres



Sorry for the delayed reply. I've made some changes and haven't seen the errors since about 10:30am when I cancelled the reporting script.

I've updated some of the metrics we run to use squeue rather than sacct, and disabled the month-to-date reports, which query with sreport and sacct. We've been monitoring slurmdbd memory usage and restarted it twice, when usage reached 60% and 67% of the memory on the box.

The slurmcl2 user which runs slurmdbd has unlimited core:

 sudo su --shell=/bin/bash - slurmcl2
Last login: Thu Nov 17 14:01:13 EST 2016 on pts/10
-bash-4.2$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 255917
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 16384
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

I'm not 100% positive of the order of events, but yes, /var did fill on that day. That is where the archives, logs, core files, and mysql tmp files all go; at least the mysql db is on a separate /var/lib/mysql partition.

I've since grown the /var partition and removed the partial core files:

[harwell@drdslurm0001 SPECS]$ df -hl /var/ /var/lib/mysql/
Filesystem              Size  Used Avail Use% Mounted on
/dev/mapper/data-var     96G  7.8G   89G   9% /var
/dev/mapper/data-mysql   32G   18G   15G  56% /var/lib/mysql

I've decreased retention times to 30 days, stepping 1 day at a time and doing the purges by hand with something like this:
y=720hours

for i in {60..45}; do
    h=$(( 24 * i ))
    x=${h}hours
    echo /opt/slurmcl15/bin/sacctmgr -i archive dump Directory=/var/spool/slurmcl15/ Events Jobs Steps Suspend PurgeEventAfter=$x PurgeJobAfter=$x PurgeStepAfter=$y PurgeSuspendAfter=24hours
    sudo -u slurmcl2 /opt/slurmcl15/bin/sacctmgr -i archive dump Directory=/var/spool/slurmcl15/ Events Jobs Steps Suspend PurgeEventAfter=$x PurgeJobAfter=$x PurgeStepAfter=$y PurgeSuspendAfter=24hours
done

We did see an uptick in jobs per day in Nov.

Setting the innodb buffer to 50GB makes me a little nervous since slurmdbd itself has been ballooning up to 60% of memory. I have doubled it to 32GB and increased the instances from the default of 1 in this version to 32:

Current database: slurmcl2db

+---------------------------------------+-------------+
| Variable_name                         | Value       |
+---------------------------------------+-------------+
| innodb_buffer_pool_instances          | 32          |
| innodb_buffer_pool_populate           | OFF         |
| innodb_buffer_pool_restore_at_startup | 0           |
| innodb_buffer_pool_shm_checksum       | ON          |
| innodb_buffer_pool_shm_key            | 0           |
| innodb_buffer_pool_size               | 34359738368 |
+---------------------------------------+-------------+
Comment 9 Alejandro Sanchez 2016-11-18 08:58:59 MST
(In reply to Chris from comment #8)
> Could you explain a little more about the 4GB buffer size and what those
> error messages with Buffer size limit exceeded mean? Is that a slurmdbd
> compiled in limit and if the data returned from sql exceeds it we start to
> leak memory?

There's this max buffer definition in the code:
src/common/pack.h:#define MAX_BUF_SIZE ((uint32_t) 0xffff0000)	/* avoid going over 32-bits */

When slurmdbd packs buffer information for transmission, the maximum buffer size is therefore 4294901760 bytes.
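That hex constant is exactly the number in the error messages; a quick sanity check:

```shell
# 0xffff0000 printed as an unsigned decimal gives the 4294901760 that
# appears in the "Buffer size limit exceeded" errors.
printf '%u\n' 0xffff0000
```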

> Sorry for the delayed reply. I've made some changes and haven't seen the
> errors since about 10:30am when I cancelled the reporting script.
> 
> I've updated some of the metrics we run to use squeue rather than sacct and
> disabled the month until now reports which query with sreport and sacct. 
> We've been monitoring slurmdbd memory usage and restarted it twice when the
> usage reached 60 and 67% of memory on the box.
> 
> The slurmcl2 user which runs slurmdbd has unlimited core:
> 
>  sudo su --shell=/bin/bash - slurmcl2
> Last login: Thu Nov 17 14:01:13 EST 2016 on pts/10
> -bash-4.2$ ulimit -a
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 255917
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 16384
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 4096
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited

Can you please set the 'core file size' for slurmcl2 to unlimited? This way if slurmdbd aborts again we can backtrace the generated core and figure out what's going on.
 
> I'm not 100% positive of the order of events, but yes /var did fill on that
> day and that is where the archives and logs and core files and mysql tmp
> files all go, but at least mysql db is on a different /var/lib/mysql
> partition.
> 
> I've since grown the /var partition and removed the partial core files:
> 
> [harwell@drdslurm0001 SPECS]$ df -hl /var/ /var/lib/mysql/
> Filesystem              Size  Used Avail Use% Mounted on
> /dev/mapper/data-var     96G  7.8G   89G   9% /var
> /dev/mapper/data-mysql   32G   18G   15G  56% /var/lib/mysql
> 
> I've decreased retention times to 30 days, stepping 1 day at a time and
> doing the purges by hand with something like this:
> y=720hours ;
> 
> for i in {60..45} ; do
>     h=$(( 24*$i )) ;
>     x=${h}hours ;
>     echo /opt/slurmcl15/bin/sacctmgr -i archive dump
> Directory=/var/spool/slurmcl15/ Events Jobs Steps Suspend PurgeEventAfter=$x
> PurgeJobAfter=$x PurgeStepAfter=$y PurgeSuspendAfter=24hours ;
>     sudo -u slurmcl2 /opt/slurmcl15/bin/sacctmgr -i archive dump
> Directory=/var/spool/slurmcl15/ Events Jobs Steps Suspend PurgeEventAfter=$x
> PurgeJobAfter=$x PurgeStepAfter=$y PurgeSuspendAfter=24hours ;
> done
> 
> We did see an uptick in jobs per day in Nov.
> 
> Setting innodb buffer to 50MB makes me a little nervous since slurmdbd has
> been ballooning up to 60%.  I have doubled it to 32GB and increased the
> instances from the default of 1 in this version to 32:
> 
> Current database: slurmcl2db
> 
> +---------------------------------------+-------------+
> | Variable_name                         | Value       |
> +---------------------------------------+-------------+
> | innodb_buffer_pool_instances          | 32          |
> | innodb_buffer_pool_populate           | OFF         |
> | innodb_buffer_pool_restore_at_startup | 0           |
> | innodb_buffer_pool_shm_checksum       | ON          |
> | innodb_buffer_pool_shm_key            | 0           |
> | innodb_buffer_pool_size               | 34359738368 |
> +---------------------------------------+-------------+

Ok, great. Are you experiencing any more errors? If slurmdbd aborts again, could you please attach the output of:

$ gdb slurmdbd /path/to/core
(gdb) set pagination off
(gdb) set print pretty on
(gdb) bt
(gdb) thread apply all bt full

Thanks.
Comment 10 Chris 2016-11-18 09:10:57 MST
Sorry, I totally pasted the wrong thing. The limits when opening a login shell are different from those of the running slurmdbd process. For that process the core file size really is "unlimited". Since we've gotten two partial cores before, I have a fairly high degree of confidence in that :>

[harwell@drdslurm0001 SPECS]$ cat /proc/$(!!)/limits
cat /proc/$(pgrep slurmdbd)/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            unlimited            unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             255917               255917               processes
Max open files            8192                 8192                 files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       255917               255917               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
[harwell@drdslurm0001 SPECS]$

On 11/18/2016 10:58 AM, bugs@schedmd.com wrote:
>
> *Comment # 9 <https://bugs.schedmd.com/show_bug.cgi?id=3260#c9> on bug 
> 3260 <https://bugs.schedmd.com/show_bug.cgi?id=3260> from Alejandro 
> Sanchez <mailto:alex@schedmd.com> *
> (In reply to Chris from comment #8)
> > Could you explain a little more about the 4GB buffer size and what those
> > error messages with Buffer size limit exceeded mean? Is that a slurmdbd
> > compiled in limit and if the data returned from sql exceeds it we start to
> > leak memory?
>
> There's this max buffer definition in the code:
> src/common/pack.h:#define MAX_BUF_SIZE ((uint32_t) 0xffff0000)  /* avoid going
> over 32-bits */
>
> When slurmdbd has to pack and transmit buffer information the max buffer size
> is defined to 4294901760.
>
> > Sorry for the delayed reply. I've made some changes and haven't seen the
> > errors since about 10:30am when I cancelled the reporting script.
> >
> > I've updated some of the metrics we run to use squeue rather than sacct and
> > disabled the month until now reports which query with sreport and sacct.
> > We've been monitoring slurmdbd memory usage and restarted it twice when the
> > usage reached 60 and 67% of memory on the box.
> >
> > The slurmcl2 user which runs slurmdbd has unlimited core:
> >
> > sudo su --shell=/bin/bash - slurmcl2
> > Last login: Thu Nov 17 14:01:13 EST 2016 on pts/10
> > -bash-4.2$ ulimit -a
> > core file size          (blocks, -c) 0
> > data seg size           (kbytes, -d) unlimited
> > scheduling priority             (-e) 0
> > file size               (blocks, -f) unlimited
> > pending signals                 (-i) 255917
> > max locked memory       (kbytes, -l) unlimited
> > max memory size         (kbytes, -m) unlimited
> > open files                      (-n) 16384
> > pipe size            (512 bytes, -p) 8
> > POSIX message queues     (bytes, -q) 819200
> > real-time priority              (-r) 0
> > stack size              (kbytes, -s) 8192
> > cpu time               (seconds, -t) unlimited
> > max user processes              (-u) 4096
> > virtual memory          (kbytes, -v) unlimited
> > file locks                      (-x) unlimited
>
> Can you please set the 'core file size' for slurmcl2 to unlimited? This way if
> slurmdbd aborts again we can backtrace the generated core and figure out what's
> going on.
>
> > I'm not 100% positive of the order of events, but yes /var did fill on that
> > day and that is where the archives and logs and core files and mysql tmp
> > files all go, but at least mysql db is on a different /var/lib/mysql
> > partition.
> >
> > I've since grown the /var partition and removed the partial core files:
> >
> > [harwell@drdslurm0001 SPECS]$ df -hl /var/ /var/lib/mysql/
> > Filesystem             Size  Used Avail Use% Mounted on
> > /dev/mapper/data-var    96G  7.8G   89G   9% /var
> > /dev/mapper/data-mysql  32G   18G   15G  56% /var/lib/mysql
> >
> > I've decreased retention times to 30 days, stepping 1 day at a time and
> > doing the purges by hand with something like this:
> >
> > y=720hours ;
> > for i in {60..45} ; do
> >   h=$(( 24*$i )) ;
> >   x=${h}hours ;
> >   echo /opt/slurmcl15/bin/sacctmgr -i archive dump Directory=/var/spool/slurmcl15/ Events Jobs Steps Suspend PurgeEventAfter=$x PurgeJobAfter=$x PurgeStepAfter=$y PurgeSuspendAfter=24hours ;
> >   sudo -u slurmcl2 /opt/slurmcl15/bin/sacctmgr -i archive dump Directory=/var/spool/slurmcl15/ Events Jobs Steps Suspend PurgeEventAfter=$x PurgeJobAfter=$x PurgeStepAfter=$y PurgeSuspendAfter=24hours ;
> > done
> >
> > We did see an uptick in jobs per day in Nov.
> >
> > Setting innodb buffer to 50MB makes me a little nervous since slurmdbd has
> > been ballooning up to 60%. I have doubled it to 32GB and increased the
> > instances from the default of 1 in this version to 32:
> >
> > Current database: slurmcl2db
> >
> > +---------------------------------------+-------------+
> > | Variable_name                         | Value       |
> > +---------------------------------------+-------------+
> > | innodb_buffer_pool_instances          | 32          |
> > | innodb_buffer_pool_populate           | OFF         |
> > | innodb_buffer_pool_restore_at_startup | 0           |
> > | innodb_buffer_pool_shm_checksum       | ON          |
> > | innodb_buffer_pool_shm_key            | 0           |
> > | innodb_buffer_pool_size               | 34359738368 |
> > +---------------------------------------+-------------+
>
> Ok, great. Are you experiencing more errors? If slurmdbd aborts() again could
> you please attach the output of:
>
> $ gdb /path/to/core slurmdbd
> (gdb) set pagination off
> (gdb) set print pretty on
> (gdb) bt
> (gdb) thread apply all bt full
>
> Thanks.
> ------------------------------------------------------------------------
> You are receiving this mail because:
>
>   * You reported the bug.
>
Comment 11 Chris 2016-11-18 09:17:23 MST
So far so good in general, though: slurmdbd is at 12% mem, 7.394g 
resident, after 20 hrs.  Hopefully the cleanup and the tuning helped. Is 
there a scenario where, if mysqld isn't supplying the answer to the 
slurmdbd (sacct/sreport) query quickly enough, the slurmdbd memory 
footprint would be expected to get bigger?


Comment 12 Alejandro Sanchez 2016-11-18 09:29:39 MST
(In reply to Chris from comment #11)
> So far so good in general though, slurmdbd is at 12%mem at 7.394g 
> resident after 20hrs.  Hopefully the cleanup and the tuning helped.

Great. Let's leave the bug open for a while and keep slurmdbd under observation for a few days, just in case any new issue appears.

> Is there a scenario where if mysqld isn't supplying the answer to the 
> slurmdbd (sacct/sreport) query quickly enough that the slurmdbd memory 
> footprint would be expected to get bigger?

If mysqld is merely slow in supplying a result to slurmdbd, I would not expect the slurmdbd memory footprint to increase. I would expect it to increase if the query result is large, as with sacct/sreport queries over large date ranges.
Comment 13 Alejandro Sanchez 2016-11-22 03:38:12 MST
Chris, any more memory footprint growth or other issues?
Comment 14 Chris 2016-11-22 10:25:35 MST
No crashes yet :>

We've seen the memory go to ~50% and it is sitting at 40% now. Perhaps this is just the new norm with the number of jobs we have?

Last started Thu 2016-11-17 14:30:26 EST; 4 days ago

We have issued a reconfigure a few times; would that be expected to clear memory?

top - 12:17:35 up 258 days, 18:44,  1 user,  load average: 0.06, 0.09, 0.13
Tasks: 401 total,   2 running, 399 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.1 sy,  0.0 ni, 99.8 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 65536736 total,  3657260 free, 58970704 used,  2908772 buff/cache
KiB Swap: 67108860 total, 49677032 free, 17431828 used.  6177936 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                 
20307 mysql     20   0 36.387g 0.028t   4720 S   1.6 46.5 846:21.45 mysqld                                                                                                                   
20364 slurmcl2   2 -18 27.023g 0.025t   1112 S   0.5 40.2 140:34.32 slurmdbd   

We do see warnings pretty much every hour about large processing times for hourly_rollups.  Is any action required or suggested there? 

Nov 21 00:02:11 drdslurm0001.en.desres.deshaw.com slurmdbd[20364]: Warning: Note very large processing time from hourly_rollup for cl: usec=130528211 began=00:00:00.724
Comment 15 Alejandro Sanchez 2016-11-23 03:59:36 MST
(In reply to Chris from comment #14)
> No crashes yet :>

Good.

> We've seen the memory go to ~50% and it is sitting at 40% now. Perhaps this
> is just the new norm with the number of jobs we have?

Obviously, the more query pressure on slurmdbd, the more memory it's going to consume, but 32GB of memory seems excessive.

> We have issued reconfigure a few times, would that be expected to clear
> memory?

Looking at what the reconfig code does, I don't think so. It just re-reads the configuration, sets missing uids, resets the logging... but it doesn't look like it clears memory at all.

> We do see warnings pretty much every hour about large processing times for
> hourly_rollups.  Is any action required or suggested there?

I see in the logs (from just Nov 15th) several connections from these UID's:

  CONNS UID's  
   1944 UID:0
    244 UID:11580
   4418 UID:12474
      2 UID:12858
    183 UID:13013
   1524 UID:13120
      1 UID:13664
   2724 UID:23532
   3683 UID:40043
   5114 UID:406

- Do you know if any of these uid's are constantly querying slurmdbd?
- How often are sacct/sreport requesting info to slurmdbd? Having a script/cron that constantly calls sacct/sreport can also cause the slurmdbd to slow down, which is why I ask.
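Per-UID connection counts like the table above can be pulled out of a journal dump with a short pipeline. A sketch, where the sample log lines are hypothetical stand-ins modeled on slurmdbd connection messages (only the `UID:<n>` token matters to the pipeline):

```shell
# Build a tiny sample log, then count connections per UID.
cat > /tmp/slurmdbd_sample.log <<'EOF'
Nov 15 10:00:01 host slurmdbd[1976]: DBD_INIT: CLUSTER:cl VERSION:7168 UID:0
Nov 15 10:00:02 host slurmdbd[1976]: DBD_INIT: CLUSTER:cl VERSION:7168 UID:406
Nov 15 10:00:03 host slurmdbd[1976]: DBD_INIT: CLUSTER:cl VERSION:7168 UID:406
EOF
# Extract each UID token, then count occurrences per UID.
grep -o 'UID:[0-9]*' /tmp/slurmdbd_sample.log | sort | uniq -c
```

Run against a real `journalctl -u slurmcl15dbd` dump, this yields the CONNS/UID table directly.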
Comment 16 Chris 2016-11-23 07:15:28 MST
Yes, we have several workflows and health checks which submit some jobs 
and then query sacct for completion and exit status.

Is there a mechanism for telling slurmdbd to drop or expire cache sooner 
without restarting?

Comment 17 Alejandro Sanchez 2016-11-23 07:43:14 MST
(In reply to Chris from comment #16)
> Yes, we have several workflows and health checks which submit some jobs 
> and then query sacct for completion and exit status.

Then probably all these workflows are putting a lot of pressure on slurmdbd, depending on the frequency of the requests and the size of the results.

> Is there a mechanism for telling slurmdbd to drop or expire cache sooner 
> without restarting?

Can you please elaborate on this a little more?
Comment 18 Chris 2016-11-23 08:09:35 MST
Oh, I just found myself wondering what slurmdbd is doing with all that 
memory. I figure it can't all be uncommitted new information, but is 
probably also some amount of old job information that it keeps around to 
answer repeated queries rather than relying on mysql. _If_ that is the 
case, I thought maybe there was some way to tell it to forget about those 
and go to the db next time.

So... are we doing it differently than other people? Should we not be 
querying with sacct?  Should we instead be using a job completion plugin 
of some kind? What is the recommendation?  (Or maybe we should just go 
buy something with more mem and move on....)

Comment 19 Alejandro Sanchez 2016-11-23 08:29:11 MST
(In reply to Chris from comment #18)
> Oh, I just found myself wondering what slurmdbd is doing with all that 
> memory. I figure it can't all be uncommitted new information, but is 
> probably also some amount of old job information that it keeps to answer 
> repeated queries rather than relying on mysql. _if_ that is the case I 
> thought maybe it there was some way to tell it to forget about those and 
> go to the db next time.
> 
> So .. are we doing it differently than other people? Should we not be 
> querying with sacct?  Should we instead be using a job completion plugin 
> of somekind? What is the recommendation?  (Or maybe we should just go 
> buy something with more mem and move on....)

Well, slurmdbd retains the results of all these queries in memory until it buffers them out to the clients or to slurmctld, so indeed not all of the memory is uncommitted new information. That's why, depending on the sacct query frequency and the size of the query results, the slurmdbd memory footprint can increase.

Can you please identify the exact list of sacct queries being executed against slurmdbd and how often they are executed? For instance, sacct querying at high frequency for job records starting a long time ago is an example of this. I'm not saying that you should not be querying with sacct, just that depending on frequency and size it may have a memory impact; that's why it's important to precisely identify the kind of queries being executed.

Using a job completion plugin to store/retrieve the information related to finished jobs may help decouple slurmdbd a bit from your workflow of querying the status of these finished jobs.
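For reference, the job completion plugin route is configured in slurm.conf. A minimal sketch with the filetxt backend; the log path is an assumption, and the parameter names are standard slurm.conf options:

```
# slurm.conf fragment (sketch): record finished jobs to a flat file so
# completion/exit-status polling need not go through slurmdbd.
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/job_completions.log
```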
Comment 20 Chris 2016-11-23 08:42:01 MST
I will need to gather that info. I know in some cases it is just sacct 
-j $jids -o JobId,State,Exit and in other cases just 'sacct' with no 
arguments.  In some cases it uses --brief and/or 
--starttime=$(date +%H:%M --date='1 hour ago').

Does slurmdbd have a debug flag or level which will log the sacct 
queries, or should I wrap sacct to log somewhere and report back?
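A wrapper is one low-tech way to capture the queries: place a shim ahead of the real sacct in PATH and log each invocation's arguments before handing off. A minimal sketch; the real binary's path (/opt/slurmcl15/bin/sacct) and the log location are assumptions matching this site's layout:

```shell
# Create a logging wrapper for sacct (written to /tmp for this demo;
# install it in a PATH directory that precedes the real sacct's).
cat > /tmp/sacct <<'EOF'
#!/bin/sh
# Log timestamp and arguments, then exec the real binary (path assumed).
echo "$(date '+%b %d %H:%M:%S') sacct: $*" >> /var/tmp/sacct-queries.log
exec /opt/slurmcl15/bin/sacct "$@"
EOF
chmod +x /tmp/sacct
```

The resulting log has exactly the shape of the five-minute slice shown in comment 22.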


Comment 21 Alejandro Sanchez 2016-11-23 08:59:19 MST
(In reply to Chris from comment #20)
> I will need to gather that info. I know in some cases it is just sacct 
> -j $jids -o JobId,State,Exit and in other cases just 'sacct' with no 
> arguments.  In some cases it uses --brief and/or 
> --starttime=$(date +%H:%M --date='1 hour ago').

If you can identify precisely these exact commands that may be helpful.
 
> Does slurmdbd have a debug flag or level which will log the sacct 
> queries or should I wrap sacct to log somewhere and report back?

There's 'sacctmgr show transaction', but these will only list additions, modifications and removals, not queries.

You can use the slurmdbd.conf DebugFlags DB_JOB, DB_QUERY and DB_STEP, which will result in more verbose logging of SQL statements/queries when dealing with jobs, queries and steps respectively. Anyhow, I'd suggest looking at the command history of the UID's in comment #15 or checking the scripts/crons that generate such queries. Then post them here in a new comment; it may also be useful to manually execute some of these and see whether slurmdbd memory consumption changes while they are being executed.
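As a sketch, the corresponding slurmdbd.conf lines would be (the DebugLevel choice is mine; a slurmdbd restart or reconfigure is needed to pick them up):

```
# slurmdbd.conf fragment (sketch): verbose SQL logging while debugging.
DebugLevel=verbose
DebugFlags=DB_JOB,DB_QUERY,DB_STEP
```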
Comment 22 Chris 2016-11-23 09:09:25 MST
Here is a five minute slice from some:
Nov 23 11:00:53 sacct:
Nov 23 11:00:53 sacct:
Nov 23 11:01:05 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26015188,26015189
Nov 23 11:01:05 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26015188,26015189
Nov 23 11:01:11 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25811768
Nov 23 11:01:11 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25811768
Nov 23 11:01:13 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25961228,26042983
Nov 23 11:01:13 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25961228,26042983
Nov 23 11:01:23 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25784701
Nov 23 11:01:23 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25784701
Nov 23 11:01:29 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25969744,25992646
Nov 23 11:01:29 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25969744,25992646
Nov 23 11:01:37 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26035833,26009823
Nov 23 11:01:37 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26035833,26009823
Nov 23 11:02:05 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26015188,26015189
Nov 23 11:02:05 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26015188,26015189
Nov 23 11:02:11 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25811768
Nov 23 11:02:11 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25811768
Nov 23 11:02:13 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25961228,26042983
Nov 23 11:02:13 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25961228,26042983
Nov 23 11:02:23 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25784701
Nov 23 11:02:23 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25784701
Nov 23 11:02:30 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25969744,25992646
Nov 23 11:02:30 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25969744,25992646
Nov 23 11:02:37 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26035833,26009823
Nov 23 11:02:37 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26035833,26009823
Nov 23 11:03:05 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26015188,26015189
Nov 23 11:03:05 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26015188,26015189
Nov 23 11:03:11 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25811768
Nov 23 11:03:11 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25811768
Nov 23 11:03:14 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25961228,26042983
Nov 23 11:03:14 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25961228,26042983
Nov 23 11:03:23 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25784701
Nov 23 11:03:23 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25784701
Nov 23 11:03:30 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25969744,25992646
Nov 23 11:03:30 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25969744,25992646
Nov 23 11:03:38 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26035833,26009823
Nov 23 11:03:38 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26035833,26009823
Nov 23 11:04:06 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26015188,26015189
Nov 23 11:04:06 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26015188,26015189
Nov 23 11:04:11 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25811768
Nov 23 11:04:11 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25811768
Nov 23 11:04:14 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25961228,26042983
Nov 23 11:04:14 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25961228,26042983
Nov 23 11:04:23 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25784701
Nov 23 11:04:23 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25784701
Nov 23 11:04:30 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25969744,25992646
Nov 23 11:04:30 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25969744,25992646
Nov 23 11:04:38 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26035833,26009823
Nov 23 11:04:38 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26035833,26009823
Nov 23 11:05:02 sacct: --brief --starttime=10:05
Nov 23 11:05:02 sacct: --brief --starttime=10:05
Nov 23 11:05:02 sacct: --brief --starttime=10:05
Nov 23 11:05:02 sacct: --brief --starttime=10:05
Nov 23 11:05:06 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26015188,26015189
Nov 23 11:05:06 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26015188,26015189
Nov 23 11:05:11 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25811768
Nov 23 11:05:11 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25811768
Nov 23 11:05:14 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25961228,26042983
Nov 23 11:05:14 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25961228,26042983
Nov 23 11:05:24 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25784701
Nov 23 11:05:24 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25784701
Nov 23 11:05:30 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25969744,25992646
Nov 23 11:05:30 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 25969744,25992646
Nov 23 11:05:38 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26035833,26009823
Nov 23 11:05:38 sacct: --parsable2 --noheader --format=JobID,State,ExitCode --jobs 26035833,26009823

Comment 26 Alejandro Sanchez 2016-11-23 09:48:38 MST
Chris - Could you please recompile the code with the --enable-memory-leak-debug option? We do create lists for each transaction which sort of cache the memory for the list to be reused later. This should be fairly small, but might be related. This option will not keep the pointers around. You would only need to do this on the slurmdbd node.

Another thing you can do to help us diagnose this issue is to have a cron job which runs every minute or so recording the memory footprint, so we can see at what time the memory grows. Our guess is there is no leak but perhaps something in the mysql API is holding onto memory.
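A sketch of such a probe script; the script path, log path, and log format are my own choices, and `pgrep -o` picks the oldest matching pid:

```shell
# Write a once-a-minute memory probe that logs the RSS of slurmdbd.
cat > /tmp/slurmdbd-mem.sh <<'EOF'
#!/bin/sh
pid=$(pgrep -o slurmdbd) || exit 0          # nothing to record if not running
# VmRSS in /proc/<pid>/status is the resident set size in kB (Linux).
rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
echo "$(date '+%F %T') pid=$pid rss_kb=$rss_kb" >> /var/tmp/slurmdbd-mem.log
EOF
chmod +x /tmp/slurmdbd-mem.sh
# crontab entry to run it each minute:
#   * * * * * /tmp/slurmdbd-mem.sh
```

Graphing the resulting log against the sacct wrapper log timestamps would show whether growth correlates with particular queries.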
Comment 27 Chris 2016-12-02 10:35:13 MST
[root@drdslurm0001 bin]# systemctl status -l slurmcl15dbd
● slurmcl15dbd.service - Slurm DBD accounting daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmcl15dbd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/slurmcl15dbd.service.d
           └─10-options.conf
   Active: active (running) since Thu 2016-11-17 14:30:26 EST; 2 weeks 0 days ago
 Main PID: 20364 (slurmdbd)
   CGroup: /system.slice/slurmcl15dbd.service
           └─20364 /opt/slurmcl15/sbin/slurmdbd -n -18

From top
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
20364 slurmcl2   2 -18 41.410g 0.018t   1472 S   0.3 30.0 490:02.16 slurmdbd

Recompiled with --enable-memory-leak-debug and restarted with systemd restart ....

Would that print something in the logs normally, or does this need to be started with valgrind or something?
Comment 28 Alejandro Sanchez 2016-12-02 10:43:50 MST
(In reply to Chris from comment #27)
> Recompiled with --enable-memory-leak-debug and restarted with systemd
> restart ....  

Good, thanks.

> Would that print something in the logs normally, or does this need to
> started with valgrind or something?

No, you won't notice anything special in the logs. We do create lists for each transaction which sort of cache the memory for the list to be reused later. It should be fairly small, but might be related to the bug. When MEMORY_LEAK_DEBUG is set to 1, the cache is disabled. Each memory request will be satisfied with a separate xmalloc request. When the memory is no longer required, it is immediately freed. This means valgrind can identify exactly where any leak associated with the use of the list functions originates.

Another thing would be to have a cron job run every minute or so looking at the memory footprint, to see at what time the memory grows. Our guess is there is no leak but perhaps something in the mysql API is holding onto memory. From what you sent, it appeared that mysqld had a very similar footprint. With the above option at least we will rule out the list cache.
Comment 29 Alejandro Sanchez 2016-12-07 07:05:34 MST
Chris, have you managed to monitor the footprint with a cron job or similar? Any updates on this?
Comment 30 Alejandro Sanchez 2016-12-28 03:59:41 MST
Haven't seen feedback in three weeks; marking this as resolved/timedout. Please reopen if this is still an active concern.
Comment 31 Chris 2016-12-28 06:23:03 MST
Interestingly we stopped seeing the rapid memory growth once we installed the version compiled with the suggested configure option.

Comment 32 Alejandro Sanchez 2016-12-28 07:06:06 MST
Great, it seems disabling the cache worked well then. Marking as resolved/infogiven.