| Summary: | Archiving and purging old jobs. | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | lhuang |
| Component: | Database | Assignee: | Albert Gil <albert.gil> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | cblack |
| Version: | - Unsupported Older Versions | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | NY Genome | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
lhuang
2022-09-09 11:40:08 MDT
Hi Luis,

> We just started archiving/purging records from a testing database that
> contains 27 million job records. Initially we tested purging around 2 months
> of records from jobs that completed in 2019. This took around 1-2 hours to
> complete. Looks like archiving/purging works.
>
> I've now increased the archive/purge to any jobs older than December 2020.
> I'm unsure how many records there are, but it's 1 year's worth of jobs. It's
> been running for close to 24 hours but it's slowly chugging along. I can see
> that it's completed around 6 months of job records.

I think that you are doing a great job by doing the archiving/purging incrementally and monitoring it closely!

> Is there anything we can do to speed it up? These are our innodb settings.
>
> [root@dev-slurm01 ~]# cat /etc/my.cnf.d/innodb.cnf
> [mysqld]
> innodb_buffer_pool_size=2048M
> innodb_log_file_size=64M
> innodb_lock_wait_timeout=900

We actually recommend innodb_buffer_pool_size=4096M. See https://slurm.schedmd.com/accounting.html#slurm-accounting-configuration-before-build

> I'm also a little concerned as we will need to do this from the production
> cluster soon. From my testing, it does look like we can still continue to
> use the slurm cluster, including any sacct commands. Do you foresee any
> issues while we archive/purge the records?

No, you shouldn't face any issues, but some notes to take into account:
- Keep doing it incrementally, so you have better control/monitoring.
- Keep an eye on possible runaway jobs, to avoid them getting too close to purging time.
- Note that fixing very old runaways triggers an internal rollup operation that will take more time the older the runaway is.
- Similarly, keep a purge time large enough to avoid trying to purge jobs that are still in the system.

And finally, although you should be able to restore the archived records into a newer DB/slurmdbd for future inspection, I would also recommend keeping a SQL backup.

Regards,
Albert

Hi Albert,
Looks like it completed successfully; thank you for the tips and suggestions. Although we can see the DB has shrunk in size, the ibdata1 file did not decrease. Is it supposed to reduce in size?
MariaDB [(none)]> SELECT table_schema "DB Name",
-> ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) "DB Size in MB"
-> FROM information_schema.tables
-> GROUP BY table_schema;
+--------------------+---------------+
| DB Name | DB Size in MB |
+--------------------+---------------+
| information_schema | 0.1 |
| mysql | 0.6 |
| performance_schema | 0.0 |
| slurm_acct_db | 30676.8 |
+--------------------+---------------+
[root@dev-slurm01 ~]# du -skh /var/lib/mysql/ibdata1
49G /var/lib/mysql/ibdata1
Regards,
Luis
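For reference, the incremental archive/purge behaviour discussed above is driven by the Archive*/Purge* parameters in slurmdbd.conf. A minimal sketch follows; the values and archive path are illustrative, not a recommendation, and should be tuned to the site's retention policy (see the slurmdbd.conf man page):

```ini
# Illustrative slurmdbd.conf fragment -- values are examples only.
ArchiveDir=/var/spool/slurm/archive   # hypothetical path for archive files
ArchiveJobs=yes                       # write purged job records to archive files
ArchiveSteps=yes                      # archive step records as well
PurgeJobAfter=24months                # purge job records older than this
PurgeStepAfter=24months               # keep step retention in line with jobs
PurgeEventAfter=12months              # node event records
```

Walking PurgeJobAfter back a few months at a time, rather than in one large jump, matches the incremental approach Albert recommends above.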
Hi Luis,

> Looks like it completed successfully

Great!

> Although we can see the DB has shrunk in size, the ibdata1
> file did not decrease. Is it supposed to reduce in size?

Well, this is something in the realm of MariaDB. AFAIK, internally MariaDB reasons along the lines of: "OK, the DB now has fewer records, but it had more in the past, so let's keep the disk space, because we'll most probably need it again soon and access will be faster if the file is already that big." There are some ways to reduce the disk usage, but I won't recommend any.

Regards,
Albert

Hi Luis,

If this is OK with you, I'm closing this ticket as infogiven, but please don't hesitate to reopen it if you need further related support.

Regards,
Albert
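For completeness: the commonly cited way to actually shrink the InnoDB system tablespace (which Albert explicitly declines to recommend above) is a full dump-and-reload, since ibdata1 never shrinks in place. This is a hedged sketch only; it requires full downtime, a verified backup, and paths/service names here are illustrative:

```shell
# Disruptive procedure -- do NOT run against production without a tested backup.
mysqldump --all-databases --routines --events > /root/full_dump.sql
systemctl stop mariadb
# Setting innodb_file_per_table=ON in my.cnf first makes future tables use
# per-table .ibd files, which can later be compacted with OPTIMIZE TABLE.
rm /var/lib/mysql/ibdata1 /var/lib/mysql/ib_logfile*
systemctl start mariadb          # server recreates a small system tablespace
mysql < /root/full_dump.sql      # reload all databases from the dump
```

Keeping the SQL backup Albert suggests earlier doubles as the dump needed for this procedure, should it ever be required.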