Ticket 8910 - steps for loading archives in a separate database
Summary: steps for loading archives in a separate database
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Database
Version: 19.05.5
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Marshall Garey
QA Contact: Ben Roberts
URL:
Depends on:
Blocks:
 
Reported: 2020-04-21 09:33 MDT by Sophie Créno
Modified: 2022-11-03 12:08 MDT

See Also:
Site: Institut Pasteur
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 20.02.4 20.11.0pre1
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description Sophie Créno 2020-04-21 09:33:19 MDT
Hello,

  We would like to load our archives (job_table_archive, step_table_archive...)
in a database separated from the production one so that we can keep more
than 3 months in it (our current limit given the number of jobs per month)
and can compute statistics about the cluster usage (by research unit, by type
of QoS or GRes...) over a year for example. At the moment, we are unable
to extract this kind of figures easily so it's difficult to foresee where
we should put resources, which quantity and at which pace.
  How should we proceed? I guess we should export the users/associations
data from the production DB using "sacctmgr dump" and load the resulting file
into the blank DB with "sacctmgr load" before using "sacctmgr archive load"
to insert the different archive files, one by one according to the date
of the content.
  Will it be fine if we insert jobs belonging to users that have been
deleted after the archive file creation and that won't be, as a consequence,
in the blank database populated with the result of the "sacctmgr dump"
command?

  We also wonder how we could create archives more frequently without
purging the production DB. It seems to be possible given the sentence 
"""
Archive dump
Dump accounting data to file. Depending on options and slurmdbd configuration
data may remain in database or be purged.
"""

Should we use a script for that, using the SLURM_ARCHIVE_LAST_JOB to give
the date of the most recent job to dump in the archive? Could you provide
a short example/template for that?

Thanks in advance,
Comment 1 Nate Rini 2020-04-21 14:06:40 MDT
Sophie.

(In reply to Sophie Créno from comment #0)
>   We would like to load our archives (job_table_archive,
> step_table_archive...)
> in a database separated from the production one so that we can keep more
> than 3 months in it (our current limit given the number of jobs per month)
> and can compute statistics about the cluster usage (by research unit, by type
> of QoS or GRes...) over a year for example. 

I'm going to lower the severity of this ticket per <https://www.schedmd.com/support.php> 
>Severity 4 issues may also be used for recommendations for future product enhancements or modifications.

> At the moment, we are unable
> to extract this kind of figures easily so it's difficult to foresee where
> we should put resources, which quantity and at which pace.
Understood, many sites use previous machines to predict needs of new machines.

>   How should we proceed?
Do you plan to use another slurmdbd/slurmctld pair to store and query the records or will you be using MySQL/Mariadb directly?
Comment 2 Sophie Créno 2020-04-22 01:42:58 MDT
Hello Nate,

  Thanks for your quick reply.


> I'm going to lower the severity of this ticket per 
> <https://www.schedmd.com/support.php> 
> >Severity 4 issues may also be used for recommendations for future product 
> enhancements or modifications.

  Oh, yes, sorry, I should have reread it first.


> Do you plan to use another slurmdbd/slurmctld pair to store and query
> the records or will you be using MySQL/Mariadb directly?

  Yes, we imagined using another slurmdbd/slurmctld pair to store and
query. But we will also probably build our own SQL queries to cross data
and obtain all we need given our organization (common and private partitions...).
Comment 3 Nate Rini 2020-04-23 10:22:00 MDT
(In reply to Sophie Créno from comment #2)
>   Yes, we imagined using another slurmdbd/slurmctld pair to store and
> query.

Looking into how best to do this.

> But we will also probably build our own SQL queries to cross data
> and obtain all we need given our organization (common and private
> partitions...).

Just so you know, the SQL table schemas can change between major releases.
Comment 4 Sophie Créno 2020-04-29 07:01:20 MDT
>> But we will also probably build our own SQL queries to cross data
>> and obtain all we need given our organization (common and private
>> partitions...).
>
> Just so you know, the SQL table schemas can change between major releases.

  Yes, I know. We will have to update them at the same time. That will be
part of our checklist when we install a new version on the test cluster.
Comment 5 Sophie Créno 2020-05-11 10:33:08 MDT
Hello,

  This message just to know if you found time during the last couple of weeks
to look into this.

  Thanks in advance,
Comment 7 Marshall Garey 2020-05-15 16:17:15 MDT
>   We would like to load our archives (job_table_archive,
> step_table_archive...)
> in a database separated from the production one so that we can keep more
> than 3 months in it (our current limit given the number of jobs per month)
> and can compute statistics about the cluster usage (by research unit, by type
> of QoS or GRes...) over a year for example. At the moment, we are unable
> to extract this kind of figures easily so it's difficult to foresee where
> we should put resources, which quantity and at which pace.
>   How should we proceed? I guess we should export the users/associations
> data from the production DB using "sacctmgr dump" and load the result file
> in the blank DB with "sacctmgr load" before using "sacctmgr archive load"
> to insert the different archive files, one by one according to the date
> of the content.

I'm taking over here for Nate since he got pulled into some other work. I apologize for the delayed response - I encountered a few issues while researching.

TL;DR - I discuss a few options and caveats below, but I think the easiest thing for you to do is to mysqldump the entire production database, then sacctmgr archive load all the archive files you currently have. Then you can continue to sacctmgr archive load new archive files as they are created, whenever the production database has new data that is archived and purged.


Long explanation:

You can export the associations with sacctmgr dump and load those into the new database with sacctmgr load. I ran into one prerequisite and one problem for sacctmgr load to work:

(1) The user calling it has to be an admin, SlurmUser, or root in the database. Just do this as root.
(2) However, sacctmgr load still doesn't work if you added any QOS to the table besides the "normal" QOS (which is the default QOS in Slurm). So you will have to dump the QOS table from your production database and load it into the new database.


Example:

* Dump qos_table from my old database (slurm_2002) to a file
* Start slurmdbd, then load this into the new database as user root
* Then call sacctmgr load as user root
 
$ mysqldump -u marshall slurm_2002  qos_table > slurm_2002_qos_table.sql
# mysql -u marshall slurm_master_bug8910 < slurm_2002_qos_table.sql
# sacctmgr load discovery1.cfg
For cluster discovery1
< Then a bunch of output as it processes the file >
sacctmgr: Done adding cluster in usec=387935
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y


Another issue - if you have more TRES than the defaults, you'll also need to backup/restore the TRES manually along with the QOS. (such as gres/gpu - check with sacctmgr show tres)

List of default TRES:
$ sacctmgr show tres
    Type            Name     ID
-------- --------------- ------
     cpu                      1
     mem                      2
  energy                      3
    node                      4
 billing                      5
      fs            disk      6
    vmem                      7
   pages                      8

Another option is to manually backup and restore the tres_table in the database just like the qos_table.
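As a sketch, that table copy could look like the following. The database names (slurm_prod, slurm_historic) and the MySQL user are assumptions; substitute your own:

```shell
# Copy the QOS and TRES tables from the production database into the
# new database, in the same way as the qos_table example above.
mysqldump -u slurm slurm_prod qos_table tres_table > prod_qos_tres.sql
mysql -u slurm slurm_historic < prod_qos_tres.sql
```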


Then you can do sacctmgr archive load <archive file> for all your archive files. You will need to wait for slurmdbd to rollup database usage before sreport is useful. Rollup happens every hour, or when slurmdbd starts up if it hasn't rolled up for the last hour yet. If it isn't happening and you get an error like "Cluster discovery1 not registered, not doing rollup", you may need to start a slurmctld instance that connects to slurmdbd on that node to make the cluster register.
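Loading many archive files can be scripted; here is a sketch that assumes the archive files live in /var/spool/slurm/archive (the path is an assumption):

```shell
# Load every archive file into the historic slurmdbd, oldest first.
# Archive filenames embed the covered time period, so lexicographic
# order is chronological. The -i flag commits without prompting.
for f in /var/spool/slurm/archive/*; do
    sacctmgr -i archive load "$f"
done
```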


You can continue to purge/archive for 3 months in your production slurmdbd/database. When each new archive file is created, you should be able to simply load the archive file into the separate database. If you want to update this more often than once per month, then you could set the "Purge*After" values in slurmdbd.conf to "90days" instead of "3months" to make purging/archiving happen once per day instead of once per month.
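As a sketch, a slurmdbd.conf fragment for that daily cadence might look like this (the directory and retention values are illustrative, not a recommendation):

```
ArchiveDir=/var/spool/slurm/archive
ArchiveJobs=yes
ArchiveSteps=yes
PurgeJobAfter=90days
PurgeStepAfter=90days
```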


Note:
Sometimes jobs aren't archived in a period when they normally would have been. If this happens, when the job eventually gets archived you'll get a new archive file covering that same time period, but with a ".1", ".2", etc. suffix to distinguish them. You can load that file with sacctmgr archive load as usual. This is documented under "ArchiveDir" in the slurmdbd.conf man page.
(https://slurm.schedmd.com/slurmdbd.conf.html#OPT_ArchiveDir)


>   Will it be fine if we insert jobs belonging to users that have been
> deleted after the archive file creation and that won't be, as a consequence,
> in the blank database populated with the result of the "sacctmgr dump"
> command?

Just to clarify - you're asking what will happen if you archive load jobs from associations that have been deleted from the database? It should be fine. The jobs will still load, and you can still query jobs from those users. However, because those associations don't exist in the new database, usage for them won't be rolled up, so certain usage reports from sreport (such as AccountUtilizationByUser) won't be the same as on your production database, because it doesn't have those associations.

As an example:
I ran some jobs as user "kelsey" on May 9th, but deleted this user before calling sacctmgr dump in the old database.

Old slurmdbd - I delete user "kelsey" and run sacctmgr dump. As you can see, kelsey isn't in the file:

$ sacctmgr del user kelsey
 Deleting users...
  kelsey
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
$ sacctmgr dump discovery1
 No filename given, using ./discovery1.cfg.
sacctmgr: Cluster - 'discovery1':Fairshare=1:QOS='normal'
sacctmgr: Parent - 'root'
sacctmgr: User - 'root':DefaultAccount='root':AdminLevel='Administrator':Fairshare=1
sacctmgr: Account - 'acct1':Description='acct1':Organization='acct1':Fairshare=1
sacctmgr: Parent - 'acct1'
sacctmgr: User - 'marshall':DefaultAccount='acct1':WCKeys='test21_41wckey1':Fairshare=1:MaxJobsAccrue=4:QOS='+qos2,+qos3'


New database: "kelsey" isn't in the database but I can still query jobs.

# sacctmgr show users
      User   Def Acct  Def WCKey     Admin
---------- ---------- ---------- ---------
  marshall      acct1 test21_41+      None
      root       root            Administ+

# sacct -X -a -S1/1/20 --user=kelsey --format=jobid,jobname,start,end,state
       JobID    JobName               Start                 End      State
------------ ---------- ------------------- ------------------- ----------
1622_1               j1 2020-05-09T12:00:56 2020-05-09T12:00:57     FAILED
1622_2               j2 2020-05-09T12:00:57 2020-05-09T12:00:57     FAILED
1622_3               j3 2020-05-09T12:00:57 2020-05-09T12:00:57     FAILED
1622_4               j4 2020-05-09T12:00:57 2020-05-09T12:00:57     FAILED


If you want it to be the same as the production database, then instead of using sacctmgr dump you'll need to manually dump the following tables from the old database and restore them into the new database:

acct_coord_table, acct_table, qos_table, tres_table, user_table


Or, just dump the entire production database and restore it on the other node. When rows are deleted from tables in the Slurm database, instead of actually removing that row from the table Slurm sets the "deleted" column to 1 to indicate that they've been deleted. So Slurm still has records of the old users/accounts.
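A sketch of that full copy, assuming the production database is named slurm_acct_db and the historic database host is reachable as historic-db (both names and the MySQL user are assumptions):

```shell
# Dump the entire production Slurm database and restore it on the
# historic database host; archive files can then be loaded on top.
mysqldump -u slurm slurm_acct_db > slurm_acct_db.sql
mysql -u slurm -h historic-db slurm_acct_db < slurm_acct_db.sql
```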


>   We also wonder how we could create archives more frequently without
> purging the production DB. It seems to be possible given the sentence
> """
> Archive dump
> Dump accounting data to file. Depending on options and slurmdbd configuration
> data may remain in database or be purged.
> """
>
> Should we use a script for that, using the SLURM_ARCHIVE_LAST_JOB to give
> the date of the most recent job to dump in the archive? Could you provide
> a short example/template for that?

You can't archive without purging. To create archives more frequently I recommend using "days" or "hours" instead of "months" in the Purge*After options, though you still won't get any data in the archive that isn't purged. For example, if you purge/archive 3 months of data, you won't see any jobs in the archive that are less than 3 months old.

I don't know if you could ever archive without purging. Either the sacctmgr man page that you quoted is wrong, or it's a bug.

In slurmdbd.conf I don't have any of the Purge* settings. Without that, I get these errors on the command line and in slurmdbd.log.

$ sacctmgr archive dump directory=/home/marshall/archive/20.02 jobs steps
This may result in loss of accounting database records (if Purge* options enabled).
Are you sure you want to continue? (You have 30 seconds to decide)
(N/y): y
sacctmgr: error: slurmdbd: Error with request.
 Problem dumping archive: Unspecified error


[2020-05-14T11:13:05.358] debug:  REQUEST_PERSIST_INIT: CLUSTER:discovery1 VERSION:8960 UID:1000 IP:127.0.0.1 CONN:12
[2020-05-14T11:13:06.464] error: No known unit given for purge, we are guessing mistake and returning error
[2020-05-14T11:13:06.464] error: Parsing purge discovery1_step_table
[2020-05-14T11:13:06.464] error: Processing last message from connection 12(127.0.0.1) uid(1000)

With the Purge* settings, I can purge and archive normally.

I need to investigate this more.


Does this answer your questions enough for you to do what you want?
Comment 17 Marshall Garey 2020-06-02 09:57:37 MDT
We've updated the documentation to clarify that you can't archive without purging in commit c46e780e07ef8a29. We've also created an internal enhancement ticket to add QOS and TRES to sacctmgr dump/load.

I'm closing this as resolved/fixed per the documentation fix. Let us know if you have more questions.
Comment 18 Sophie Créno 2020-06-02 10:42:04 MDT
Hello,

  Sorry for the delay in my answer. Thanks a lot for the time you took
to investigate and write that comprehensive answer regarding the procedure
to setup a separate database.

  So, if I understand correctly, we could:

* either keep our slurmdbd settings regarding the archive/purge and, in
this case, keep the last 90 days in the production DB whereas all the rest
would be in the "historic" DB. So we would have to merge 2 sreport outputs
if we wanted to have accounting information over 6 months, for example.

* or decrease the values of Purge* settings so that there would be just
a few days (let's say a week or maybe even less?) of records in the production
DB and all the rest in the "historic" one. Then we would execute sreport 
essentially on the latter one.

 Of course, as you said, in both cases we would create archive files every day
and load them straight afterwards into the "historic" DB. Do you think that
keeping only 24 hours in the production DB would be sensible?

Thanks a lot,
Comment 19 Marshall Garey 2020-06-02 11:12:54 MDT
Re-opening the ticket per your response.

Yes, those two options would both work.

> Do you think that 
> keeping only 24 hours in the production DB would be sensible?

I think it depends on how your users and admins use the production database and who can access the historic database. If users don't need more than 1 day of history, then that's fine.

From Slurm's point of view, it doesn't matter. The scheduler/resource manager part of Slurm will continue to work the same, even without a database. (slurmdbd is an optional, though strongly recommended, part of Slurm.)
Comment 20 Sophie Créno 2020-06-02 12:14:58 MDT
>> Do you think that keeping only 24 hours in the production DB
>> would be sensible?
>
> I think it depends on how your users and admins use the production
> database and who can access the historic database. If users don't
> need more than 1 day of history, then that's fine.

  Some teams want to keep track of how much each of their members/projects 
consume so we would have to let them access the historic db. They also
sometimes need figures to apply for a grant for example and even a week
would definitely not be enough in that case.


> From Slurm's point of view, it doesn't matter. The scheduler/resource
> manager part of Slurm will continue to work the same, even without
> a database. (slurmdbd is an optional, though strongly recommended,
> part of Slurm.)

  Ah yes, that's right, but it would be so weird to me not to be able to access
accounting information that I forget that slurmdbd is indeed optional.

Thanks again for your help.
Comment 21 Marshall Garey 2020-06-02 12:31:38 MDT
You're welcome. Re-closing as resolved/fixed