Ticket 14311 - sacctmgr: procedure to dump/restore QoS settings.
Summary: sacctmgr: procedure to dump/restore QoS settings.
Status: RESOLVED DUPLICATE of ticket 7450
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 21.08.8
Hardware: Linux Linux
Severity: 5 - Enhancement
Assignee: Nate Rini
Reported: 2022-06-14 06:30 MDT by Brad Viviano
Modified: 2022-07-14 19:58 MDT
Site: EPA

Description Brad Viviano 2022-06-14 06:30:32 MDT
Hello,
    We're currently setting up a new cluster based on RHEL8 here at EPA.  I wanted to dump my base configuration (QoS, users, accounts, etc.) from my existing RHEL7 cluster (both are running Slurm 21.08.8) and restore it to the RHEL8 cluster, but "sacctmgr dump" DOES NOT support dumping QoS.

    Per this bug report, it's apparently been an issue for a while and hasn't been resolved:

https://bugs.schedmd.com/show_bug.cgi?id=9111

   This seems a pretty big hole in the entire dump/restore process.

   We have some pretty complicated QoS setups in our current RHEL7 cluster that we've built over the last 5+ years and rebuilding those with sacctmgr commands to replicate them to the new cluster isn't overly appealing.

   Therefore, I need a procedure from SchedMD to initialize a new cluster with a new database and copy/restore the QoS settings so I can use "sacctmgr load" to then restore the accounts, users, etc.

    Thanks.
Comment 2 Nate Rini 2022-06-14 08:37:30 MDT
Brad

Currently, QOS dumping/loading does not exist in sacctmgr load/dump commands. It does however exist in slurmrestd:
> https://slurm.schedmd.com/rest_api.html#slurmdbdGetQos

Please tell me if you want specific examples of how to dump and load the QOS configs.

--Nate
Comment 3 Brad Viviano 2022-06-14 08:40:18 MDT
> Please tell me if you want specific examples of how to dump and load the QOS configs.

Yes.  I'd like a specific example or a reference to a FAQ of how I would dump the QoS, similar to what I would do with "sacctmgr dump ...." and "sacctmgr load ...."

Thanks.
Comment 4 Nate Rini 2022-06-14 08:49:58 MDT
(In reply to Brad Viviano from comment #3)
> > Please tell me if you want specific examples of how to dump and load the QOS configs.
> Yes.  I'd like a specific example or a reference to a FAQ of how I would
> dump the QoS, similar to what I would do with "sacctmgr dump ...." and
> "sacctmgr load ...."

Does your site have JWT auth setup?
Comment 5 Brad Viviano 2022-06-14 08:51:29 MDT
We're not running slurmrestd currently.

Is there a solution that would work using sacctmgr?
Comment 6 Nate Rini 2022-06-14 09:32:57 MDT
(In reply to Brad Viviano from comment #5)
> We're not running slurmrestd currently.

As long as it is compiled and installed, it's not required to be running as a systemd service.

Start the daemon. I'm just having it listen on a UNIX socket so we can use curl:
> $ slurmrestd unix:$HOME/.slurmrestd.sock

To dump the current config of QOS:
> curl --unix-socket $HOME/.slurmrestd.sock "http://localhost/slurmdb/v0.0.38/qos" > qos.json

To load:
> curl --unix-socket $HOME/.slurmrestd.sock -X POST -H "Content-Type: application/json" "http://localhost/slurmdb/v0.0.38/qos" --data-binary @qos.json

Note that when you load this file, the request may be rejected if the "id" field is set and a QOS with the same id already exists on the target.
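
One way to sidestep that rejection is to drop the "id" field from each record before loading. This is only a minimal sketch: it assumes the dump keeps its QOS records in a top-level array named "QOS" (an assumption, so check your own qos.json and pass a different key as the third argument if needed) and that python3 is installed. The function name strip_qos_ids is hypothetical and not part of Slurm.

```shell
# Hypothetical helper: remove the "id" field from every record in the
# top-level array named by $3 (default "QOS", an assumption about the dump
# layout), reading JSON from $1 and writing the result to $2.
strip_qos_ids() {
    python3 - "$1" "$2" "${3:-QOS}" <<'EOF'
import json
import sys

src, dst, key = sys.argv[1:4]
with open(src) as fh:
    doc = json.load(fh)
for record in doc.get(key, []):
    # Drop the id so the load cannot collide with an existing QOS id.
    record.pop("id", None)
with open(dst, "w") as fh:
    json.dump(doc, fh, indent=2)
EOF
}
```

For example, `strip_qos_ids qos.json qos-noid.json` writes a copy without ids, which can then be passed to the load command as `--data-binary @qos-noid.json`. Whether the target then assigns fresh ids as expected should be verified on a test database first.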

> Is there a solution that would work using sacctmgr?

We can see about turning this ticket into an RFE to have that functionality added if you prefer.
Comment 7 Nate Rini 2022-06-14 10:53:10 MDT
Please note that the URLs in comment 6 should use v0.0.37 instead of v0.0.38.
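
With that correction applied, the dump and load commands from comment 6 could be wrapped so the API version is set in one place. This is a sketch only: the function names and the SLURMRESTD_SOCK and SLURM_API_VERSION variables are illustrative and not part of Slurm, while the socket path, endpoint, and curl options are taken from the commands in comment 6.

```shell
# Hypothetical helper variables; adjust to match the running slurmrestd.
SLURMRESTD_SOCK="${SLURMRESTD_SOCK:-$HOME/.slurmrestd.sock}"
SLURM_API_VERSION="${SLURM_API_VERSION:-v0.0.37}"

# Print the slurmdb QOS endpoint URL for the selected API version.
qos_url() {
    printf 'http://localhost/slurmdb/%s/qos\n' "$SLURM_API_VERSION"
}

# Dump all QOS records as JSON into the file given as $1.
qos_dump() {
    curl --unix-socket "$SLURMRESTD_SOCK" "$(qos_url)" > "$1"
}

# Load QOS records from the JSON file given as $1.
qos_load() {
    curl --unix-socket "$SLURMRESTD_SOCK" -X POST \
        -H "Content-Type: application/json" \
        "$(qos_url)" --data-binary "@$1"
}
```

With slurmrestd listening on the socket, `qos_dump qos.json` on the old cluster and `qos_load qos.json` on the new one mirror the two curl commands above.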
Comment 8 Brad Viviano 2022-06-14 11:33:27 MDT
> We can see about turning this ticket into an RFE to have that functionality added if you prefer.

It seems to me there should be a standard method to move ALL settings, MINUS job history between slurmdbd instances.

There are a few tickets in bugs.schedmd.com going back to 2017 where others have asked for the capability to have "sacctmgr dump/load" include QoS.  Really not sure why it hasn't been added in the last 5 years.

That said, I don't need the capability built in/automatic.  But there should be a clear procedure in the documentation for situations where a user wants to move all settings, EXCEPT job history from Cluster A -> Cluster B.

There certainly are times when I can see this being useful.  Setting up a testing environment being the most common.  Maybe I want to spin up a handful of VMs and test functionality in 22.05.X before upgrading my live production cluster from 21.08.  I need all my database settings (Users, Accounts, Associations, QoS, etc) to be able to test correctly, but I don't want to drag the 5 million+ job entries along with me via a full mysqldump from my production environment.

So, however that gets resolved, RFE as a feature in the code or just better documentation is fine with me :).

Thanks.
Comment 9 Nate Rini 2022-06-14 13:04:42 MDT
(In reply to Brad Viviano from comment #8)
> > We can see about turning this ticket into an RFE to have that functionality added if you prefer.
> 
> It seems to me there should be a standard method to move ALL settings, MINUS
> job history between slurmdbd instances.

Please note that it is possible to dump all of the configuration held in slurmdbd (whether it was set via sacctmgr or slurmrestd) with:
> curl --unix-socket $HOME/.slurmrestd.sock "http://localhost/slurmdb/v0.0.37/config" > config.json
The config can then be applied with:
> curl --unix-socket $HOME/.slurmrestd.sock -X POST -H "Content-Type: application/json" "http://localhost/slurmdb/v0.0.37/config" --data-binary @config.json

> There are a few tickets in bugs.schedmd.com going back to 2017 where others
> have asked for the capability to have "sacctmgr dump/load" include QoS. 
> Really not sure why it hasn't been added in the last 5 years.

We will discuss this internally.
Comment 16 Nate Rini 2022-06-29 14:37:30 MDT
Is adding the QOS dumping functionality to `sacctmgr dump` something your site is interested in sponsoring?
Comment 18 Brad Viviano 2022-06-30 03:59:45 MDT
I'm not sure what "sponsoring" means.
Comment 21 Nate Rini 2022-07-14 13:02:12 MDT
When and if your site would like to sponsor this, please do reply, and we can start the normal RFE process again.