| Summary: | SlurmDBD Upgrade | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | John Villa <jv2575> |
| Component: | slurmdbd | Assignee: | Tim Wickberg <tim> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 1 - System not usable | ||
| Priority: | --- | ||
| Version: | 17.11.2 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Columbia University | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
We had one customer with a database of 3 million job records whose update took 45 minutes. Here at SchedMD, the update of Danny's identical database took 15 minutes, so how long it takes depends largely on the hardware. You should be able to let the update run; it shouldn't affect running jobs or the job queue at all. Records won't be written to the database during the update, but they will be queued and written once the update is complete. You can't skip updating the database.

I'm unable to CC hpc-admin@columbia.edu: Bugzilla doesn't recognize that email address, so I can't add it. Today is a holiday, but tomorrow I can ask internally whether that email can be added. We didn't respond right away because of the holiday; I just happen to be working today, but everyone else is out and we typically aren't around on holidays.

Can you let us know how long the update took once the process is complete? Did you shut down slurmctld during the update, or is it still running?

As an FYI, when slurmdbd takes a long time to convert or roll up, systemd sometimes kills it. So run slurmdbd in the foreground (using the slurmdbd -D flag) so that it runs separately from systemd. Run it on at least the info log level to see this message:
info("Conversion done: success!");
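A minimal sketch of how one might watch for that message when slurmdbd is run in the foreground and its output is captured. The helper name `conversion_finished` and the sample lines are hypothetical; only the message text itself comes from this thread, and the exact prefix slurmdbd prints before it is an assumption.

```python
from typing import Iterable

# Message text quoted in this thread; everything else here is illustrative.
SUCCESS_MARKER = "Conversion done: success!"


def conversion_finished(log_lines: Iterable[str]) -> bool:
    """Return True if any captured log line contains the success message."""
    return any(SUCCESS_MARKER in line for line in log_lines)


# Sample lines: the first mirrors the log quoted in the report; the prefix on
# the second is an assumption about how an info-level line would be printed.
sample_log = [
    'slurmdbd: debug: Table "habanero_job_table" has changed. Updating...',
    "slurmdbd: Conversion done: success!",
]

print(conversion_finished(sample_log))
```

Matching on the message substring rather than the whole line avoids depending on the log prefix or timestamp format.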
Hello,

Thank you for the update. The update finished. It took a couple of hours, but everything seems fine now with regard to slurmdbd. Please feel free to close this bug.

Sincerely,
John Villa

(In reply to comment #4 <https://bugs.schedmd.com/show_bug.cgi?id=4807#c4> from Marshall Garey <marshall@schedmd.com>, sent Feb 19, 2018 at 4:34 PM)

We're glad everything is working fine. Closing as resolved/infogiven.

For adding people to the CC list, see bug 4748 comment 2 <https://bugs.schedmd.com/show_bug.cgi?id=4748#c2>. Tim said, "He'd need to set up an account within our Bugzilla instance; once that's done he can add himself as a CC, or you can do it as well."

Hello,

We noticed the absence of the "sview" application under "bin" or "sbin" when we compiled Slurm 17.11.2. Is there any reason it would be missing? We have gtk2 and gtk3 installed. Please advise.

Thanks,
John Villa

(In reply to comment #7 <https://bugs.schedmd.com/show_bug.cgi?id=4807#c7> from Marshall Garey <marshall@schedmd.com>, sent Tue, Feb 20, 2018 at 7:00 PM)

Hey John,

Will you open a new bug for your question? That way we can keep things separate.
Also, FYI for the future, if you ever need to reopen a bug, be sure to change the status back to "Unconfirmed" from the website. This will allow the bug to show up on our lists and help prevent responses from being lost.

Thanks,
Brian

Brian,

Can you create a new bug on my behalf?

Thank you,
John Villa

(In reply to comment #9 <https://bugs.schedmd.com/show_bug.cgi?id=4807#c9> from Brian Christiansen <brian@schedmd.com>, sent Feb 21, 2018 at 1:28 PM)

I'd prefer that you do it. That way I'm not tagged as the reporter. |
Hello,

We are running into an issue after upgrading slurmdbd. According to the error logs, a particular table within our accounting database has changed and is being updated:

    slurmdbd: debug: Log file re-opened
    slurmdbd: debug: Munge authentication plugin loaded
    slurmdbd: debug2: mysql_connect() called for db slurm_acct_db
    slurmdbd: pre-converting job table for habanero
    slurmdbd: debug: Table "habanero_job_table" has changed. Updating...

This appears to be taking a rather long time. Here is one such record:

    *************************** 6. row ***************************
    Id: 53
    User: slurm
    Host: roll.cm.cluster:33872
    db: slurm_acct_db
    Command: Query
    Time: 2274
    State: copy to tmp table
    Info: alter table "habanero_job_table" modify `job_db_inx` bigint unsigned not null auto_increment, modify
    Progress: 34.525

We have over 5 million records. Would it be possible for you to provide us with an estimate of when this might finish? Is there a way for us to skip this step? Any advice would be helpful, as we scheduled this downtime and did not foresee this.

Please cc hpc-admin@columbia.edu on all correspondence here.

Thanks,
John Villa
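The `Time` and `Progress` values in the record above allow a rough back-of-the-envelope estimate. The sketch below is only a linear extrapolation under stated assumptions: that `Progress` is the overall percentage complete for the ALTER, that `Time` approximates the seconds spent on it so far, and that the copy proceeds at a constant rate. None of this is guaranteed by the database server, and the function name `estimate_remaining_seconds` is hypothetical.

```python
def estimate_remaining_seconds(elapsed_s: float, progress_pct: float) -> float:
    """Linearly extrapolate the remaining time from elapsed time and percent done.

    Assumes a constant rate of progress, which is an approximation only.
    """
    if not 0 < progress_pct <= 100:
        raise ValueError("progress_pct must be in the range (0, 100]")
    estimated_total_s = elapsed_s * 100.0 / progress_pct
    return estimated_total_s - elapsed_s


# Values taken from the record in the report: Time: 2274, Progress: 34.525
remaining = estimate_remaining_seconds(2274, 34.525)
print(f"roughly {remaining / 60:.0f} minutes remaining (linear estimate)")
```

Treat the result as an order-of-magnitude guide; later stages of a table rebuild may run faster or slower than the copy stage.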