| Summary: | sacct does not work after upgrade from 17.2.9 to 17.11.7 | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Nikolas Luke <niko.luke> |
| Component: | slurmdbd | Assignee: | Felip Moll <felip.moll> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | felip.moll |
| Version: | 17.11.7 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Hessen | Slinky Site: | --- |
| Attachments: | slurmdbd.log, slurmConfigure.out from rpmbuild, slurmConfigure.err from rpmbuild | ||
||
Hi Nikolas,

Although losing the Slurm state directory can lead to several issues, it has nothing to do with sacct. A job ID can be reused; this happens, for example, when a job is requeued. The index used internally is job_inx.

I think your problem has more to do with access permissions:

    sacct: error: slurmdbd: Operation not permitted

Can you send the slurmdbd logs? Have you modified the database or the user grants? Is your slurmdbd.conf up to date with any user, password, or database changes?

The slurmdbd is now down after a systemctl restart slurmdbd.

slurmdbd.log:

    [2018-06-04T14:20:37.621] fatal: Unable to initialize accounting_storage/mysql accounting storage plugin
    [2018-06-07T13:26:17.768] fatal: Unable to initialize accounting_storage/mysql accounting storage plugin

(In reply to Nikolas Luke from comment #2)
> The slurmdbd is now down after a systemctl restart slurmdbd.
>
> slurmdbd.log:
> [2018-06-04T14:20:37.621] fatal: Unable to initialize
> accounting_storage/mysql accounting storage plugin
> [2018-06-07T13:26:17.768] fatal: Unable to initialize
> accounting_storage/mysql accounting storage plugin

Is it possible that you compiled without the mysql-devel package? Can you show me your config.log? Please also send me your full slurmdbd log.

Created attachment 7016
slurmdbd.log
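The question above about building without mysql-devel can be checked directly on the slurmdbd host. The following is only a minimal sketch: `has_mysql_plugin` is a hypothetical helper, and the plugin directory passed to it is an assumption that depends on how the packages were built. It looks for `accounting_storage_mysql.so`, the plugin file that is normally absent when Slurm is configured without the MySQL development files.

```shell
# Hypothetical helper (not part of Slurm): report whether a plugin
# directory contains the MySQL accounting storage plugin.
# The directory is an assumption; pass the one your packages install to.
has_mysql_plugin() {
    plugin_dir="$1"
    if [ -f "$plugin_dir/accounting_storage_mysql.so" ]; then
        echo "found: $plugin_dir/accounting_storage_mysql.so"
    else
        echo "missing: $plugin_dir/accounting_storage_mysql.so"
        return 1
    fi
}
```

For example, `has_mysql_plugin /usr/lib64/slurm` would be a plausible call on an RPM-based x86_64 system, but that path is a guess and should be confirmed against the installed package's file list.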
I built the packages with:

    rpmbuild -ta slurm-17.11.7.tar.bz2 >slurmConfigure.out 2>slurmConfigure.err

This results in the files:

    slurm-17.11.7-1.el7.centos.x86_64.rpm
    slurm-contribs-17.11.7-1.el7.centos.x86_64.rpm
    slurm-devel-17.11.7-1.el7.centos.x86_64.rpm
    slurm-example-configs-17.11.7-1.el7.centos.x86_64.rpm
    slurm-libpmi-17.11.7-1.el7.centos.x86_64.rpm
    slurm-openlava-17.11.7-1.el7.centos.x86_64.rpm
    slurm-pam_slurm-17.11.7-1.el7.centos.x86_64.rpm
    slurm-perlapi-17.11.7-1.el7.centos.x86_64.rpm
    slurm-slurmctld-17.11.7-1.el7.centos.x86_64.rpm
    slurm-slurmd-17.11.7-1.el7.centos.x86_64.rpm
    slurm-slurmdbd-17.11.7-1.el7.centos.x86_64.rpm
    slurm-torque-17.11.7-1.el7.centos.x86_64.rpm

I was surprised that there was no slurm-sql, slurm-munge, or slurm-plugins package this time.

Which config.log do you mean? slurmctld.log?

(In reply to Nikolas Luke from comment #5)
> I was surprised that there was no slurm-sql, slurm-munge, or slurm-plugins
> package this time.

That's correct. These have been removed.

> I built the packages with:
>
> rpmbuild -ta slurm-17.11.7.tar.bz2 >slurmConfigure.out 2>slurmConfigure.err
> ...
> Which config.log do you mean? slurmctld.log?

I mean these slurmConfigure.out and slurmConfigure.err files.

Also try running slurmdbd manually as the SlurmUser (defined in slurmdbd.conf), like this:

    slurmdbd -Dvvv

Then send me the output together with the slurmConfigure.* files.

Created attachment 7018
slurmConfigure.out from rpmbuild

Created attachment 7019
slurmConfigure.err from rpmbuild
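To run slurmdbd manually as the SlurmUser, as suggested above, it helps to know which account slurmdbd.conf actually names. The sketch below is one way to read it, assuming the usual `Key=Value` layout with `#` comments; `slurm_user_from_conf` is a hypothetical helper name, not a Slurm tool.

```shell
# Hypothetical helper (not part of Slurm): print the SlurmUser value
# from a slurmdbd.conf-style file, or return non-zero if it is not set.
slurm_user_from_conf() {
    conf="$1"
    # Drop comments, match the key case-insensitively, keep the last match.
    sed -e 's/#.*//' "$conf" | awk -F= '
        tolower($1) ~ /^[ \t]*slurmuser[ \t]*$/ {
            v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim surrounding whitespace
            user = v
        }
        END { if (user != "") print user; else exit 1 }
    '
}
```

With that in place, something like `sudo -u "$(slurm_user_from_conf /etc/slurm/slurmdbd.conf)" slurmdbd -Dvvv` runs the daemon in the foreground under that account. The configuration path is the one that appears in this report, and the helper itself has to be run by a user who can read the file.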
Output of slurmdbd -Dvvv as user "root":

    slurmdbd: debug: Log file re-opened
    slurmdbd: debug: Munge authentication plugin loaded
    slurmdbd: debug2: mysql_connect() called for db slurm_acct_db
    slurmdbd: error: mysql_query failed: 1 Can't create/write to file '/tmp/#sql_367_0.MAI' (Errcode: 2)
    show columns from convert_version_table
    slurmdbd: Accounting storage MYSQL plugin failed
    slurmdbd: error: Couldn't load specified plugin name for accounting_storage/mysql: Plugin init() callback failed
    slurmdbd: error: cannot create accounting_storage context for accounting_storage/mysql
    slurmdbd: fatal: Unable to initialize accounting_storage/mysql accounting storage plugin

Output of slurmdbd -Dvvv as user "slurm" (SlurmUser=slurm):

    slurmdbd: error: s_p_parse_file: unable to read "/etc/slurm/slurmdbd.conf": Permission denied
    slurmdbd: fatal: Could not open/read/parse slurmdbd.conf file /etc/slurm/slurmdbd.conf

(In reply to Nikolas Luke from comment #9)
> Output of slurmdbd -Dvvv as user "root":
>
> slurmdbd: debug: Log file re-opened
> slurmdbd: debug: Munge authentication plugin loaded
> slurmdbd: debug2: mysql_connect() called for db slurm_acct_db
> slurmdbd: error: mysql_query failed: 1 Can't create/write to file
> '/tmp/#sql_367_0.MAI' (Errcode: 2)
> show columns from convert_version_table
> slurmdbd: Accounting storage MYSQL plugin failed
> slurmdbd: error: Couldn't load specified plugin name for
> accounting_storage/mysql: Plugin init() callback failed
> slurmdbd: error: cannot create accounting_storage context for
> accounting_storage/mysql
> slurmdbd: fatal: Unable to initialize accounting_storage/mysql accounting
> storage plugin
>
>
> Output of slurmdbd -Dvvv as user "slurm" (SlurmUser=slurm):
>
> slurmdbd: error: s_p_parse_file: unable to read "/etc/slurm/slurmdbd.conf":
> Permission denied
> slurmdbd: fatal: Could not open/read/parse slurmdbd.conf file
> /etc/slurm/slurmdbd.conf

Well, you can see the problems here.

A) It seems the slurm user cannot read /etc/slurm/slurmdbd.conf.

B) Moreover, it seems that /tmp/ is not writable by MySQL. This error comes from the MySQL server:

> slurmdbd: error: mysql_query failed: 1 Can't create/write to file
> '/tmp/#sql_367_0.MAI' (Errcode: 2)

For example, see: https://stackoverflow.com/questions/11997012/mysql-cant-create-write-to-file-tmp-sql-3c6-0-myi-errcode-2-what-does?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

First fix A); if it still doesn't work, fix B). Check with slurmdbd -Dvvv as the slurm user.

The slurmdbd.conf had lost its permissions. Now the user slurm can read it, and slurmdbd is using the right paths for the log file instead of the default paths. Good idea to use the "slurmdbd -Dvvv" command when there is nothing in the log.

Same with the MySQL temp directory: it had lost its permissions, too.

slurmdbd is running and sacct is working now. Many thanks!

You're welcome :) Closing the issue now. |
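The two checks behind A) and B) can be repeated quickly after any upgrade. This is a minimal sketch under stated assumptions: `check_readable` and `check_writable_dir` are hypothetical helper names, and each only tests access for whichever user runs it, so each has to be executed as the account in question (the slurm user for the configuration file, the MySQL server's user for its temp directory).

```shell
# Hypothetical helpers (not part of Slurm or MySQL). Each tests access
# for the user running the shell, so run them as the relevant account.

# A) Is the configuration file readable?
check_readable() {
    if [ -r "$1" ]; then
        echo "ok: $1 is readable"
    else
        echo "fail: $1 is not readable"
        return 1
    fi
}

# B) Does the temp directory exist and is it writable?
check_writable_dir() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "ok: $1 is a writable directory"
    else
        echo "fail: $1 is missing or not writable"
        return 1
    fi
}
```

If the functions are saved in a file (say `checks.sh`, a made-up name), `sudo -u slurm sh -c '. ./checks.sh; check_readable /etc/slurm/slurmdbd.conf'` covers A). For B), the same pattern with the MySQL service account and `check_writable_dir /tmp` applies; the account name and the temp directory are site-specific, and the directory the server really uses can be confirmed with `SHOW VARIABLES LIKE 'tmpdir';`.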
After the upgrade from Slurm 17.2.9 to 17.11.7, the sacct command does not work. I lost the spool directory data, so the job numbers have been reset to 1, 2, 3. That could be the problem, because data for these job numbers is already in the database. I know the last job number before the update.

Here is the error, which occurs for every user I tried:

    sacct
           JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
    ------------ ---------- ---------- ---------- ---------- ---------- --------
    sacct: error: slurmdbd: Operation not permitted