| Summary: | Unable to run sacct on new system, permission denied | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Jeff White <jeff.white> |
| Component: | Accounting | Assignee: | Tim Wickberg <tim> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | ||
| Version: | 15.08.7 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Washington State University | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
Jeff White
2016-06-27 11:42:45 MDT
This looks an awful lot like a munge key mismatch. Can you check that munge has been restarted on the new host after installing the cluster key? I think you'll see the same message if the clocks on the systems are out of sync by more than a minute. There should be some log messages in slurmdbd / slurmctld that would narrow down the issue; can you test and provide those from the same time as your failed commands?

You had it with the clock sync comment. It looks like we lost access to our NTP server, and this new host was the only one far enough off to cause a problem. From what I can see, Slurm and munge don't log this as an error.
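Because the failure was not logged as an error, a quick skew check across hosts can help spot the problem. The sketch below is a minimal illustration only. It assumes you have already collected each host's current time, for example by running `date +%s` on each node at roughly the same moment. The 60-second threshold simply mirrors the "more than a minute" figure above and is an assumption, not a documented MUNGE or Slurm constant. The function names and hostnames are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List

# Assumed threshold, taken from the rough "more than a minute" figure in the
# ticket; it is not a documented MUNGE or Slurm constant.
DEFAULT_MAX_SKEW_SECONDS = 60.0


def clock_skew_seconds(reference: datetime, remote: datetime) -> float:
    """Return remote minus reference in seconds (positive means remote is ahead).

    Both datetimes should be timezone-aware so the subtraction is unambiguous.
    """
    return (remote - reference).total_seconds()


def find_skewed_hosts(
    reference: datetime,
    host_times: Dict[str, datetime],
    max_skew: float = DEFAULT_MAX_SKEW_SECONDS,
) -> List[str]:
    """Return the sorted hostnames whose clocks differ from the reference by more than max_skew."""
    return sorted(
        host
        for host, t in host_times.items()
        if abs(clock_skew_seconds(reference, t)) > max_skew
    )


if __name__ == "__main__":
    # Hypothetical sample times; in practice these would come from each host.
    ref = datetime(2016, 6, 27, 17, 42, 0, tzinfo=timezone.utc)
    hosts = {
        "node01": datetime(2016, 6, 27, 17, 42, 5, tzinfo=timezone.utc),   # 5 s ahead
        "newhost": datetime(2016, 6, 27, 17, 44, 30, tzinfo=timezone.utc),  # 150 s ahead
    }
    print(find_skewed_hosts(ref, hosts))  # ['newhost']
```

Comparing absolute skew against a single reference keeps the check simple. A host that is behind the reference is flagged the same way as one that is ahead.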