Summary: | RawUsage numbers suddenly impossibly high after upgrade | ||
---|---|---|---|
Product: | Slurm | Reporter: | Kaylea Nelson <kaylea.nelson> |
Component: | Accounting | Assignee: | Albert Gil <albert.gil> |
Status: | RESOLVED DUPLICATE | QA Contact: | |
Severity: | 3 - Medium Impact | ||
Priority: | --- | CC: | adam.munro |
Version: | 20.02.6 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | Yale | Slinky Site: | --- |
Alineos Sites: | --- | Atos/Eviden Sites: | --- |
Confidential Site: | --- | Coreweave sites: | --- |
Cray Sites: | --- | DS9 clusters: | --- |
Google sites: | --- | HPCnow Sites: | --- |
HPE Sites: | --- | IBM Sites: | --- |
NOAA SIte: | --- | NoveTech Sites: | --- |
Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
Recursion Pharma Sites: | --- | SFW Sites: | --- |
SNIC sites: | --- | Tzag Elita Sites: | --- |
Linux Distro: | RHEL | Machine Name: | Grace |
CLE Version: | | Version Fixed: | |
Target Release: | --- | DevPrio: | --- |
Emory-Cloud Sites: | --- | ||
Attachments: |
sshare output
current conf |
Description
Kaylea Nelson
2021-02-10 12:42:32 MST
Created attachment 17869
current conf
We also found that, prior to a slurmctld and slurmdbd restart on 2/8, there were many errors similar to these:

error: We have more time than is possible (634982400+582615985152+0)(583250967552) > 583027891200 for cluster grace(161952192) from 2021-02-02T22:00:00 - 2021-02-02T23:00:00 tres 2

error: We have more time than is possible (115200+62791336+0)(62906536) > 62802000 for cluster grace(17445) from 2021-02-03T13:00:00 - 2021-02-03T14:00:00 tres 5

The cluster was undergoing maintenance from 2/2 to 2/4, so there were no users on the system, but Yale staff may have been running test jobs for some of that time.

Hi Kaylea,

Yes, I'm already tracking your case in bug 10824. Although you are on a different version than Harvard and Princeton, the root error seems to be the same. The fact that you also have those "more time than is possible" errors could be another clue, because Harvard also had them in bug 10753.

If this is OK with you, I'm closing this bug as a duplicate of bug 10824 to concentrate our investigation there. If we find that the problem is not shared between versions, I'll reopen this one.

Regards,
Albert

Marking as duplicate of bug 10824.

*** This ticket has been marked as a duplicate of ticket 10824 ***
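The two "more time than is possible" messages quoted in the report are internally consistent, and that can be checked mechanically: the three bracketed terms add up to the parenthesized total, and the limit on the right equals the count shown after the cluster name multiplied by the 3600 seconds of the hourly window. The sketch below re-derives those figures from a log line. It is a minimal illustration under stated assumptions: `check_rollup_error` and `ROLLUP_ERROR` are hypothetical names, the regex is written against the two quoted lines rather than any documented slurmdbd log format, the three summed terms are treated as opaque components because the ticket does not say which accounting bucket each one is, and "limit = count × window length" is inferred from the two samples, not taken from Slurm documentation.

```python
import re
from datetime import datetime

# Hypothetical pattern, written against the two messages quoted in this
# ticket; it is not a documented slurmdbd log format.
ROLLUP_ERROR = re.compile(
    r"\((\d+)\+(\d+)\+(\d+)\)\((\d+)\) > (\d+) "
    r"for cluster ([^\s(]+)\((\d+)\) "
    r"from (\S+) - (\S+) tres (\d+)"
)


def check_rollup_error(line: str) -> dict:
    """Re-derive the numbers in a 'more time than is possible' message."""
    m = ROLLUP_ERROR.search(line)
    if m is None:
        raise ValueError("not a 'more time than is possible' message")
    # The three summed terms are treated as opaque components; the ticket
    # does not say which accounting bucket each one represents.
    parts = [int(m.group(i)) for i in (1, 2, 3)]
    reported_total = int(m.group(4))
    reported_limit = int(m.group(5))
    count = int(m.group(7))
    start = datetime.fromisoformat(m.group(8))
    end = datetime.fromisoformat(m.group(9))
    window_seconds = int((end - start).total_seconds())
    # Assumption inferred from the samples: limit = count * window length.
    expected_limit = count * window_seconds
    return {
        "cluster": m.group(6),
        "tres": int(m.group(10)),
        "window_seconds": window_seconds,
        "total_is_sum": sum(parts) == reported_total,
        "limit_is_count_times_window": expected_limit == reported_limit,
        "excess": reported_total - reported_limit,
    }


if __name__ == "__main__":
    samples = [
        "error: We have more time than is possible "
        "(634982400+582615985152+0)(583250967552) > 583027891200 "
        "for cluster grace(161952192) from 2021-02-02T22:00:00 - "
        "2021-02-02T23:00:00 tres 2",
        "error: We have more time than is possible "
        "(115200+62791336+0)(62906536) > 62802000 "
        "for cluster grace(17445) from 2021-02-03T13:00:00 - "
        "2021-02-03T14:00:00 tres 5",
    ]
    for sample in samples:
        print(check_rollup_error(sample))
```

For both quoted messages the two consistency checks come out true, so the overshoot (223076352 for the first line, 104536 for the second) comes from the component values themselves rather than from how the message computes its totals.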