Ticket 16830

Summary: sacct sometimes reports the wrong uid
Product: Slurm Reporter: Joseph Guzman <joseph.f.guzman>
Component: Accounting Assignee: Jacob Jenson <jacob>
Status: OPEN --- QA Contact:
Severity: 6 - No support contract    
Priority: ---    
Version: 23.02.2   
Hardware: Linux   
OS: Linux   
Site: -Other- Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- NoveTech Sites: ---
Nvidia HWinf-CS Sites: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Tzag Elita Sites: ---
Linux Distro: RHEL Machine Name:
CLE Version: Version Fixed:
Target Release: --- DevPrio: ---
Emory-Cloud Sites: ---

Description Joseph Guzman 2023-05-25 18:15:37 MDT
Since updating from Slurm v22.05.7 to v23.02.2, we have noticed that sacct reports jobs from two users when a single user is specified with the -u flag. When jobs from the unexpected second user appear in the output, sacct also reports the same uid for both users' jobs. For the second user, that uid differs from the one reported by standard system utilities on all the hosts.

For example:

$ sacct -u user1 -X -P -o jobid,timelimit,user,uid -S 2023-05-23 --noheader
1|01:00:00|user2|16074
4|01:00:00|user2|16074
7|01:00:00|user2|16074
9|01:00:00|user2|16074
79|01:00:00|user2|16074
86|01:00:00|user2|16074
89|01:00:00|user2|16074
90|01:00:00|user2|16074
180|01:00:00|user2|16074
181|01:00:00|user2|16074
185|01:00:00|user2|16074
186|01:00:00|user2|16074
194|01:00:00|user2|16074
47879|04:00:00|user1|16074
47912|02:00:00|user1|16074
47917|02:00:00|user1|16074
47918|04:00:00|user1|16074
47938|02:00:00|user1|16074
47941|02:00:00|user1|16074
47942|02:00:00|user1|16074
$

After looping through the sacct output, we found 248 instances in which sacct reported the wrong uid. We can confirm that the username was correct for those job IDs, judging by the SubmitLine values that we checked. Whenever sacct showed a user-to-uid mismatch, it was the same wrong uid each time: a valid uid, but one belonging to another user. The mismatches affect a small minority of the jobs submitted by several other users.
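For reference, a check along these lines could look like the sketch below. It is not the exact script we ran. It assumes sacct is invoked with -P and -o jobid,user,uid, and that the local passwd/NSS lookup is the source of truth for uids. The function names (system_uid, find_uid_mismatches, sacct_rows) are made up for illustration.

```python
import pwd
import subprocess
from typing import Callable, Iterable, Iterator, Optional, Tuple


def system_uid(user: str) -> Optional[int]:
    """Return the uid the local passwd/NSS database gives for a username."""
    try:
        return pwd.getpwnam(user).pw_uid
    except KeyError:
        return None  # user is not known on this host


def find_uid_mismatches(
    lines: Iterable[str],
    lookup: Callable[[str], Optional[int]] = system_uid,
) -> Iterator[Tuple[str, str, int, int]]:
    """Yield (jobid, user, sacct_uid, system_uid) for rows where the uids differ.

    Each line is assumed to be in `sacct -P -o jobid,user,uid` form: jobid|user|uid
    """
    for line in lines:
        fields = line.strip().split("|")
        if len(fields) != 3 or not fields[2].isdigit():
            continue  # skip blank or malformed rows
        jobid, user, uid = fields
        expected = lookup(user)
        # Rows for users unknown to the system are skipped, not flagged.
        if expected is not None and expected != int(uid):
            yield jobid, user, int(uid), expected


def sacct_rows(start: str) -> Iterable[str]:
    """Run sacct for all users since `start` and return its output lines."""
    cmd = ["sacct", "-a", "-X", "-P", "--noheader",
           "-o", "jobid,user,uid", "-S", start]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

On a host with sacct available, list(find_uid_mismatches(sacct_rows("2023-05-23"))) would list the affected jobs. The lookup argument can be replaced with any callable that maps a username to a uid, for example one backed by a copy of the passwd data from another host.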

Are there any known problems with importing the Slurm database from a v22.05.7 instance into a v23.02.2 one? If not, what else could be causing this? We're using slurmdbd for accounting.

Thanks,

Joseph