Ticket 1803 - uid=4294967294 when scontrol update
Summary: uid=4294967294 when scontrol update
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 14.11.8
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Moe Jette
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2015-07-13 15:10 MDT by Akmal Madzlan
Modified: 2015-07-20 21:14 MDT

See Also:
Site: DownUnder GeoSolutions


Description Akmal Madzlan 2015-07-13 15:10:17 MDT
[2015-07-13T13:32:17.017] _slurm_rpc_update_job complete JobId=4275962 uid=1260 usec=754
[2015-07-13T13:32:17.062] _part_access_check: uid 4294967294 access to partition teamoxford denied, bad group
[2015-07-13T13:32:17.063] _part_access_check: uid 4294967294 access to partition idle denied, bad group
[2015-07-13T13:32:17.063] _part_access_check: uid 4294967294 access to partition desktopBigMem denied, bad group
[2015-07-13T13:32:17.063] update_job: setting partition to lud54 for job_id 4275963


Any idea how Slurm got that UID?
One of our users tried to update his jobs using his own script/bash function.

qu() {
### Queue Update: reads squeue-style output on stdin (header filtered
### out below) and runs scontrol update on each job.

if [[ -n "$1" ]]; then
    grep -v JOBID | awk -v pp="$1" '{ cmd = "scontrol update job=" substr($9,1,7) " priority=" pp " partition=teamoxford,idle,desktopBigMem,lud54"; print cmd; system(cmd) }'
else
    grep -v JOBID | awk '{ cmd = "scontrol update job=" substr($9,1,7) " priority=500 partition=teamoxford,idle,desktopBigMem,lud54"; print cmd; system(cmd) }'
fi

}

And some of those scontrol invocations spat out the access-denied errors above. I'm unable to reproduce his issue; maybe it went away after I restarted slurmctld.
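For what it's worth, a more robust variant of the helper above can be sketched as follows. This is hypothetical (the name qu2 and the assumption that the job ID is the first column of the piped squeue output are mine, not the reporter's); the partition list and default priority of 500 come from the original function. Reading whole job IDs avoids the substr() truncation to seven characters:

```shell
qu2() {
    # Hypothetical rework of qu(): read squeue output on stdin,
    # skip the header line, and update each job in turn.
    # Usage: squeue -u "$USER" | qu2 [priority]
    local prio="${1:-500}"
    awk 'NR > 1 { print $1 }' | while read -r jobid; do
        scontrol update job="$jobid" priority="$prio" \
            partition=teamoxford,idle,desktopBigMem,lud54
    done
}
```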
Comment 1 Moe Jette 2015-07-14 03:08:53 MDT
The UID value of 4294967294 (-2 when read as a signed 32-bit integer) reported by Slurm comes from the request's credential, which is generated by Munge. Munge initializes the UID and GID in its credentials to -1; those fields are replaced with the real values once the credential is decoded, and Munge should log an error otherwise.

If you look at the Munge logs on both the client and the server, you should find some indication of why there was a failure. There may also be a Munge error in slurmctld's log file before the lines you included in the first message. My best guess is that the user does not have an account on the node where slurmctld runs. They don't need login access, but the account should exist.

The default Munge log file location is "/var/log/munge/munged.log"
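(The "impossible" UID is just a negative sentinel seen through an unsigned 32-bit lens; the wraparound is easy to confirm with shell arithmetic:

```shell
# A negative value stored in an unsigned 32-bit uid field wraps
# around modulo 2^32:
echo $(( 2**32 - 1 ))   # -1 reads back as 4294967295
echo $(( 2**32 - 2 ))   # -2 reads back as 4294967294
```

which is why a -2 sentinel surfaces in the logs as 4294967294.)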
Comment 2 Moe Jette 2015-07-20 10:26:30 MDT
Were you able to determine the source of the bad Munge credential?
Comment 3 Akmal Madzlan 2015-07-20 14:13:32 MDT
I'm unable to trace the cause, and it seems this has not happened again.

Thanks Moe,
Akmal
Comment 4 David Bigagli 2015-07-20 20:46:04 MDT
What was the real id of that user?

David
Comment 5 Akmal Madzlan 2015-07-20 21:14:11 MDT
His real id is 1260

Akmal