Ticket 6475

Summary: pam_slurm_adopt fails for interactive jobs
Product: Slurm
Reporter: hpc-admin
Component: Other
Assignee: Nate Rini <nate>
Status: RESOLVED CANNOTREPRODUCE
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
CC: nate
Version: 17.11.8
Hardware: Linux
OS: Linux
Site: Ghent

Description hpc-admin 2019-02-11 05:45:23 MST
What we see:

> [22:35:43] vsc40023@gligar04:~ $ squeue
> CLUSTER: skitty
>              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
>             179063    skitty INTERACT vsc40023  R   12:23:13      1 node3168.skitty.os

> [22:35:46] vsc40023@gligar04:~ $ ssh node3168
> Access denied by pam_slurm_adopt: you have no active jobs on this node
> Authentication failed.


The adoption does work as expected for normal jobs. Is there any setting we ought to look at?

Kind regards,
-- Andy
Comment 1 Nate Rini 2019-02-11 10:02:01 MST
Andy

Can you please add "log_level=debug5" to your sshd PAM configuration and then attach the PAM logs from an ssh attempt into your test job that gets rejected? You can remove the higher log level once you have the logs. Please also make sure the logs you attach don't contain any privileged user information; if they do, please replace it with something like "#####".
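
For reference, a minimal sketch of what the requested change might look like. Only the "log_level=debug5" argument comes from the request above; the file path, the "account" stack and the "required" control flag are assumptions about a typical pam_slurm_adopt setup, so the existing pam_slurm_adopt line on the node should be edited in place rather than replaced with this one.

```
# /etc/pam.d/sshd  (assumed path; the surrounding lines vary by distribution)
# Temporary: raise pam_slurm_adopt verbosity while reproducing the rejected ssh.
# Remove log_level=debug5 again once the logs have been collected.
account    required     pam_slurm_adopt.so log_level=debug5
```

The extra messages should then show up wherever sshd's PAM output is logged on the compute node, which depends on the local syslog or journald setup.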

Thanks,
--Nate
Comment 2 Nate Rini 2019-02-11 10:23:51 MST
Andy

Can you also please attach your slurm.conf?

Thanks
--Nate
Comment 3 hpc-admin 2019-02-12 02:14:43 MST
Hi,

I just retried and seemingly things work properly now. I'll keep an eye on this and close this ticket for now, as I am pretty clueless about what may have changed.

-- Andy