Ticket 5380

Summary: pam_slurm_adopt, two jobs on a single node, cgroup path
Product: Slurm    Reporter: Ben Matthews <matthews>
Component: Other    Assignee: Alejandro Sanchez <alex>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: alex, marshall
Version: 17.11.5   
Hardware: Linux   
OS: Linux   
Site: UCAR Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description Ben Matthews 2018-07-03 14:38:26 MDT
By default, pam_slurm_adopt is supposed to look at the memory cgroup's timestamps to determine which job to adopt an out-of-band process into. At some point this stopped working on our system, and it seems to be because it is looking at the wrong place in the cgroup hierarchy. If I have two jobs on a node, and I then try to ssh into that node, I see this in the log:

2018-07-03T14:22:19-06:00 caldera14 pam_slurm_adopt[18411]: From 128.117.8.161 port 59672 as matthews: unable to determine source job
2018-07-03T14:22:19-06:00 caldera14 pam_slurm_adopt[18411]: Couldn't stat path '/sys/fs/cgroup/memory/slurm/uid_24712/job_837188'
2018-07-03T14:22:19-06:00 caldera14 pam_slurm_adopt[18411]: Couldn't stat path '/sys/fs/cgroup/memory/slurm/uid_24712/job_837185'

But, the correct path would seem to be:

/sys/fs/cgroup/memory/slurm_caldera14/uid_24712/job_837185/

It looks like there is some code for this in the latest pam_slurm_adopt, so I'm not completely sure why it's not working. Is there some extra config we need in 17.11.5?
Comment 1 Alejandro Sanchez 2018-07-04 03:45:15 MDT
Hi Ben,

looking at the code, I see this:

        if (opts.node_name)
                cgroup_suffix = xstrdup_printf("_%s", opts.node_name);

        if (snprintf(uidcg, PATH_MAX, "%s/memory/slurm%s/uid_%u",
                     slurm_cgroup_conf->cgroup_mountpoint, cgroup_suffix, uid)
            >= PATH_MAX)

in the _indeterminate_multiple() function. This means that if the nodename option is specified for the PAM module, the %s in .../slurm%s/... is replaced with _<name> to generate the final path. After that, a call to _cgroup_creation_time() finishes constructing the path string with the %s/job_%u portion. Could you try adding the "nodename" option to the module? Something like this:

account    sufficient    pam_slurm_adopt.so nodename=caldera14

and see if it works?

I'm not sure why the nodename option isn't mentioned in the guide:

https://slurm.schedmd.com/pam_slurm_adopt.html

I'll discuss that internally and get back to you. Thanks.
Comment 4 Ben Matthews 2018-07-05 11:10:03 MDT
Yup, that seems to work. I saw that code as well, but assumed that it should be able to figure out the nodename (unless people were doing something odd with NodeAddr) and that I was missing something. 

It would be nice not to have to customize the PAM config for every node.
Comment 7 Alejandro Sanchez 2018-07-06 05:19:23 MDT
Did you by any chance happen to compile with --enable-multiple-slurmd option?

After a few local tests and inspecting the code, I _suspect_ the slurmd daemon generates the cgroup path in the form .../slurm_<NodeName>/... only when the above config option is enabled. Otherwise, it is generated in the form .../slurm/..., which doesn't require the PAM module to be explicitly configured with the nodename=<NodeName> option, which is what you want.

This reasoning is backed up by this code:

https://github.com/SchedMD/slurm/blob/51606c09536df28373dbabe530fed1c7d4d46426/src/slurmd/common/slurmd_cgroup.c#L322

Something might be escaping this theory, though. In the last config copy we have from your site, your node definitions have NodeName != NodeAddr, but that alone doesn't seem to be enough for slurmd to create the path in the .../slurm_<NodeName>/... format. So I am inclined to suspect you compiled with the multiple-slurmd option. Would you mind checking?

You can check whether you compiled with that option by inspecting config.log in your build directory, like this:

alex@ibiza:~/slurm/17.11/ibiza/slurm$ grep MULTIPLE_SLURMD config.log
| #define MULTIPLE_SLURMD 1
#define MULTIPLE_SLURMD 1
alex@ibiza:~/slurm/17.11/ibiza/slurm$

If it is not enabled, grep doesn't report any lines. Thanks.
Comment 8 Ben Matthews 2018-07-06 11:35:57 MDT
(In reply to Alejandro Sanchez from comment #7)
> Did you by any chance happen to compile with --enable-multiple-slurmd option?

Looks like yes. I have a coworker who has decided that all the features need to be turned on. I'll go have another talk with him. Thanks for the catch, and sorry about the ticket.

It would be nice if pam_slurm_adopt had a heuristic to handle this, but it's probably enough of an edge case that it might not be worth the effort. 

NodeName != NodeAddr for other reasons; we're not actually using multiple slurmd for anything.
Comment 9 Alejandro Sanchez 2018-07-09 03:36:01 MDT
I think it is not worth the effort either. Thanks for your feedback. We will change the documentation to clarify the cgroup path formats depending on that option.