Ticket 5380 - pam_slurm_adopt, two jobs on a single node, cgroup path
Summary: pam_slurm_adopt, two jobs on a single node, cgroup path
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 17.11.5
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Alejandro Sanchez
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2018-07-03 14:38 MDT by Ben Matthews
Modified: 2018-07-09 15:22 MDT
2 users

See Also:
Site: UCAR


Description Ben Matthews 2018-07-03 14:38:26 MDT
By default, pam_slurm_adopt is supposed to look at the memory cgroup's timestamps to determine which job to adopt an out-of-band process into. At some point this stopped working on our system, and it seems to be because it is looking in the wrong place in the cgroup hierarchy. If I have two jobs on a node and then try to ssh into that node, I see this in the log:

2018-07-03T14:22:19-06:00 caldera14 pam_slurm_adopt[18411]: From 128.117.8.161 port 59672 as matthews: unable to determine source job
2018-07-03T14:22:19-06:00 caldera14 pam_slurm_adopt[18411]: Couldn't stat path '/sys/fs/cgroup/memory/slurm/uid_24712/job_837188'
2018-07-03T14:22:19-06:00 caldera14 pam_slurm_adopt[18411]: Couldn't stat path '/sys/fs/cgroup/memory/slurm/uid_24712/job_837185'

But, the correct path would seem to be:

/sys/fs/cgroup/memory/slurm_caldera14/uid_24712/job_837185/

It looks like there is some code for this in the latest pam_slurm_adopt, so I'm not completely sure why it's not working. Is there some extra config we need in 17.11.5?
Comment 1 Alejandro Sanchez 2018-07-04 03:45:15 MDT
Hi Ben,

looking at the code, I see this:

        if (opts.node_name)
                cgroup_suffix = xstrdup_printf("_%s", opts.node_name);

        if (snprintf(uidcg, PATH_MAX, "%s/memory/slurm%s/uid_%u",
                     slurm_cgroup_conf->cgroup_mountpoint, cgroup_suffix, uid)
            >= PATH_MAX)

in the _indeterminate_multiple() function. This means that if the nodename option is specified for the PAM module, the "%s" in ".../slurm%s/..." is replaced with "_<nodename>" when generating the final path. After that, a call to _cgroup_creation_time() finishes constructing the path string with the "%s/job_%u" information. Could you try adding the "nodename" option to the module? Something like this:

account    sufficient    pam_slurm_adopt.so nodename=caldera14

and see if it works?

I'm not sure why the nodename option isn't mentioned in the guide

https://slurm.schedmd.com/pam_slurm_adopt.html

Will discuss that internally and come back to you. Thanks.
Comment 4 Ben Matthews 2018-07-05 11:10:03 MDT
Yup, that seems to work. I saw that code as well, but assumed that it should be able to figure out the nodename (unless people were doing something odd with NodeAddr) and that I was missing something. 

It would be nice not to have to customize the PAM config for every node.
Comment 7 Alejandro Sanchez 2018-07-06 05:19:23 MDT
Did you by any chance happen to compile with --enable-multiple-slurmd option?

After a few local tests and inspecting the code, I _suspect_ the slurmd daemon generates the cgroup path in the form .../slurm_<NodeName>/... only when the above config option is enabled. Otherwise, it is generated in the form .../slurm/..., which doesn't require the PAM module to be explicitly configured with the nodename=<NodeName> option, which is what you want.

This reasoning is backed up by this code:

https://github.com/SchedMD/slurm/blob/51606c09536df28373dbabe530fed1c7d4d46426/src/slurmd/common/slurmd_cgroup.c#L322

Something might still be escaping this theory, though. In the last config copy we have from your site, your node definitions have NodeName != NodeAddr, but that alone doesn't seem to be enough for slurmd to create the path in the .../slurm_<NodeName>/... format. So I am inclined to suspect you compiled with the multiple slurmd option. Would you mind checking?

You can check whether you compiled with this option by inspecting the config.log in your build directory, like this:

alex@ibiza:~/slurm/17.11/ibiza/slurm$ grep MULTIPLE_SLURMD config.log
| #define MULTIPLE_SLURMD 1
#define MULTIPLE_SLURMD 1
alex@ibiza:~/slurm/17.11/ibiza/slurm$

If it is not enabled, grep doesn't report any lines. Thanks.
Comment 8 Ben Matthews 2018-07-06 11:35:57 MDT
(In reply to Alejandro Sanchez from comment #7)
> Did you by any chance happen to compile with --enable-multiple-slurmd option?

Looks like yes. I have a coworker who has decided that all the features need to be turned on. I'll go have another talk with him. Thanks for the catch, and sorry about the ticket.

It would be nice if pam_slurm_adopt had a heuristic to handle this, but it's probably enough of an edge case that it might not be worth the effort.

NodeName != NodeAddr for other reasons; we're not actually using multiple slurmd for anything.
Comment 9 Alejandro Sanchez 2018-07-09 03:36:01 MDT
I think it is not worth the effort either. Thanks for your feedback. We will update the documentation to clarify the cgroup path formats depending on that option.