Ticket 11673 - /tmp from job_container/tmpfs not usable after slurmd restart
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmd
Version: 21.08.x
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Tim McMullan
Reported: 2021-05-20 16:46 MDT by Felix Abecassis
Modified: 2021-06-08 08:25 MDT
Site: NVIDIA (PSLA)
Version Fixed: 21.08pre1


Description Felix Abecassis 2021-05-20 16:46:11 MDT
Tested on the current master branch, commit a10619bf1d482e189fc3f0dceed5ef459b410667
Running Slurm in a single node test config, on Ubuntu 20.04 with kernel 5.4.0-73-generic.

$ cat /etc/slurm/job_container.conf
BasePath=/var/run/slurm
AutoBasePath=true


When starting a new job, it has access to a per-job filesystem mounted in /tmp:
$ srun --pty bash
$ findmnt /tmp
TARGET SOURCE             FSTYPE OPTIONS
/tmp   tmpfs[/slurm/8/.8] tmpfs  rw,nosuid,nodev,noexec,relatime,size=32594660k,mode=755

And from this job you can write to /tmp:
$ echo $(hostname) > /tmp/test ; cat /tmp/test
ioctl

From another terminal running in the root mount namespace, you can indeed see that there is a mount for this filesystem:
$ findmnt -R /run/slurm  
TARGET             SOURCE                 FSTYPE OPTIONS
/run/slurm         tmpfs[/slurm]          tmpfs  rw,nosuid,nodev,noexec,relatime,size=32594660k,mode=755
└─/run/slurm/8/.ns nsfs[mnt:[4026532561]] nsfs   rw


Now, if you stop slurmd normally while the job is still running, /run/slurm will be unmounted:
$ findmnt -R /run/slurm ; echo $?
1

And then, restarting slurmd will create new mounts, but with a different mount namespace (4026532561 vs 4026532562):
$ findmnt -R /run/slurm
TARGET             SOURCE                 FSTYPE OPTIONS
/run/slurm         tmpfs[/slurm]          tmpfs  rw,nosuid,nodev,noexec,relatime,size=32594660k,mode=755
└─/run/slurm/8/.ns nsfs[mnt:[4026532562]] nsfs   rw


As a result, /tmp is not usable from the existing job anymore:
$ ls /tmp
ls: cannot open directory '/tmp': Permission denied
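
The namespace IDs in the two findmnt listings above can also be compared programmatically. This is a minimal sketch, assuming the SOURCE column has the `nsfs[mnt:[N]]` form shown in this ticket; the helper names `mount_ns_id` and `same_mount_ns` are hypothetical and not part of Slurm or util-linux.

```python
import re
from typing import Optional

# Matches the nsfs SOURCE column as printed by findmnt in this ticket,
# e.g. "nsfs[mnt:[4026532561]]". The exact format is an assumption.
_NSFS_RE = re.compile(r"nsfs\[mnt:\[(\d+)\]\]")


def mount_ns_id(findmnt_line: str) -> Optional[int]:
    """Return the mount namespace inode number found in a findmnt line, or None."""
    match = _NSFS_RE.search(findmnt_line)
    return int(match.group(1)) if match else None


def same_mount_ns(before: str, after: str) -> bool:
    """True only if both lines carry a namespace ID and the IDs are equal."""
    id_before = mount_ns_id(before)
    id_after = mount_ns_id(after)
    return id_before is not None and id_before == id_after


# Lines copied from the findmnt output before and after the slurmd restart.
before = "└─/run/slurm/8/.ns nsfs[mnt:[4026532561]] nsfs   rw"
after = "└─/run/slurm/8/.ns nsfs[mnt:[4026532562]] nsfs   rw"

print(mount_ns_id(before), mount_ns_id(after))
print(same_mount_ns(before, after))
```

On a live node the same kind of ID should also be readable from the job's side with `os.readlink("/proc/<pid>/ns/mnt")`, which returns a string of the form `mnt:[N]`.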
Comment 1 Felix Abecassis 2021-05-20 17:03:56 MDT
Actually, I don't think it's related to the mount namespace (but it's likely another separate bug).

I think the issue is this line hardcoding UID 0:
https://github.com/SchedMD/slurm/blob/a10619bf1d482e189fc3f0dceed5ef459b410667/src/plugins/job_container/tmpfs/job_container_tmpfs.c#L172
This will cause the permissions of the job's /tmp folder to change underneath it, here: 
https://github.com/SchedMD/slurm/blob/a10619bf1d482e189fc3f0dceed5ef459b410667/src/plugins/job_container/tmpfs/job_container_tmpfs.c#L550


From the job, before the slurmd restart:
$ ls -ld /tmp/
drwx------ 2 fabecassis root 60 May 20 16:01 /tmp/


After the slurmd restart:
$ ls -ld /tmp/
drwx------ 2 root root 60 May 20 16:01 /tmp/
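
To illustrate why the ownership change alone is enough to produce the "Permission denied" error, here is a simplified sketch of the classic Unix owner/group/other permission check applied to the two `ls -ld` results above. It is not Slurm code. It ignores ACLs, capabilities, and security modules, and the numeric UID/GID 1000 for the job user is an assumption (the ticket only shows the user name).

```python
import stat


def can_list_dir(owner_uid, owner_gid, mode, uid, gids):
    """Simplified check: may `uid` read and search a directory?

    Only one permission class applies: owner, then group, then other.
    Ignores ACLs, capabilities, and LSMs; uid 0 always passes.
    """
    if uid == 0:
        return True
    if uid == owner_uid:
        need = stat.S_IRUSR | stat.S_IXUSR
    elif owner_gid in gids:
        need = stat.S_IRGRP | stat.S_IXGRP
    else:
        need = stat.S_IROTH | stat.S_IXOTH
    return (mode & need) == need


JOB_UID = 1000      # hypothetical UID of the job user
JOB_GIDS = {1000}   # hypothetical groups; the user is not in group root (0)
MODE = 0o700        # drwx------ as shown by ls -ld

# Before the restart: /tmp is owned by the job user, group root.
print(can_list_dir(JOB_UID, 0, MODE, JOB_UID, JOB_GIDS))
# After the restart: /tmp has been chowned to root:root.
print(can_list_dir(0, 0, MODE, JOB_UID, JOB_GIDS))
```

With mode 0700 the only class that has any access is the owner, so once the owner becomes root the job user falls into the "other" class and loses access, regardless of which mount namespace it is in.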
Comment 3 Jake Rundall 2021-05-21 11:25:13 MDT
This seems related to this other bug I reported, where the job's /tmp remains owned by root in some cases (basically until srun is run):
https://bugs.schedmd.com/show_bug.cgi?id=11609
Comment 4 Tim McMullan 2021-05-21 11:34:15 MDT
(In reply to Felix Abecassis from comment #1)
> Actually, I don't think it's related to the mount namespace (but it's likely
> another separate bug).

You are correct: /tmp ending up owned by root after the restart and the issue with /dev/shm appear to be different problems. It's related to...

(In reply to Jake Rundall from comment #3)
> This seems related to this other bug I reported, where the job's /tmp
> remains owned by root in some cases (basically until srun is run):
> https://bugs.schedmd.com/show_bug.cgi?id=11609

the eventual fix for 11609.  I'll give more details on that particular bug there in a moment.

Short version is that I've reproduced this issue on master and written a patch for it; it's just awaiting review now.

Thanks!
--Tim
Comment 7 Tim McMullan 2021-06-08 08:25:29 MDT
This issue has been resolved on master (https://github.com/SchedMD/slurm/commit/77eb6cbd2397c3bcb7b3007080942db291c6d467).

Thanks for catching this!
--Tim