I've implemented the job_container/tmpfs functionality to redirect node /tmp space to a large NFS fileserver, to avoid limitations on local disk space. It appears that this functionality, in conjunction with autofs, causes the first attempt to run a job to fail on a machine where a required NFS directory has not yet been mounted. The symptom is the following in the application's log file:

/var/spool/slurmd/job32283/slurm_script: line 3: /apps/cst/cluster/CST2022/cst_settings: Too many levels of symbolic links
/var/spool/slurmd/job32283/slurm_script: line 4: /apps/cst/cluster/CST2022/cst_common: Too many levels of symbolic links

This came from an exec node where the /apps/cst automount filesystem was not yet mounted when the job started. The tmpfs namespace is created successfully, but it runs into problems when the job accesses a directory that did not exist when the namespace was created. Subsequent runs on the same node work fine, because by the time the second run starts the automounter has finished mounting the filesystem.

The root cause appears to be that the automount daemon is not aware of namespaces: any new automounts go into the parent namespace by default, making them inaccessible to the job's namespace. It looks like it's necessary to walk through the list of automounts on the system and mark each of them as a shared subtree by setting the MS_SHARED flag. The shell command is, for example, mount --make-shared /apps. This allows changes to the parent namespace's automount points to be seen by the child.

My first workaround attempt is limited to a specific app, but I'll see if I can redesign that prolog script to be universally applicable, and post it here if it works.
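As a quick diagnostic (not part of my original testing, and assuming a reasonably recent util-linux whose findmnt supports the PROPAGATION column), the propagation state of the autofs mounts can be checked on the node before flipping one of them to shared:

# show each autofs mount point and whether it is shared or private
findmnt -t autofs -o TARGET,FSTYPE,PROPAGATION

# mark a single automount point as a shared subtree
# (--make-rshared would also cover anything mounted below it)
mount --make-shared /apps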
I tried the following prolog script to no avail, so maybe my guess is wrong:

#!/bin/bash
for mountpoint in $(mount -l -t autofs | awk '{print $3}') ; do
    /bin/mount --make-rshared $mountpoint
done

I also tried adding an "ls -l $mountpoint/* >/dev/null 2>&1" with the idea of prodding the automounter, but that wound up with a "launch failed requeued held" problem.
This appears to actually be a known issue (bug 12567) that I've been working on getting fixed for an upcoming Slurm release.

(In reply to Michael Pelletier from comment #1)
> I tried the following prolog script to no avail, so maybe my guess is wrong:
>
> #!/bin/bash
> for mountpoint in $(mount -l -t autofs | awk '{print $3}') ; do
>     /bin/mount --make-rshared $mountpoint
> done

This would seem to help, but part of the current implementation actually forces everything to private after the prolog script runs.

> I also tried adding an "ls -l $mountpoint/* >/dev/null 2>&1" with the idea
> of prodding the automounter, but that wound up with a "launch failed
> requeued held" problem.

In an empty autofs mount point, that ls -l just errors out for me, which is probably why it fails. A workaround like that would have to know all the mount points first and touch them all.

I'm not currently aware of an easy workaround for this inside Slurm, but there is some example code and some suggestions in the other bug. I do have a proof-of-concept fix that still requires more testing, but it will likely be in a future major release.

Let me know if you have other questions on this, but I will likely mark this as a duplicate of 12567.

Thanks!
--Tim
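To make the "know all the mount points and touch them all" idea concrete, here is a rough, untested sketch. This is not the example code referenced in bug 12567; it assumes file-based indirect maps listed with absolute paths in /etc/auto.master, and whether it runs early enough depends on the site's prolog setup:

#!/bin/bash
# Illustrative sketch only: walk /etc/auto.master and, for every file-based
# indirect map, stat each key so the automounter mounts it before the job's
# namespace is created. Direct maps (/-), includes (+...), program maps, and
# LDAP/NIS maps are deliberately skipped.
while read -r mnt map _; do
    case "$mnt" in ""|\#*|+*|/-) continue ;; esac
    [ -f "$map" ] || continue
    while read -r key _; do
        case "$key" in ""|\#*|\**) continue ;; esac
        stat "$mnt/$key" >/dev/null 2>&1   # touching the key triggers the automount
    done < "$map"
done < /etc/auto.master

# never fail the prolog on a missing or unmountable entry
exit 0

Ending with exit 0 matters: a stray non-zero exit status from a prolog is what produces the "launch failed requeued held" behavior, which is presumably what happened when the ls -l errored out on an empty mount point.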
As mentioned in the previous comment, this is a duplicate of 12567, so I'm marking it as one now.

Thanks!
--Tim

*** This ticket has been marked as a duplicate of ticket 12567 ***
Thanks very much for your guidance, Tim! I'll take a closer look at bug 12567 and decide whether I have to revert from the container to the TmpFS= approach.