Created attachment 21498 [details]
tar file with patch and sample pre and post namespace clone scripts

While working with the tmp_fs job_container plugin, we discovered that in an environment with auto-mounted home and/or project directories for the users, these would not be accessible to the job. Working with ticket 12361, I was able to come up with a patch that runs a script after the job namespace has been created, as well as one prior to removing the /dev/shm and /tmp directories during teardown of the container.

I am attempting to run a per-job automounter instance with the fifo, map, and flag directories compiled to live in /dev/shm:

%define fifo_dir /dev/shm
%configure --disable-mount-locking --enable-ignore-busy --with-libtirpc --without-hesiod %{?systemd_configure_arg:} --with-confdir=%{_sysconfdir} --with-mapdir=%{fifo_dir} --with-fifodir=%{fifo_dir} --with-flagdir=%{fifo_dir} --enable-force-shutdown

I have it working with one remaining issue: the run_command function waits for the automounter to complete before it returns, and since I need to leave the automounter running in the user's job namespace until the completion of the job, I need to background it. Is there a similar command that can be used to launch a script to set up the user's mounts? Without this, I am unable to make use of the job_container functionality, as we, like many sites, rely on automounted NFS project directories for our users.
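A minimal sketch of the backgrounding idea raised in the comment above: with a double fork, the process the launcher (here, run_command) waits on exits immediately, while the grandchild, standing in for the automounter, keeps running inside the job's mount namespace. This is not code from the attached patch; the function name and arguments are illustrative only.

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Launch argv[0] detached from the caller: the intermediate child
 * exits right away, so whoever waits on it returns promptly, while
 * the grandchild (the long-running service) keeps running in the
 * current mount namespace. */
static int launch_detached(char *const argv[])
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {			/* intermediate child */
		if (setsid() < 0)	/* new session, detach from the caller */
			_exit(1);
		pid_t svc = fork();
		if (svc < 0)
			_exit(1);
		if (svc > 0)
			_exit(0);	/* intermediate child exits immediately */
		execv(argv[0], argv);	/* grandchild becomes the service */
		_exit(127);
	}
	int status = 0;
	if (waitpid(pid, &status, 0) < 0)	/* only the short-lived child is waited for */
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}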
Reference: original ticket https://bugs.schedmd.com/show_bug.cgi?id=12361
Created attachment 21515 [details]
updated patch to add pre and post clonens scripts

I tried changing the run_command timeout value to -1 for the clonensscript; this seems to have it working much better. Still a work in progress.
Thanks, Mike. I will have one of our engineers look over this and give you some feedback.
Created attachment 21603 [details]
sync version of patch to exec a script to start and later shut down an automounter

Sync version of the patch, which executes the nsclone script and waits for it to exit without killing the process group underneath it; modified and "shamelessly" taken from the slurmctld daemon code. The intent is to make sure the automounter comes up fully prior to continuing with the job launch. Of note: even with this in place, I still see a warning about the job executing in /tmp instead of the home directory, but it does in fact execute correctly in the user's auto-mounted home directory. The check must be happening prior to the container entry code, or without calling it at all.
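A minimal sketch of the synchronous pattern described above, not the attached patch itself: fork the namespace prolog script, exec it, and waitpid() on that specific child only, so nothing else in its process group, such as a daemonized automounter it leaves behind, is signalled or reaped. The script path and job-id argument are hypothetical.

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a namespace prolog script and block until it exits, without
 * touching the rest of its process group. Returns the script's exit
 * status, or -1 on error. */
static int run_ns_script_sync(const char *script, const char *job_id)
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		char *const argv[] = { (char *)script, (char *)job_id, NULL };
		execv(script, argv);
		_exit(127);			/* exec failed */
	}
	int status = 0;
	if (waitpid(pid, &status, 0) < 0)	/* wait for this child only */
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}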
Created attachment 21893 [details]
updated sync version for 21.08.2+

I noticed that in 21.08.2 additional calls are now being made through to the _delete_ns functions, so I moved my additional calls down into the _create_ns and _delete_ns functions. I probably should have put them there in the first place.
Hey Mike, I've been looking this over and I'm a little confused why for clonensscript you have run_command commented out and use your own fork/waitpid setup, but for clonensepilog run_command is fine. Was there some specific issue you were avoiding there that I'm missing? Thanks! --Tim
(In reply to Tim McMullan from comment #10)
> Hey Mike,
>
> I've been looking this over and I'm a little confused why for clonensscript
> you have run_command commented out and use your own fork/waitpid setup, but
> for clonensepilog run_command is fine. Was there some specific issue you
> were avoiding there that I'm missing?
>
> Thanks!
> --Tim

Yes, I just took the run_script code from, I think, one of the slurmd mains. What run_command was doing was firing up the script asynchronously, and I was concerned about waiting for the automounter to get up and running before the mounts were actually done. I wanted it to wait until the file systems were in a good state, bind mounts were in place, etc., and then proceed with the job launch.
Mike
There may be a way to get run_command to act synchronously, but I just did not know how to do that. For the cleanup I was not as worried about running in async mode, so I used the regular run_command, but the sync version should have been fine as well.
Mike
(In reply to mike coyne from comment #12)
> There may be a way to get run_command to act synchronously, but I just did
> not know how to do that. For the cleanup I was not as worried about running
> in async mode, so I used the regular run_command, but the sync version
> should have been fine as well.
> Mike

OK, thanks for the clarification there! run_command is only async if you call it with max_wait=-1; otherwise it should be able to replace the fork()+waitpid() method (there are some small improvements still happening with run_command).
Hey Mike,

I was wondering if we could make this ticket public? We have a second ticket discussing this issue as well, and I think it would be ideal to have the discussion in one spot. I can mark any attachments you would like as private to just SchedMD, if there are any attachments you are concerned about. Let me know either way!

Thanks,
--Tim
I personally do not have a problem with opening this, but I do need to get it officially reviewed for release to the public. So please bear with me.
(In reply to mike coyne from comment #20)
> I personally do not have a problem with opening this, but I do need to get
> it officially reviewed for release to the public. So please bear with me.

Thank you Mike, just let me know how it goes!
The data was reviewed and is OK to be made public. I "unchecked" all of the "only users in the selected groups" boxes, if that helps.
(In reply to mike coyne from comment #25)
> The data was reviewed and is OK to be made public. I "unchecked" all of the
> "only users in the selected groups" boxes, if that helps.

Thank you!
*** Ticket 14344 has been marked as a duplicate of this ticket. ***
FYI: As an alternative to the job_container/tmpfs plugin, I've now built and tested the SPANK plugin https://github.com/University-of-Delaware-IT-RCI/auto_tmpdir. Please see https://bugs.schedmd.com/show_bug.cgi?id=14344#c16. I hope that the excellent work in that plugin can help solve the autofs issue in the job_container/tmpfs plugin.
Hi, is there any timeline for a patch release? Best, Stefan
Created attachment 25937 [details]
Updated patch to work with slurm 22.05.2

I updated my patch to add a pre and post namespace script for 22.05.2. I did leave in the previous _run_script_in_ns I had copied, as I was not sure how to launch a script with a timeout but leave a running "service" such as automount behind with the updated run_script functions in the 22.05 release.
Hi Mike,

(In reply to mike coyne from comment #30)
> I updated my patch to add a pre and post namespace script for 22.05.2. I
> did leave in the previous _run_script_in_ns I had copied, as I was not sure
> how to launch a script with a timeout but leave a running "service" such as
> automount behind with the updated run_script functions in the 22.05 release.

Could you kindly explain the ramifications of the updated patch? Will autofs filesystems work with this patch? Will the patch be included in 22.05.3?

Thanks,
Ole
(In reply to Ole.H.Nielsen@fysik.dtu.dk from comment #31)
> Hi Mike,
>
> (In reply to mike coyne from comment #30)
> > I updated my patch to add a pre and post namespace script for 22.05.2. I
> > did leave in the previous _run_script_in_ns I had copied, as I was not
> > sure how to launch a script with a timeout but leave a running "service"
> > such as automount behind with the updated run_script functions in the
> > 22.05 release.
>
> Could you kindly explain the ramifications of the updated patch? Will
> autofs filesystems work with this patch? Will the patch be included in
> 22.05.3?
>
> Thanks,
> Ole

Ole,
I have been working with Tim on a possible solution. My intent has been to demonstrate a way to get autofs working along with the tmpfs mount namespace: to demonstrate the issue and show a possible solution.

As for the patch I just added: I have been trying to get the version I had working for 21.08 to work on 22.05, but SchedMD has rewritten the run_command family of functions. I believe I have the initiation script working, but the clonensepilog script in the _delete_ns call may not be quite right; it seems to fire but is immediately terminated. I had set its type as initscript, but it seems to need to be something else. I am hoping Tim can help me with that.

That said, what this patch does is fire up a root-executed shell script just after the namespace is created, and another one just prior to the namespace being destroyed. This allows me to start up a purpose-built automounter within the namespace. To make automount work, I had to redirect the directory it uses for its fifos and configs into the namespace, i.e. /dev/shm or /tmp if you like; this lets the automounter run without stepping on other automounters in other namespaces. The epilog script is intended to shut down the automounter with a SIGTERM (signal 15) so it can shut down cleanly on exit. What I did find was that I could tune the users' environment, scratch access, etc. much more easily, as we control what file systems they can access based on a predefined access "color", if you will, per job.

Regretfully, what I have seen is that there are things that can happen that prevent the job from exiting through the epilog _delete_ns path, so it may be good to have your node health check kill any automount process not associated with a running job and unmount any leftover mounts, if any.

Any feedback, thoughts, or suggestions would be very helpful.
Mike
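Purely as an illustration of the epilog side described above (not code from the patch), and assuming the prolog recorded the automounter's PID in a file inside the job namespace (the pidfile path is hypothetical): read the PID back and send SIGTERM so automount can unmount everything cleanly before the namespace is torn down.

#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Read the automounter PID recorded by the prolog and ask it to shut
 * down cleanly (SIGTERM, i.e. signal 15). */
static int stop_job_automounter(const char *pidfile)
{
	FILE *fp = fopen(pidfile, "r");	/* e.g. "/dev/shm/automount.pid" (hypothetical) */
	long pid = 0;

	if (!fp)
		return -1;
	if (fscanf(fp, "%ld", &pid) != 1 || pid <= 1) {
		fclose(fp);
		return -1;
	}
	fclose(fp);
	return kill((pid_t)pid, SIGTERM);
}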
Hi Mike,

(In reply to mike coyne from comment #32)
> I have been working with Tim on a possible solution. My intent has been to
> demonstrate a way to get autofs working along with the tmpfs mount
> namespace: to demonstrate the issue and show a possible solution.

I understand that this patch is work in progress and seems to be somewhat involved. I don't have experience with this type of software, so I can't offer any help.

On 21.08.8 we have the auto_tmpdir[1] SPANK plugin working very nicely indeed together with our autofs NFS automounted home directories. FWIW, the implementation in auto_tmpdir[1] may perhaps serve as a proof of concept to inspire the tmp_fs job_container plugin. Maybe you can test auto_tmpdir[1] in your environment? Since you're having issues on 22.05, I wonder if auto_tmpdir[1] will face such issues as well?

Thanks,
Ole

[1] https://github.com/University-of-Delaware-IT-RCI/auto_tmpdir
Created attachment 26032 [details]
Updated slurm 22.05.2 patch using the same internal script execution for nsepilog and nsprolog

Tim, Ole,
I corrected my patch for 22.05.2. To make it runnable I replaced run_command with my _run_script_in_ns; when trying to use run_command, it would not execute, complaining that it was in the process of shutting down, and refused to execute the script in _delete_ns. I assume I am not calling it correctly.

Tim, another question is about combining tmp_fs with the --container option for srun. Does "runc" execute within the tmp_fs namespace, or in parallel with that namespace? I can seem to run one or the other but not both; it may be that adding the automounter in the tmp_fs is breaking some timing?

Ole, thanks, I will take a look at your SPANK plugin.
Mike
(In reply to mike coyne from comment #34)
> Created attachment 26032 [details]
> Updated slurm 22.05.2 patch using the same internal script execution for
> nsepilog and nsprolog
>
> Tim, Ole,
> I corrected my patch for 22.05.2. To make it runnable I replaced
> run_command with my _run_script_in_ns; when trying to use run_command, it
> would not execute, complaining that it was in the process of shutting down,
> and refused to execute the script in _delete_ns. I assume I am not calling
> it correctly.

That is quite interesting, but at the moment I'm really trying to explore mount propagation between / and the job namespaces, since this should be a much cleaner approach. It's also the approach some of the SPANK plugins use. I do have a proof of concept that seems to work, but it needs a lot more testing and refinement.

> Tim, another question is about combining tmp_fs with the --container option
> for srun. Does "runc" execute within the tmp_fs namespace, or in parallel
> with that namespace? I can seem to run one or the other but not both; it
> may be that adding the automounter in the tmp_fs is breaking some timing?

The --container flag and the job_container/tmpfs plugin do a lot of things very close to each other in the code. I've not experimented with combining them yet, so I'm not sure what is going on there. Have you tried it without your additional patches?

> Ole, thanks, I will take a look at your SPANK plugin.
> Mike
(In reply to Tim McMullan from comment #37)
> (In reply to mike coyne from comment #34)
> > Created attachment 26032 [details]
> > Updated slurm 22.05.2 patch using the same internal script execution for
> > nsepilog and nsprolog
> >
> > Tim, Ole,
> > I corrected my patch for 22.05.2. To make it runnable I replaced
> > run_command with my _run_script_in_ns; when trying to use run_command, it
> > would not execute, complaining that it was in the process of shutting
> > down, and refused to execute the script in _delete_ns. I assume I am not
> > calling it correctly.
>
> That is quite interesting, but at the moment I'm really trying to explore
> mount propagation between / and the job namespaces, since this should be a
> much cleaner approach. It's also the approach some of the SPANK plugins
> use. I do have a proof of concept that seems to work, but it needs a lot
> more testing and refinement.
>
> > Tim, another question is about combining tmp_fs with the --container
> > option for srun. Does "runc" execute within the tmp_fs namespace, or in
> > parallel with that namespace? I can seem to run one or the other but not
> > both; it may be that adding the automounter in the tmp_fs is breaking
> > some timing?
>
> The --container flag and the job_container/tmpfs plugin do a lot of things
> very close to each other in the code. I've not experimented with combining
> them yet, so I'm not sure what is going on there. Have you tried it without
> your additional patches?
>
> > Ole, thanks, I will take a look at your SPANK plugin.
> > Mike

Tim, this is what I have seen combining them so far. I now have my patched tmp_fs code in place and working. When I was testing, I allowed the root namespace autofs to be disconnected, so that, say, /users on the node was no longer automounted for root; with the job running and tmp_fs enabled, the automounter correctly runs and allows users to access their home directories, project files, etc.

What I saw when trying to run a --container job is that, in my case, it does not launch runc as the user within the tmp_fs namespace, but instead launches the container (srun within a salloc) in a "parallel" mount namespace. The path to the container had to be a viable path in both the root namespace and the tmp_fs namespace, otherwise it would fail to find the container path. I did try to wrap the runc command in an nsexec call, but since it runs as the user, that did not work out. I was able to run full parallel containers with tmp_fs enabled if I put the container file system in a location that was available on the root fs and had been imported into the tmp_fs namespace. Running either one on its own works fine.
Mike
*** Ticket 14803 has been marked as a duplicate of this ticket. ***
*** Ticket 14954 has been marked as a duplicate of this ticket. ***
Hi everyone,

We've landed these commits that should update the job_container/tmpfs plugin to work with autofs-managed mounts. It follows a pattern similar to that of many of the SPANK plugins, in that it shares mounts from the root namespace into the job containers as they are mounted.

https://github.com/SchedMD/slurm/commit/516c10ce06
https://github.com/SchedMD/slurm/commit/5f67e6c801
https://github.com/SchedMD/slurm/commit/2169071395

Please let us know if you encounter any issues with this!

Thanks!
--Tim
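For readers who want the gist of the shared/slave approach, a simplified sketch follows; it is not the actual Slurm code and assumes the root namespace's "/" is already a shared subtree (the default on most systemd systems). The job's new mount namespace marks "/" as a recursive slave, so autofs mounts made in the root namespace propagate into the job, while mounts made inside the job, such as its private /tmp, do not propagate back out. The job_tmp parameter is a hypothetical per-job directory prepared beforehand.

#define _GNU_SOURCE
#include <sched.h>
#include <sys/mount.h>

/* Enter a new mount namespace for the job: mounts from the parent
 * (root) namespace keep propagating in, but the job's private /tmp
 * bind mount stays invisible to every other namespace. */
static int setup_job_mount_ns(const char *job_tmp)
{
	if (unshare(CLONE_NEWNS) < 0)
		return -1;
	/* receive-only propagation: equivalent of "mount --make-rslave /" */
	if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) < 0)
		return -1;
	/* job-private /tmp; as a slave mount it does not leak back out */
	if (mount(job_tmp, "/tmp", NULL, MS_BIND, NULL) < 0)
		return -1;
	return 0;
}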
Since this has landed now, I'm going to mark this as resolved. Let us know if you have any issues! Thanks, --Tim
Thanks Tim,
I was working on trying to get the patches in place. I am a little uncertain about going from a configuration with MS_PRIVATE to MS_SHARED & MS_SLAVE, and how that will relate to any root-generated mounts in the job namespace and whether they could show up in other namespaces. I created a new prejobprivns job_container in my 22.05.6, still with the MS_PRIVATE copy, to compare with the shared/slave version. I may need to continue using the per-job autofs and namespaces, as being able to customize the mount namespace per job has proved very useful for particular reasons when it comes to sharing a cluster's compute nodes between multiple mutually isolated programs.
Mike