I'm trying to populate a file in /tmp/ for a job, using any or all of a SPANK plugin, a prolog, or a taskprolog. From my testing, here is the effect of doing a 'mkdir /tmp/foo' from all of the options available to me:

Location                                   Effect
--------                                   ------
Prolog                                     Job /tmp/
TaskProlog                                 Job /tmp/
slurm_spank_init(remote)                   OS /tmp/
slurm_spank_init_post_opt(remote)          OS /tmp/
slurm_spank_task_post_fork(remote)         OS /tmp/
slurm_spank_user_init(remote)              OS /tmp/
slurm_spank_task_init_privileged(remote)   Job /tmp/
slurm_spank_task_init(remote)              Job /tmp/

I might prefer to see user_init() contained, too, but other than that I don't see anything obviously _wrong_ about that table. However, it was certainly not obvious to me what the behavior would be before I checked. I don't see any indication of how this works in slurm/spank.h, nor in job_container.conf.html.

Would you please document somewhere how this is expected to work, or point me to the documentation if I missed it?
To clarify that table, the "Prolog" and "TaskProlog" rows aren't related to spank. I didn't test slurm_spank_job_prolog().
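For anyone who wants to reproduce the table without creating directories, one possible probe is to look at the mount table each hook sees. The following is a minimal Python sketch, not something taken from this ticket. It assumes that job_container/tmpfs presents the per-job /tmp as its own mount entry inside a private mount namespace, and that the hook runs on Linux with /proc mounted. The function names `find_mount` and `describe_tmp` are made up for illustration.

```python
from typing import Optional


def find_mount(mountinfo_text: str, mount_point: str = "/tmp") -> Optional[dict]:
    """Return the last mountinfo entry whose mount point equals mount_point.

    Each /proc/<pid>/mountinfo line looks like:
      ID PARENT MAJ:MIN ROOT MOUNT_POINT OPTIONS [optional...] - FSTYPE SOURCE SUPER_OPTS
    The last matching line is used because later mounts stack on top of earlier ones.
    """
    found = None
    for line in mountinfo_text.splitlines():
        left, sep, right = line.partition(" - ")
        if not sep:
            continue  # no " - " separator: not a well-formed mountinfo line
        fields = left.split()
        tail = right.split()
        if len(fields) < 6 or len(tail) < 2:
            continue
        if fields[4] == mount_point:
            found = {"root": fields[3], "fstype": tail[0], "source": tail[1]}
    return found


def describe_tmp(path: str = "/proc/self/mountinfo") -> str:
    """Summarize how /tmp is mounted from the calling process's point of view."""
    with open(path) as f:
        entry = find_mount(f.read())
    if entry is None:
        return "/tmp is not a separate mount in this namespace"
    return "/tmp: root={root} fstype={fstype} source={source}".format(**entry)
```

Calling `describe_tmp()` from a Prolog or TaskProlog script and logging the result should give different summaries for the OS /tmp and the job /tmp if the assumption above holds. The same comparison could be done from inside a SPANK callback by reading /proc/self/mountinfo in C.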
Thanks for the info Luke! I'll make sure this all gets documented!
Sure! I'd try verifying that you get the same results, too, before making it official. I'm currently getting different behavior on my local machine (with --enable-multiple-slurmd) - the Prolog is running w/ the OS's /tmp/ instead.
Here's what I'm seeing with 21.08.2 (so, my original table was wrong about the prolog):

Location                       Which /tmp   Want changed?
--------                       ----------   -------------
spank_job_prolog()             OS
Prolog                         OS
spank_init()                   OS
spank_init_post_opt()          OS
spank_user_init()              OS           YES
spank_task_post_fork()         OS           YES
spank_task_init_privileged()   Job
spank_task_init()              Job
TaskProlog                     Job
spank_task_exit()              OS           YES
spank_exit()                   OS
spank_job_epilog()             OS
Epilog                         OS

1) What's the timeline on documenting this somewhere?

2) I'd like to see more of the SPANK entrypoints contained, as specified in the table above. In particular, user_init is sometimes preferable to task_init because it only runs once per node instead of once per task, but with 'job_container/tmpfs' you're stuck with task_init. Also, it's hard for plugins to clean up after themselves in task_exit from work they did in task_init when the /tmp mount has changed.
Hey Luke, sorry about the delay!

(In reply to Luke Yeager from comment #4)
> 1) What's the timeline on documenting this somewhere?

I've been looking into some inconsistencies with job_container/tmpfs and what is contained or not, so I've been holding off on documenting it until I get that sorted out.

> 2) I'd like to see more of the SPANK entrypoints contained, as specified in
> the table above. In particular, user_init is sometimes preferable to
> task_init because it only runs once per node instead of once per task, but
> with 'job_container/tmpfs' you're stuck with task_init. Also, it's hard for
> plugins to clean up after themselves in task_exit from work they did in
> task_init when the /tmp mount has changed.

The changes you describe for spank_user_init(), spank_task_post_fork(), and spank_task_exit() I expect would be an enhancement. We should break those desired changes out into an enhancement ticket and chat with Tim (Wickberg) et al. about it.

Thanks, and sorry again about the delay!
--Tim
(In reply to Tim McMullan from comment #5)
> The changes you describe for spank_user_init(), spank_task_post_fork(), and
> spank_task_exit() I expect would be an enhancement. We should break those
> desired changes out into an enhancement ticket and chat with Tim (Wickberg)
> et al. about it.

Roger that. Bug#12672.
Hey Luke,

Sorry about the delay on this, but the documentation for this landed in https://github.com/SchedMD/slurm/commit/b4893df64

I'll resolve this now since the docs landed.

Thanks!
--Tim