Ticket 12672

Summary: RFE: contain more SPANK entrypoints when using job_container/tmpfs
Product: Slurm
Reporter: Luke Yeager <lyeager>
Component: Other
Assignee: Tim McMullan <mcmullan>
Status: RESOLVED FIXED
QA Contact: Tim Wickberg <tim>
Severity: 5 - Enhancement
Priority: ---
CC: albert.gil, dwightman, fabecassis, felip.moll, jbernauer, jblomqvist, marshall
Version: 22.05.x
Hardware: Linux
OS: Linux
See Also: https://bugs.schedmd.com/show_bug.cgi?id=17426
          https://bugs.schedmd.com/show_bug.cgi?id=16126
Site: NVIDIA (PSLA)
Version Fixed: 24.05.0rc1
Target Release: 24.05
DevPrio: 1 - Paid

Description Luke Yeager 2021-10-15 08:16:15 MDT
(In reply to Tim McMullan from https://bugs.schedmd.com/show_bug.cgi?id=12403#c5)
> The changes you describe for spank_user_init(), spank_task_post_fork(), and
> spank_task_exit() I expect would be an enhancement.  We should break those
> desired changes out into an enhancement ticket and chat with Tim (Wickberg)
> et al. about it.
I'm breaking out a request from bug #12403 (docs RFE) into this separate ticket (breaking-change RFE).

I'd like the entrypoints marked 'YES' in the table below to run inside the job_container/tmpfs mount namespace. As of 21.08.2, they do not.

Location                      Which /tmp  Want changed?
--------                      ----------  -------------
spank_job_prolog()            OS
Prolog                        OS
spank_init()                  OS
spank_init_post_opt()         OS
spank_user_init()             OS          YES
spank_task_post_fork()        OS          YES
spank_task_init_privileged()  Job
spank_task_init()             Job
TaskProlog                    Job
spank_task_exit()             OS          YES
spank_exit()                  OS
spank_job_epilog()            OS
Epilog                        OS

spank_user_init() and spank_task_exit() are particularly critical: we need them contained for pyxis (https://github.com/NVIDIA/pyxis) to work properly without reconfiguring all of our paths to avoid /tmp and /dev/shm. The request to change spank_task_post_fork() is mostly so that the table reads logically when documented in chronological order (as this table is).
Comment 14 Felip Moll 2023-11-07 09:15:02 MST
*** Ticket 17426 has been marked as a duplicate of this ticket. ***
Comment 42 Tim McMullan 2024-04-10 06:36:21 MDT
Hi Luke,

This has been pushed to the master branch in commits 84eb2c6eb0 through 4ca0f6266b. This functionality turned out to be harder to implement than anticipated, and because the changes could potentially break some long-standing SPANK plugins, the feature has been gated behind "SlurmdParameters=contain_spank". Without this flag, everything will behave the same as before; with it, spank_user_init(), spank_task_post_fork(), and spank_task_exit() should all run contained.

Thanks!
--Tim
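To make the net effect of the flag concrete, here is a small sketch that transcribes the table from the description into a lookup table and applies the rule from comment 42: with "SlurmdParameters=contain_spank", the three rows marked 'YES' move from the node's /tmp to the job's private /tmp. It assumes a node configured with JobContainerType=job_container/tmpfs, and it assumes the rows not mentioned in comment 42 keep the behavior recorded for 21.08.2. The names tmp_view_for(), hook_rows and enum tmp_view are hypothetical; this is an illustration of the ticket's table, not Slurm code or a Slurm API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Which /tmp a hook sees: the node's own ("OS") or the job's private tmpfs ("Job"). */
enum tmp_view { TMP_OS, TMP_JOB, TMP_UNKNOWN };

struct hook_row {
    const char *name;          /* entrypoint as named in the ticket's table */
    bool job_tmp_by_default;   /* "Job" in the table (observed on 21.08.2) */
    bool job_tmp_with_flag;    /* assumed moved inside by contain_spank (comment 42) */
};

/* Transcribed from this ticket; not read from Slurm itself. */
static const struct hook_row hook_rows[] = {
    { "spank_job_prolog",           false, false },
    { "Prolog",                     false, false },
    { "spank_init",                 false, false },
    { "spank_init_post_opt",        false, false },
    { "spank_user_init",            false, true  },
    { "spank_task_post_fork",       false, true  },
    { "spank_task_init_privileged", true,  false },
    { "spank_task_init",            true,  false },
    { "TaskProlog",                 true,  false },
    { "spank_task_exit",            false, true  },
    { "spank_exit",                 false, false },
    { "spank_job_epilog",           false, false },
    { "Epilog",                     false, false },
};

/* Hypothetical helper (not a Slurm API): which /tmp does `hook` see? */
enum tmp_view tmp_view_for(const char *hook, bool contain_spank)
{
    assert(hook != NULL);
    for (size_t i = 0; i < sizeof hook_rows / sizeof hook_rows[0]; i++) {
        const struct hook_row *row = &hook_rows[i];
        if (strcmp(row->name, hook) != 0)
            continue;
        /* Contained if it always was, or if the flag moves it inside. */
        if (row->job_tmp_by_default || (contain_spank && row->job_tmp_with_flag))
            return TMP_JOB;
        return TMP_OS;
    }
    return TMP_UNKNOWN;   /* not one of the entrypoints in the table */
}
```

For example, tmp_view_for("spank_user_init", false) yields TMP_OS, while tmp_view_for("spank_user_init", true) yields TMP_JOB. Keeping the rows in one table means the documented chronological order and the lookup logic cannot drift apart.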
Comment 44 Tim McMullan 2024-04-18 09:13:54 MDT
Since this is pushed now, I'm marking this as resolved.

Thanks!