Ticket 12672 - RFE: contain more SPANK entrypoints when using job_container/tmpfs
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 22.05.x
Hardware: Linux Linux
Severity: 5 - Enhancement
Assignee: Tim McMullan
QA Contact: Tim Wickberg
Duplicates: 17426
Reported: 2021-10-15 08:16 MDT by Luke Yeager
Modified: 2024-04-18 09:13 MDT
Site: NVIDIA (PSLA)
Version Fixed: 24.05.0rc1
Target Release: 24.05
DevPrio: 1 - Paid


Description Luke Yeager 2021-10-15 08:16:15 MDT
(In reply to Tim McMullan from https://bugs.schedmd.com/show_bug.cgi?id=12403#c5)
> The changes you describe for spank_user_init(), spank_task_post_fork(), and
> spank_task_exit() I expect would be an enhancement.  We should break those
> desired changes out into an enhancement ticket and chat with Tim (Wickberg)
> et al. about it.
I'm breaking out a request from bug#12403 (docs RFE) into this separate ticket (breaking change RFE).

I'd like to see changes made to the entrypoints marked 'YES' in the table below. I'd like to see those entrypoints contained within the tmpfs mount namespace. As of 21.08.2, they aren't.

Location                      Which /tmp  Want changed?
--------                      ----------  -------------
spank_job_prolog()            OS
Prolog                        OS
spank_init()                  OS
spank_init_post_opt()         OS
spank_user_init()             OS          YES
spank_task_post_fork()        OS          YES
spank_task_init_privileged()  Job
spank_task_init()             Job
TaskProlog                    Job
spank_task_exit()             OS          YES
spank_exit()                  OS
spank_job_epilog()            OS
Epilog                        OS

user_init() and task_exit() are particularly critical. We need that in order for pyxis (https://github.com/NVIDIA/pyxis) to work properly without needing to reconfigure all our paths to avoid /tmp and /dev/shm. The request for the change to post_fork() is mostly just so the table looks logical when documented in chronological order (as is the table given here).
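
One way to reproduce the "Which /tmp" column for the script-based rows (Prolog, TaskProlog, Epilog) is to print the mount namespace each hook runs in and compare the values. The following is a minimal sketch, not part of the ticket: the helper names are hypothetical, and it assumes a Linux node where /proc/self/ns/mnt is readable. A SPANK plugin written in C could perform the equivalent readlink() call from inside its entrypoints.

```python
import os


def mount_namespace_id(ns_path="/proc/self/ns/mnt"):
    """Return the mount-namespace identifier of the calling process.

    On Linux the path is a symlink whose target looks like
    'mnt:[4026531840]'. ns_path is a parameter only so the helper
    can be pointed at another process or exercised in a test.
    """
    return os.readlink(ns_path)


def same_namespace(id_a, id_b):
    """Two processes share a mount namespace when their identifiers match."""
    return id_a == id_b


if __name__ == "__main__":
    # Run this from two hooks and compare the printed identifiers:
    # matching values mean both hooks see the same set of mounts,
    # and therefore the same /tmp.
    print(mount_namespace_id())
```

If the identifier printed from one hook differs from the one printed from another, the two hooks are running in different mount namespaces, which is what distinguishes the "OS" rows from the "Job" rows in the table.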
Comment 14 Felip Moll 2023-11-07 09:15:02 MST
*** Ticket 17426 has been marked as a duplicate of this ticket. ***
Comment 42 Tim McMullan 2024-04-10 06:36:21 MDT
Hi Luke,

This has been pushed to the master branch in commits 84eb2c6eb0 through 4ca0f6266b. This functionality turned out to be harder to implement than anticipated, and because the changes could break some long-standing SPANK plugins, the feature has been gated behind "SlurmdParameters=contain_spank". Without this flag, everything will behave the same as before; with it, spank_user_init(), spank_task_post_fork(), and spank_task_exit() should all run contained.
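
For reference, a minimal slurm.conf sketch of the setup discussed in this ticket might look like the fragment below. Only the contain_spank value is taken from this comment; the JobContainerType line is an assumption based on the plugin named in the ticket title, and any other settings the plugin needs (for example in job_container.conf) are not shown.

```
# slurm.conf fragment (sketch, assumes Slurm 24.05 or later)
JobContainerType=job_container/tmpfs    # assumption: plugin from the ticket title
SlurmdParameters=contain_spank          # opt in to contained SPANK entrypoints
```

If SlurmdParameters already lists other options on a site, contain_spank would be appended to that comma-separated list rather than replacing it.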

Thanks!
--Tim
Comment 44 Tim McMullan 2024-04-18 09:13:54 MDT
Since this is pushed now, I'm marking this as resolved.

Thanks!