| Summary: | job_container/tmpfs + init.sh/prolog.sh | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Manuel Holtgrewe <manuel.holtgrewe> |
| Component: | Configuration | Assignee: | Marcin Stolarek <cinek> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | CC: | bas.vandervlies, cinek, felip.moll, lyeager |
| Version: | 21.08.5 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=7477, https://bugs.schedmd.com/show_bug.cgi?id=13242, https://bugs.schedmd.com/show_bug.cgi?id=13546 | | |
| Site: | Berlin Institute of Health | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | | Version Fixed: | 22.05pre1 |
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | | |
Description
Manuel Holtgrewe
2021-12-27 06:21:56 MST

Thanks for the pointer to 'see also'. I was already able to implement the GRES, and I also know how to set an XFS quota per project. What I would need is a robust way to know the requested GRES of the job and the path to the /tmp directory from job_container/tmpfs. To my understanding, having SLURM_JOBID available in the job_container/tmpfs InitScript would be sufficient, as I assume the script is called after the namespace and bind mount are set up. Maybe passing the SLURM_* environment variables into this script would be enough?

Marcin Stolarek

Manuel, I see your point; however, just adding the job ID may not be optimal. I guess you were thinking of calling tools like `scontrol show job JOBID` from the script to get the required information. Such an approach generates an additional REQUEST_JOB_INFO RPC for every job start (potentially from every node), which may have a severe impact on scheduler performance (especially in HTC environments). We'll have an internal discussion on how to approach that. I'll keep you posted.

cheers, Marcin

Manuel Holtgrewe

Hi, is there any news on this?

Marcin Stolarek

Manuel, sorry for the delay. We have a patch under review. We decided to add some basic environment variables (like SLURM_JOB_ID) to the script. The change is targeted at Slurm 22.05, but should be easy to backport locally.

cheers, Marcin

Marcin Stolarek

Manuel, we've merged a basic environment setup for the InitScript of job_container/tmpfs. This is in the master branch[1] and will be part of the Slurm 22.05 release. We're looking into further improvements in this area, since calling `scontrol show job` in the InitScript puts a high load on the slurmctld side, limiting system throughput. Those improvements (providing more information to the InitScript) require a more complicated rewrite, so I can't commit to anything more for Slurm 22.05 at the moment. Is there anything else I can help you with in this bug report?

cheers, Marcin

[1] https://github.com/SchedMD/slurm/commit/e25270e53f57be9aae48759ea5fdd57c9f7eb6b6

Manuel Holtgrewe

Hi Marcin, thanks a lot for this already! I'll have a look at whether we can bear the additional RPC pressure.

Best wishes, Manuel
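Putting the pieces of this thread together, the following is a hedged sketch of what such an InitScript could look like once SLURM_JOB_ID is exported into its environment (Slurm 22.05 and later, per the commit linked above). BasePath=/local, the `localtmp` GRES name, the 100 GB request, and the stubbed `scontrol` output are illustrative assumptions, not Slurm defaults; and the real `scontrol show job` call carries exactly the per-job REQUEST_JOB_INFO RPC cost Marcin warns about above.

```shell
#!/bin/bash
# Hypothetical job_container/tmpfs InitScript. Assumes Slurm >= 22.05
# (SLURM_JOB_ID in the environment), BasePath=/local on an XFS volume,
# and a site-defined "localtmp" GRES -- all assumptions, not defaults.
set -eu

SLURM_JOB_ID="${SLURM_JOB_ID:-4242}"  # stub so the sketch runs outside Slurm
JOB_DIR="/local/${SLURM_JOB_ID}"      # per-job backing directory

# Fetching the requested GRES costs one REQUEST_JOB_INFO RPC per job
# start (potentially per node) -- the slurmctld-load caveat from above.
# On a real system: JOB_INFO="$(scontrol show job "${SLURM_JOB_ID}")"
JOB_INFO='JobId=4242 TresPerNode=gres/localtmp:100'  # stubbed output
LIMIT_GB="$(echo "${JOB_INFO}" | grep -o 'gres/localtmp:[0-9]*' | cut -d: -f2)"

# Cap the job's scratch space with an XFS project quota sized from the
# GRES; reusing the job id as the project id is a site convention here.
# Guarded so the sketch is a no-op outside a real compute node.
if command -v xfs_quota >/dev/null 2>&1 && [ -d "${JOB_DIR}" ]; then
    xfs_quota -x -c "project -s -p ${JOB_DIR} ${SLURM_JOB_ID}" /local
    xfs_quota -x -c "limit -p bhard=${LIMIT_GB}g ${SLURM_JOB_ID}" /local
fi

echo "job ${SLURM_JOB_ID}: ${JOB_DIR} capped at ${LIMIT_GB} GB"
```

Parsing TresPerNode from the stubbed string stands in for the real RPC; on a busy HTC cluster the `scontrol` call is the part to benchmark before deploying.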