Ticket 14445

Summary: salloc --container does not invoke command inside of container
Product: Slurm
Reporter: yitp.support
Component: Configuration
Assignee: Nate Rini <nate>
Status: RESOLVED CANNOTREPRODUCE
Severity: 4 - Minor Issue
Priority: ---
CC: nate
Version: 21.08.8
Hardware: Linux
OS: Linux
See Also: https://bugs.schedmd.com/show_bug.cgi?id=14549
Site: Kyoto University
Attachments: slurm.conf

Description yitp.support 2022-06-30 03:36:37 MDT
SchedMD's containers guide, https://slurm.schedmd.com/containers.html, lists several --container use cases.

One of the salloc examples in this list does not run the command inside the container, although the other examples work.

Here is my test result. (This is not a production system; I am testing on a small cluster.) The compute node runs Rocky Linux with Podman, and /work/bundle was exported from an Ubuntu container, so /etc/os-release should contain Ubuntu information when the command runs inside the container.

$ salloc --container /work/bundle head -n 2 /etc/os-release
salloc: Granted job allocation 609
NAME="Rocky Linux"
VERSION="8.5 (Green Obsidian)"
salloc: Relinquishing job allocation 609

What configuration am I missing?

My oci.conf is:

RunTimeQuery="runc --rootless=true --root=/tmp/ state %j"
RunTimeRun="runc --rootless=true --root=/tmp/ run %j -b %b"
RunTimeKill="runc --rootless=true --root=/tmp/ kill -a %j"
RunTimeDelete="runc --rootless=true --root=/tmp/ delete --force %j"

By the way, RunTimeCreate does not work due to #12926.
Comment 1 Nate Rini 2022-06-30 09:40:57 MDT
(In reply to yitp.support from comment #0)
> By the way, RunTimeCreate does not work due to #12926.
Please open a new bug for this. Since your site is supported, it will get fixed.
Comment 3 Nate Rini 2022-06-30 09:44:05 MDT
(In reply to yitp.support from comment #0)
> What configuration am I missing?
Please attach your slurm.conf, your oci.conf (if it contains any more lines than shown in comment #0), and any other Slurm configuration files.
Comment 4 yitp.support 2022-07-05 19:33:15 MDT
Created attachment 25760
slurm.conf
Comment 5 yitp.support 2022-07-05 19:47:31 MDT
When I don't specify any command, the invoked shell runs inside the container.

$ salloc --container ~/bundle
salloc: Granted job allocation 621
head -n 2 /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
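
The two runs above can be compared mechanically by checking the NAME field of /etc/os-release instead of reading the output by eye. The following is a minimal sketch and not part of Slurm: os_name and ran_in_container are hypothetical helper names, and the expected value "Ubuntu" is an assumption based on the bundle having been exported from an Ubuntu container.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Slurm) for classifying os-release output.

# Print the value of the NAME= field from os-release text read on stdin.
# Lines such as PRETTY_NAME= are ignored because the match is anchored.
os_name() {
    sed -n 's/^NAME="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' | head -n 1
}

# Succeed if the os-release text on stdin reports the expected distribution.
# $1: expected NAME value, e.g. "Ubuntu" for the bundle in this report.
ran_in_container() {
    [ "$(os_name)" = "$1" ]
}

# Possible usage on the test cluster (bundle path taken from the report):
#   salloc --container /work/bundle cat /etc/os-release \
#       | ran_in_container Ubuntu && echo "inside container" || echo "on host"
```

With the output shown in the description, this check would report "on host" for the run that passes a command, while the os-release content seen from the interactive shell would pass the check.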
Comment 10 Nate Rini 2022-07-12 16:31:56 MDT
I have been unable to recreate this issue locally. Please restart slurmd with the following slurm.conf entry:
> SlurmdDebug=debug5

Then rerun salloc both with and without the command. Once done, please attach the logs and revert the logging change.
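
The logging change and its revert could be scripted roughly as follows. This is only a sketch under stated assumptions: set_slurmd_debug and revert_slurmd_debug are hypothetical helper names, the slurm.conf path in the usage comments is an assumed location, and the match assumes the key is spelled exactly "SlurmdDebug" at the start of a line.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Slurm) for toggling SlurmdDebug.

# Set (or append) SlurmdDebug in a slurm.conf-style file, saving a .bak copy.
# $1: path to the config file, $2: debug level (e.g. debug5).
set_slurmd_debug() {
    conf=$1
    level=$2
    if grep -q '^SlurmdDebug=' "$conf"; then
        # Replace the existing entry; sed keeps the original as "$conf.bak".
        sed -i.bak "s/^SlurmdDebug=.*/SlurmdDebug=$level/" "$conf"
    else
        # No entry yet: back up the file, then append one.
        cp "$conf" "$conf.bak"
        printf 'SlurmdDebug=%s\n' "$level" >> "$conf"
    fi
}

# Restore the copy saved by set_slurmd_debug.
# $1: path to the config file.
revert_slurmd_debug() {
    mv "$1.bak" "$1"
}

# Possible sequence on the compute node (config path is an assumption):
#   set_slurmd_debug /etc/slurm/slurm.conf debug5
#   systemctl restart slurmd
#   salloc --container /work/bundle head -n 2 /etc/os-release
#   salloc --container /work/bundle
#   revert_slurmd_debug /etc/slurm/slurm.conf
#   systemctl restart slurmd
```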
Comment 11 yitp.support 2022-07-15 01:14:20 MDT
Since you cannot reproduce the problem, I should test on the production system. If we apply the configuration there and the issue also occurs on the production system, I will reopen this case.
Thanks for your cooperation.