| Summary: | submit non-root jobs in rootonly partition | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | hpc-admin |
| Component: | Configuration | Assignee: | Jason Booth <jbooth> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | | |
| Version: | 19.05.5 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Ghent | | |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| Attachments: | slurm config | | |
Created attachment 14791: slurm config
Thank you for reporting this behavior. It looks like this has not worked for some time. Since this is the first time we have seen this feature mentioned in some time may I ask what your use case for this feature is? Dear Jason, I am working on a kind of BurstBuffer and I would like to have a partition where a robot can submit extra high priority jobs as a normal user. Clearly, we do not want the users to submit regular jobs in this high priority partition, an clearly we do not want jobs running as root in this partition. Unfortunately, when I started to work on this project, I made the rookie error that I actually believed what is in the manual, and I did not check what is advertised there is working (see --uid switch for srun and salloc). For this, another ticket is connected, see: https://bugs.schedmd.com/show_bug.cgi?id=8564 By the way, I still did not get any suggestion how could I solve this problem and the manual was not updated. Thank you for your patience while we discussed this internally. We do not have any plans to add this feature into Slurm since there are security concerns that we have associated with --uid. We manipulate the munge credential and also expose roots environment when we do this. It is more likely that that --uid will be phased out. I will update the documentation to reflect this change. Instead of using --uid we suggest using sudo in the workflow to run the job as a user. If using sudo in the root job is not a good solution then might I suggest a combination of job_submit with AllowAccounts or AllowQos on the partition. The job_submit could look at the alloc_node to allow the job submission or reject it based of the origin. Your robot would then submit the job with sudo on behalf of that user into the queue which would then be allowed since it is being submitted from the authorized location and with the correct account or qos. Resolving. Please re-open if you have further questions. |
I am trying to submit a job in a root-only partition, but it does not seem to work. Maybe I am doing something wrong in the config? As far as I was able to see in the manuals, this should work:

```
salloc --uid=hajgato --partition=banette_test srun echo "here"
```

but I get:

```
salloc: Job allocation 20000615 has been revoked.
salloc: error: Job submit/allocate failed: Requested partition configuration not available now
```

I can submit jobs as root without the --uid parameter.

```
[root@master23 ~]# id hajgato
uid=2010(hajgato) gid=100(users) groups=100(users),10(wheel)
[root@master23 ~]# sacctmgr show user hajgato
      User   Def Acct     Admin
---------- ---------- ---------
   hajgato   gvo00002      None
[root@master23 ~]# scontrol show partition banette_test
PartitionName=banette_test
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=01:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=YES
   MaxNodes=UNLIMITED MaxTime=3-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=node2801.banette.os,node2802.banette.os,node2803.banette.os,node2804.banette.os,node2805.banette.os,node2806.banette.os,node2807.banette.os,node2808.banette.os
   PriorityJobFactor=1 PriorityTier=1 RootOnly=YES ReqResv=NO OverSubscribe=NO OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=32 TotalNodes=8 SelectTypeParameters=NONE
   JobDefaults=(null) DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
```

I have attached the slurm.conf as well.
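For completeness, the sudo-based workflow suggested in the comments might look like the following from the robot's side. The job script path is a placeholder, and this assumes an appropriately restricted sudoers entry; it can only be exercised on a live Slurm cluster:

```
# Run by the root-owned robot process; a sudoers rule should be scoped
# to exactly this command so root's environment is not carried over:
sudo -u hajgato sbatch --partition=banette_test /path/to/burst_job.sh
```

The job then enters the queue owned by the real user, so accounting, limits, and the AllowAccounts/AllowQos checks all apply to that user rather than to root.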