Ticket 16371

Summary: reservation for group root
Product: Slurm
Reporter: Yann <yann.sagon>
Component: reservations
Assignee: Carlos Tripiana Montes <tripiana>
Status: RESOLVED FIXED
Severity: 4 - Minor Issue
Version: 22.05.8
Hardware: Linux
OS: Linux
Site: Université de Genève
Version Fixed: 23.02.3, 23.11.0rc1

Description Yann 2023-03-27 05:57:54 MDT
Dear team,

We created a reservation that allows the group "root", but it seems that root cannot submit a job to this reservation.

(baobab)-[root@gpu044 ~]$ scontrol show res
ReservationName=installation_gpu044 StartTime=2023-03-27T13:41:46 EndTime=2023-03-27T19:41:46 Duration=06:00:00
   Nodes=gpu044 NodeCnt=1 CoreCnt=128 Features=(null) PartitionName=(null) Flags=MAINT,OVERLAP,IGNORE_JOBS,SPEC_NODES
   TRES=cpu=128
   Users=(null) Groups=root,hpc_admin Accounts=(null) Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
   MaxStartDelay=(null)

(baobab)-[root@gpu044 ~]$ srun --reservation=installation_gpu044 --partition=shared-gpu hostname
srun: error: Unable to allocate resources: Access denied to requested reservation
Comment 1 Benny Hedayati 2023-03-27 07:50:07 MDT
Hi,

I tested this and got it to work by setting user=root instead of a group when creating the reservation. What command are you using to create the reservation?

Thanks
Comment 2 Yann 2023-03-27 08:04:37 MDT
Hi, yes, I know it works with user=root, but we want to use Groups=root,hpc_admin because we don't want to enumerate all the users in these groups. Is there a problem with doing so?
Comment 3 Benny Hedayati 2023-03-27 08:57:16 MDT
Can you please provide the exact command you used to create the reservation? I get an error on my end when trying to create one with group=root.

Thanks
Comment 4 Yann 2023-03-27 08:59:33 MDT
(baobab)-[root@admin1 ~]$ NODE=cpu001
(baobab)-[root@admin1 ~]$ scontrol create \
>     Reservation="installation_${NODE}" \
>     StartTime=now \
>     Duration=0-06:00:00 \
>     Groups=root,hpc_admin \
>     Flags=MAINT,IGNORE_JOBS,OVERLAP \
>     Nodes=${NODE}
Reservation created: installation_cpu001
Comment 5 Yann 2023-03-27 09:00:39 MDT
Maybe a typo on your end: it is groups (plural), not group.
Comment 6 Benny Hedayati 2023-03-27 09:52:17 MDT
Thanks, let me run some tests and get back to you shortly.

Thanks
Comment 7 Benny Hedayati 2023-03-30 16:24:19 MDT
Hi,

Thanks for your patience. After running a few tests, we believe the behavior you are experiencing is a bug, and I am currently looking into a resolution. I will keep this ticket open until we have a solution and will keep you posted on updates.
Comment 8 Benny Hedayati 2023-04-13 12:12:15 MDT
Hello,

Thanks again for your patience. I have been working on this problem since we last spoke and found that the following command works:

$ scontrol create reservation account=root starttime=now duration=infinite nodes=z1
Reservation created: root_19

and attempting to run a job as user root under this reservation works as well:

root@benny-ThinkPad-T14-Gen-3:/home/benny# sbatch --reservation=root_19 --wrap="sleep 2m"
Submitted batch job 320
root@benny-ThinkPad-T14-Gen-3:/home/benny# squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               320     debug     wrap     root  R       0:02      1 z1


Would this be a possible solution for you?

To view your current account associations, you can run the following command:

$ sacctmgr show assoc

Please let me know whether this works for you.

Thanks
Comment 9 Yann 2023-04-13 12:12:29 MDT
Hello,

I'm on vacation until 17 April 2023.

If needed, you can contact my colleagues from the HPC team at hpc@unige.ch or open a ticket on dw.unige.ch.

Best wishes

Yann Sagon
HPC Referent


Division du système et des technologies de l'information et de la communication
Université de Genève | 66, Boulevard Carl-Vogt | 1205 Genève
Tel. 022 379 77 37 | Office D605

www.unige.ch/stic
Comment 11 Yann 2023-04-18 08:39:13 MDT
Hi Benny,

this workaround should indeed work, thanks.

Please feel free to close the issue.

Best

Yann
Comment 13 Benny Hedayati 2023-04-25 08:51:33 MDT
Hi,

Thanks for letting me know. I will leave the ticket open for now because we are tracking the fix for the groups=root bug in this ticket. I'm glad you got it to work; once the original problem is fixed, we will let you know here.

Regards
Comment 39 Carlos Tripiana Montes 2023-06-06 13:34:25 MDT
Hi Yann,

Even though the workaround did the trick for you, we wanted a long-term fix.

We pushed the fix to the 23.02 branch, so it will be available once 23.02.3 is released. Commits:

*   662011a49c (HEAD -> slurm-23.02, origin/slurm-23.02) Merge branch 'bug16371' into slurm-23.02
|\  
| * 1fffbd40a0 Add NEWS
| * 338f5035b7 slurmctld/groups - reverse order for UID array
| * 368b3895f9 slurmctld/groups - fix root ID (0) being ignored
| * 94c927373d slurmctld/groups - remove non-implemented declaration from header
|/  

We also landed another commit for this bug, on the master branch only (future 23.11). It allows root/SlurmUser to submit jobs to any reservation, even if they are not explicitly allowed in it. Commit:

89dae4fdc0 (HEAD -> master, origin/master, origin/HEAD) Allow SlurmUser/root to use reservations without specific permissions

So, all in all, we're now closing this bug as resolved/fixed rather than infogiven.

Have a good day,
Carlos.
Comment 40 Yann 2023-06-08 08:18:39 MDT
Many thanks!