Ticket 17156 - Users are able to modify each others' job array limits
Summary: Users are able to modify each others' job array limits
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 21.08.8
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Marcin Stolarek
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2023-07-07 15:49 MDT by sysadmin
Modified: 2023-10-18 05:35 MDT

See Also:
Site: Allen Institute
Version Fixed: 23.02.4 23.11rc1


Description sysadmin 2023-07-07 15:49:06 MDT
[amused.admin@slurm ~]# scontrol show job 10026927_8
JobId=10031043 ArrayJobId=10026927 ArrayTaskId=8 ArrayTaskThrottle=5 JobName=txSageRnaSeq
   UserId=victim(01134) GroupId=users(31770) MCS_label=N/A
[bored.user@slurm ~]$ scontrol update JobId=10026927_[3-180] ArrayTaskThrottle=6 
10026927_3-180: Invalid user id
[bored.user@slurm ~]$ logout
[amused.admin@slurm ~]# squeue | grep 10026927
  10026927_[9-180] celltypes txSageRn victim PD       0:00      1 (JobArrayTaskLimit)
        10026927_3 celltypes txSageRn victim  R      13:05      1 n294
        10026927_4 celltypes txSageRn victim  R      13:05      1 n294
        10026927_5 celltypes txSageRn victim  R      13:05      1 n294
        10026927_6 celltypes txSageRn victim  R       8:32      1 n293
        10026927_7 celltypes txSageRn victim  R       1:04      1 n293
        10026927_8 celltypes txSageRn victim  R       1:04      1 n291
[amused.admin@aidc-hpc-prd ~]# scontrol show job 10026927_8
JobId=10031043 ArrayJobId=10026927 ArrayTaskId=8 ArrayTaskThrottle=6 JobName=txSageRnaSeq
   UserId=victim(01134) GroupId=users(31770) MCS_label=N/A


In the logs, I also see this change:
[2023-07-07T14:12:41.210] _update_job: set max_run_tasks to 5 for job array JobId=10026927_*
[2023-07-07T14:12:41.226] _slurm_rpc_update_job: complete JobId=10026927_6-180 uid=20415 usec=16973
[2023-07-07T14:12:42.639] sched: Allocate JobId=10026927_6(10026933) NodeList=n293 #CPUs=32 Partition=celltypes
[2023-07-07T14:19:39.185] _job_complete: JobId=10026927_2(10026929) WEXITSTATUS 0
[2023-07-07T14:19:39.185] _job_complete: JobId=10026927_2(10026929) done
[2023-07-07T14:20:02.609] _update_job: set max_run_tasks to 6 for job array JobId=10026927_*

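Although scontrol rejected the request with "Invalid user id", the throttle on the victim's array still changed from 5 to 6. I'd argue that users should not be able to change other users' array limits at all: a user could carve out more resources for themselves by throttling other users' jobs, or accidentally overwrite every job array's limit with a mass modify. Let me know if this is just a configuration option I missed.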
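The transcript is consistent with an ordering problem: the array-wide throttle (logged as `max_run_tasks`) appears to have been applied even though the request as a whole was rejected. The sketch below models that failure and the behavior the reporter asks for, where ownership is checked before any shared array-level field is mutated. Everything in it is hypothetical: `ArrayJob`, `may_modify`, the two `update_throttle_*` functions, and `OPERATOR_UIDS` are illustrative names. None of them reflects Slurm's actual code or the contents of commit 442c8442d8.

```python
from dataclasses import dataclass


class PermissionDenied(Exception):
    """Hypothetical stand-in for scontrol's 'Invalid user id' rejection."""


@dataclass
class ArrayJob:
    array_job_id: int
    owner_uid: int
    max_run_tasks: int  # array-wide limit, i.e. ArrayTaskThrottle


# Hypothetical set of privileged uids (e.g. root and Slurm operators/admins).
OPERATOR_UIDS = {0}


def may_modify(job: ArrayJob, uid: int) -> bool:
    """Only the job owner or a privileged user may change the job."""
    return uid == job.owner_uid or uid in OPERATOR_UIDS


def update_throttle_buggy(job: ArrayJob, uid: int, throttle: int) -> None:
    # Flawed ordering: the shared, array-wide field is changed first,
    # and the authorization failure is only reported afterwards.
    job.max_run_tasks = throttle
    if not may_modify(job, uid):
        raise PermissionDenied("Invalid user id")


def update_throttle_fixed(job: ArrayJob, uid: int, throttle: int) -> None:
    # Safe ordering: authorize first, mutate only if the check passes.
    if not may_modify(job, uid):
        raise PermissionDenied("Invalid user id")
    job.max_run_tasks = throttle
```

With the buggy ordering, a non-owner's call raises the error yet leaves `max_run_tasks` changed. That mirrors the 5 to 6 change in the transcript. With the fixed ordering, the same call leaves the limit untouched, while the owner or an operator can still adjust it.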
Comment 2 Marcin Stolarek 2023-07-10 05:41:35 MDT
I can reproduce the issue. I'm sending a patch that addresses it for review and will keep you posted on the progress.

cheers,
Marcin
Comment 19 Marcin Stolarek 2023-10-18 05:35:35 MDT
The bug was fixed by commit 442c8442d8, which was already released in Slurm 23.02.4; sorry for not letting you know earlier. We kept the bug open to work on additional improvements in this area (6b6c75cb5b), which landed in the master branch, the future Slurm 23.11.

I'm closing the bug as fixed now. Should you have any questions, please reopen it.

cheers,
Marcin