Ticket 8553

Summary: Can we limit user jobs per node
Product: Slurm    Reporter: GSK-ONYX-SLURM <slurm-support>
Component: Scheduling    Assignee: Marcin Stolarek <cinek>
Status: RESOLVED INFOGIVEN
Severity: 4 - Minor Issue
CC: cinek
Version: - Unsupported Older Versions
Hardware: Linux
OS: Linux
Site: GSK

Description GSK-ONYX-SLURM 2020-02-21 03:50:40 MST
Hi.

A user is having an issue with Java-based jobs and memory limits. To work around this, they want to restrict the number of their jobs that can run on any given server in the cluster.

So they want to be able to submit hundreds of jobs to the queue, but, let's say, only 24 of their jobs would ever execute on any given node in the cluster.

I know I can restrict jobs at the partition level, but it doesn't look like I can restrict a given user's jobs at the node level.

Thanks.
Mark.
Comment 1 Marcin Stolarek 2020-02-21 08:10:44 MST
Mark,

I'm not sure if I fully understand your requirements, so let me rephrase it to make sure we're on the same page.

You're trying to limit the number of jobs a single user can run concurrently on a specific compute node, and memory is not the resource you want to use to enforce that limit.

Even on older versions, you can define a generic resource (GRES)[1] and use it as the resource being consumed. This requires the user to specify that the application needs this resource (for instance --gres=javaApp), which will effectively limit the number of concurrent jobs with this requirement running per host.

Another approach would be to define a QOS with a specific MaxTRESPerNode (or MaxCPUsPerNode on older Slurm versions). This way you can limit the amount of resources (CPUs or GRES in this case) used by the user per node.
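As a rough sketch of the QOS approach (the QOS name "javaqos", the user name "mark", and the limit of 24 CPUs per node are illustrative placeholders, not from this ticket, and the commands assume accounting is enabled):

```shell
# Create a QOS that caps how many CPUs a job may consume on any one node.
sacctmgr add qos javaqos set MaxTRESPerNode=cpu=24

# Allow the user's association to use the QOS.
sacctmgr modify user mark set qos+=javaqos

# The user then submits jobs under that QOS.
sbatch --qos=javaqos job.sh
```

With MaxTRESPerNode the limit is expressed in TRES (here CPUs) rather than a job count, so it only maps to "24 jobs per node" if each job requests one CPU.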

Let me know if this was helpful.

cheers,
Marcin
[1] https://slurm.schedmd.com/gres.html
Comment 2 GSK-ONYX-SLURM 2020-02-21 10:49:24 MST
Hi Marcin.
Sorry, I probably didn't describe my query very well.
Basically, a user wants to limit himself to only 24 jobs of a particular Java-type script on specific nodes, irrespective of what anyone else is doing, and even if the node could support many more jobs, e.g. a 192-core server.

What you've described using GRES is, I believe, exactly what I want. In slurm.conf I just need Gres=java:24 on the NodeName line (with java listed in GresTypes). I don't even need an entry in gres.conf, because this is just a simple name:count pair.

The user then sbatches with --gres=java:1 on all his jobs, and he'll never have more than 24 of them on a node.
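For reference, the setup described above would look roughly like this (node names, core count, and script name are illustrative examples, not values from this ticket):

```shell
# slurm.conf fragment: declare the custom GRES type and advertise
# 24 units of it on each node. No gres.conf entry is required for
# a simple name:count GRES with no File= specification.
#
#   GresTypes=java
#   NodeName=node[01-10] CPUs=192 Gres=java:24

# Each job requests one unit, so at most 24 such jobs can run
# on a node concurrently, regardless of free CPUs or memory.
sbatch --gres=java:1 job.sh
```

Jobs submitted without the --gres request are not constrained by this count, so the limit is only self-enforced, which matches the user's intent here.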

Please go ahead and close this call.  You've pointed me in the right direction.

Many thanks.
Mark.