| Summary: | Can we limit user jobs per node | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | GSK-ONYX-SLURM <slurm-support> |
| Component: | Scheduling | Assignee: | Marcin Stolarek <cinek> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | cinek |
| Version: | - Unsupported Older Versions | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | GSK | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | ? | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
GSK-ONYX-SLURM
2020-02-21 03:50:40 MST
Mark, I'm not sure I fully understand your requirements, so let me rephrase them to make sure we're on the same page. You're trying to limit the number of applications a user can run on a specific compute node, and you don't want to use memory as the resource being consumed.

Even on older versions, you can define a generic resource [1] and use it as the resource being consumed. This requires the user to specify that the application needs this resource (for instance --gres=javaApp), which effectively limits the number of concurrent jobs with this requirement running per host.

Another approach would be to define a QoS with a specific MaxTRESPerNode (or MaxCPUsPerNode on older Slurm versions). This way you can limit the resources (CPUs or GRES in this case) used per node.

Let me know if this was helpful.

cheers,
Marcin

[1] https://slurm.schedmd.com/gres.html

---

Hi Marcin. Sorry, I probably didn't describe my query very well. Basically, a user wants to self-limit to only 24 jobs of a particular Java-type script on specific nodes, irrespective of what anyone else is doing, and even if the node could support many more jobs, e.g. a 192-core server.

What you've described using GRES is, I believe, exactly what I want. In slurm.conf I just need Gres=java:24 on the NodeName line. I don't even need an entry in gres.conf, because this is just a simple name:count pair. The user then sbatches with --gres=java on all his jobs and he'll never have more than 24 on the node.

Please go ahead and close this call. You've pointed me in the right direction.

Many thanks,
Mark.
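
A minimal sketch of the GRES setup discussed above. The GRES name `java`, the node name `node01`, and the CPU count are illustrative assumptions, not values from the ticket:

```
# slurm.conf -- declare a countable generic resource on the node.
# A simple name:count GRES needs no gres.conf entry, but the GRES
# name must be listed in GresTypes.
GresTypes=java
NodeName=node01 Gres=java:24 CPUs=192 State=UNKNOWN
```

Each job then requests one unit of the resource, so at most 24 such jobs can run on the node at once:

```
sbatch --gres=java:1 job_script.sh
```

The QoS alternative Marcin mentions could be sketched with sacctmgr (the QoS name `javalimit` is hypothetical; MaxTRESPerNode is a per-job limit, so this constrains each job's per-node usage rather than a user's aggregate):

```
sacctmgr add qos javalimit set MaxTRESPerNode=cpu=24
```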