15341 – Managing CPU and GPU allocations within a single SLURM cluster

Summary: Managing CPU and GPU allocations within a single SLURM cluster

Status:	RESOLVED INFOGIVEN

Alias:	None

Product:	Slurm
Classification:	Unclassified
Component:	Accounting
Version:	22.05.2
Hardware:	Linux Linux

Severity:	4 - Minor Issue
Assignee:	Ben Roberts

Reported:	2022-11-03 02:27 MDT by Maciej Cytowski
Modified:	2022-12-14 08:49 MST
CC List:	0 users

See Also:
Site:	Pawsey

Description Maciej Cytowski 2022-11-03 02:27:08 MDT

Hi,

Allocations and access for projects running on our current CPU-only system are managed through SLURM accounts and shares. On the new heterogeneous system we would like to manage CPU and GPU allocations separately, i.e., each project (account) will have one CPU allocation and one GPU allocation, and the two will be billed separately.

One obvious solution is to create two Slurm accounts for each project (project-cpu and project-gpu); however, we would like to avoid that.

Are there any examples or best practices on how this can be implemented? 

Kind regards,
Maciej

Comment 1 Ben Roberts 2022-11-04 10:37:40 MDT

Hi Maciej,

Creating unique accounts for CPU and GPU type jobs is the first thing that I would recommend for keeping the billing separate.  It would allow you to easily see the usage of the two types of jobs.  I can understand the desire to avoid creating two versions of each Account you currently have in place though.  

One option that may work for you would be to use Workload Characterization Keys (WCKeys) to identify the different types of jobs.  This would be an extra flag that is added to the different types of jobs so that you can easily identify CPU vs GPU jobs when they use the same Accounts.  

You can read more about WCKeys in the documentation here:
https://slurm.schedmd.com/wckey.html
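
As a rough illustration of tagging jobs this way, here is a minimal Python sketch of a submission wrapper that picks the WCKey from the GPU request. The helper name `sbatch_command` and the key names `cpu` and `gpu` are assumptions made up for this example, not anything Slurm defines. It also assumes WCKey tracking is enabled on the cluster (TrackWCKey) and that the keys have been added for the users in question.

```python
def sbatch_command(script, gpus=0):
    """Build an sbatch argument list that tags the job with a WCKey.

    The WCKey values "cpu" and "gpu" are hypothetical labels chosen for
    this sketch; a site can use any names it has registered.
    """
    # Jobs that request at least one GPU are labelled "gpu", all others "cpu".
    wckey = "gpu" if gpus > 0 else "cpu"
    cmd = ["sbatch", f"--wckey={wckey}"]
    if gpus > 0:
        cmd.append(f"--gpus={gpus}")
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # Print the command lines instead of running them.
    print(" ".join(sbatch_command("cpu_job.sh")))
    print(" ".join(sbatch_command("gpu_job.sh", gpus=2)))
```

The list returned by the helper could be passed to `subprocess.run` on a submit host. Both kinds of jobs would still run under the same account, and only the WCKey would differ.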

It's also worth noting that sreport does have reports that take WCKeys into account:
https://slurm.schedmd.com/sreport.html#SECTION_REPORT-TYPES
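
If you would rather post-process the accounting records yourself, a small script along these lines could split usage per account and WCKey. This is only a sketch: it assumes the input was produced with something like `sacct -a -X -n -P --format=Account,WCKey,ElapsedRaw,AllocTRES`, and the function name `usage_by_wckey`, the account `proj1` and the keys `cpu` and `gpu` are hypothetical.

```python
from collections import defaultdict


def usage_by_wckey(sacct_text):
    """Sum CPU-seconds and GPU-seconds per (account, wckey).

    Expects pipe-delimited lines with four fields and no header:
    Account|WCKey|ElapsedRaw|AllocTRES
    """
    totals = defaultdict(lambda: {"cpu_seconds": 0, "gpu_seconds": 0})
    for line in sacct_text.splitlines():
        if not line.strip():
            continue
        account, wckey, elapsed, tres = line.split("|")
        # AllocTRES looks like "billing=8,cpu=8,gres/gpu=2,mem=32G,node=1".
        counts = {}
        for item in tres.split(","):
            if "=" in item:
                name, value = item.split("=", 1)
                counts[name] = value
        seconds = int(elapsed)
        entry = totals[(account, wckey)]
        entry["cpu_seconds"] += seconds * int(counts.get("cpu", 0))
        entry["gpu_seconds"] += seconds * int(counts.get("gres/gpu", 0))
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = (
        "proj1|cpu|3600|billing=4,cpu=4,mem=8G,node=1\n"
        "proj1|gpu|1800|billing=8,cpu=8,gres/gpu=2,mem=32G,node=1\n"
        "proj1|cpu|600|billing=2,cpu=2,mem=4G,node=1\n"
    )
    for key, value in sorted(usage_by_wckey(sample).items()):
        print(key, value)
```

The built-in sreport reports are probably the simpler route. A script like this only helps if you need to combine the numbers with your own billing rules.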

Let me know if this sounds like something that would work for you.

Thanks,
Ben

Comment 2 Ben Roberts 2022-11-30 12:07:54 MST

Hi Maciej,

Did either of the proposed solutions sound like they would work for you?  Let me know if you still need help with this ticket or if it's ok to close.

Thanks,
Ben

Comment 3 Maciej Cytowski 2022-12-13 21:39:08 MST

Hi Ben, 

thank you for your help. This can be closed now.

Kind regards,
Maciej

Comment 4 Ben Roberts 2022-12-14 08:49:41 MST

I'm glad you found a solution that will work for you.  Let us know if there's anything we can do to help in the future.

Thanks,
Ben