Ticket 15722

Summary: Does sreport output show CPU utilization of GPU nodes?
Product: Slurm
Reporter: Tana Vinod <vinodkumar.tana>
Component: Accounting
Assignee: Felip Moll <felip.moll>
Status: RESOLVED INFOGIVEN
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
CC: felip.moll, vinodkumar.tana
Version: - Unsupported Older Versions
Hardware: Linux
OS: Linux
Site: Cerence AI

Description Tana Vinod 2023-01-03 03:29:58 MST
Hi Team,

We are using Slurm version 20.02.

Here is the output of my sreport. Does the TRES Name "cpu" include CPU utilization of GPU nodes as well?
Is there any way we can get CPU utilization of GPU nodes separately?

#sreport -t percent -T ALL cluster utilization

Cluster Utilization 2023-01-02T00:00:00 - 2023-01-02T23:59:59
Usage reported in Percentage of Total
--------------------------------------------------------------------------------
  Cluster      TRES Name    Allocated     Down PLND Dow          Idle Reserved      Reported
--------- -------------- ------------ -------- -------- ------------- -------- -------------
     crg2            cpu       34.20%    0.00%    0.00%        60.47%    5.33%       100.00%
     crg2            mem       28.41%    0.00%    0.00%        71.59%    0.00%       100.00%
     crg2         energy        0.00%    0.00%    0.00%         0.00%    0.00%         0.00%
     crg2        billing       34.20%    0.00%    0.00%        65.80%    0.00%       100.00%
     crg2        fs/disk        0.00%    0.00%    0.00%         0.00%    0.00%         0.00%
     crg2           vmem        0.00%    0.00%    0.00%         0.00%    0.00%         0.00%
     crg2          pages        0.00%    0.00%    0.00%         0.00%    0.00%         0.00%
     crg2       gres/gpu       64.36%    0.00%    0.00%        35.64%    0.00%       100.00%
     crg2    gres/gpu:t4       51.89%    0.00%    0.00%        48.11%    0.00%       100.00%
     crg2 gres/gpu:volta       79.33%    0.00%    0.00%        20.67%    0.00%       100.00%
     crg2 gres/gpu:ampe+       73.20%    0.00%    0.00%        26.80%    0.00%       100.00%
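In this output each TRES row is a single cluster-wide figure: for every row, Allocated + Down + PLND Down + Idle + Reserved adds up to Reported (within rounding), and there is no breakdown by node or partition, so the "cpu" row cannot be split into GPU and non-GPU nodes. A quick check of that arithmetic, using values copied from the table above (the all-zero energy, fs/disk, vmem and pages rows are omitted, and "gres/gpu:ampe+" is kept as sreport truncated it):

```python
# Percent-mode rows copied from the sreport output above.
ROWS = {
    # TRES name: (Allocated, Down, PLND Down, Idle, Reserved, Reported)
    "cpu":            (34.20, 0.00, 0.00, 60.47, 5.33, 100.00),
    "mem":            (28.41, 0.00, 0.00, 71.59, 0.00, 100.00),
    "billing":        (34.20, 0.00, 0.00, 65.80, 0.00, 100.00),
    "gres/gpu":       (64.36, 0.00, 0.00, 35.64, 0.00, 100.00),
    "gres/gpu:t4":    (51.89, 0.00, 0.00, 48.11, 0.00, 100.00),
    "gres/gpu:volta": (79.33, 0.00, 0.00, 20.67, 0.00, 100.00),
    "gres/gpu:ampe+": (73.20, 0.00, 0.00, 26.80, 0.00, 100.00),
}


def row_balances(row, tol=0.01):
    """True if the five time buckets add up to Reported, allowing for rounding."""
    *parts, reported = row
    return abs(sum(parts) - reported) <= tol


for name, row in ROWS.items():
    print(f"{name:15s} sums to Reported: {row_balances(row)}")
```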
Comment 1 Felip Moll 2023-01-03 05:01:10 MST
(In reply to Tana Vinod from comment #0)
> Hi Team,
> 
> we are using slurm version 20.02.
> 
> Here is the output of my sreport. Does the TRES Name-CPU includes cpu
> utilization of gpu nodes as well? 
> Is there anyway we can get cpu utilization of gpu nodes separately?
> 

Hi,

The TRES "cpu" includes the CPU utilization of every allocated node, whether or not it is a "gpu" node.
Utilization of the GPUs themselves (GPU cores and GPU memory, as opposed to the physical CPU cores associated with each GPU in gres.conf) is not accounted for in Slurm.
Slurm 23.05 adds a new plugin to account for GPU usage on AMD GPUs; see commit 65e5c787ab.
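Since sreport cluster utilization has no per-node filter, one possible workaround for the "separately" part of the question is to pull job records for the GPU nodes with sacct and add up the CPU allocation yourself. A minimal sketch, assuming a sacct invocation like the one in the comment (the node names gpu[01-04] and the dates are hypothetical placeholders) and a node_cpus value you supply as the combined CPU count of those nodes. Caveats: a job that spans GPU and non-GPU nodes is counted with all of its CPUs, and, like sreport's Allocated column, this measures allocated CPU time, not CPU time actually consumed (sacct's TotalCPU field is the place to look for the latter).

```python
from datetime import datetime

# Hypothetical invocation producing the input (node names and dates are placeholders):
#   sacct -a -X -n -P -S 2023-01-02T00:00:00 -E 2023-01-03T00:00:00 \
#         -N gpu[01-04] --format=JobID,AllocCPUS,Start,End
TS = "%Y-%m-%dT%H:%M:%S"
NO_TIME = ("Unknown", "None", "")


def allocated_cpu_seconds(sacct_text: str, win_start: str, win_end: str) -> float:
    """Sum AllocCPUS x seconds of overlap with the window [win_start, win_end)."""
    ws = datetime.strptime(win_start, TS)
    we = datetime.strptime(win_end, TS)
    total = 0.0
    for line in sacct_text.strip().splitlines():
        _job_id, cpus, start, end = line.split("|")
        if start in NO_TIME:
            continue  # job never started, so it allocated nothing
        s = max(datetime.strptime(start, TS), ws)
        # Running jobs report End=Unknown; treat them as running to the window end.
        e = we if end in NO_TIME else min(datetime.strptime(end, TS), we)
        if e > s:
            total += int(cpus) * (e - s).total_seconds()
    return total


def percent_of_capacity(cpu_seconds: float, node_cpus: int,
                        win_start: str, win_end: str) -> float:
    """Express CPU-seconds as a percentage of node_cpus x window length."""
    span = (datetime.strptime(win_end, TS)
            - datetime.strptime(win_start, TS)).total_seconds()
    return 100.0 * cpu_seconds / (node_cpus * span)
```

Passing the result of allocated_cpu_seconds to percent_of_capacity gives a percentage roughly comparable to the "cpu" Allocated column, restricted to the listed nodes.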

Does this answer your question?
Comment 2 Tana Vinod 2023-01-03 05:05:34 MST
Thanks, Felip, for your prompt response.
Comment 3 Felip Moll 2023-01-03 05:27:31 MST
You're welcome, don't hesitate to raise new bugs if you have more questions :)