Ticket 8156

Summary: Setup so sreport will provide GPU TRES
Product: Slurm
Reporter: hpc-admin
Component: GPU
Assignee: Director of Support <support>
Status: RESOLVED INFOGIVEN
QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 19.05.4   
Hardware: Linux   
OS: Linux   
Site: Ghent
Attachments: slurm.conf

Description hpc-admin 2019-11-27 07:23:29 MST
Hi,

We're running 19.05.4, and since GPUs are now a TRES, I was wondering how we can get this resource reported on when running sreport. Our pilot setup for a cluster with GPU nodes shows:

[root@master31 ~]# sreport --cluster=joltik -T ALL cluster Utilization start=01/09/19 -t Hours
--------------------------------------------------------------------------------
Cluster Utilization 2019-01-09T00:00:00 - 2019-11-26T23:59:59
Usage reported in TRES Hours
--------------------------------------------------------------------------------
  Cluster      TRES Name     Allocated         Down PLND Dow           Idle Reserved       Reported
--------- -------------- ------------- ------------ -------- -------------- -------- --------------
   joltik            cpu        121187        17513        0         431881      498         571079
   joltik            mem     870805557    210830842        0     5793454745        0     6875091145
   joltik         energy             0            0        0              0        0              0
   joltik        billing        122753        28458        0         776791        0         928003
   joltik        fs/disk             0            0        0              0        0              0
   joltik           vmem             0            0        0              0        0              0
   joltik          pages             0            0        0              0        0              0

So GPUs are not reported at all. The documentation seems to suggest I need to set AccountingStorageTRES, but since GPU is no longer a GRES but a TRES, I am slightly confused. Do I still need to set

AccountingStorageTRES=gres/gpu?

If so, could the documentation make it clearer that this is also required for any new TRES?

We have set these:

SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory

But this has to do with scheduling, if I understand correctly. It is unclear if this also means that the new TRES will be used for accounting.

Kind regards,
-- Andy
Comment 1 Douglas Wightman 2019-11-27 08:19:22 MST
Could you attach your most recent slurm.conf and gres.conf (if you have one) to this ticket? Thank you.
Comment 2 hpc-admin 2019-11-27 09:28:27 MST
Created attachment 12419
slurm.conf
Comment 3 hpc-admin 2019-11-27 09:29:09 MST
Hi Douglas,

We do not have a gres.conf at this point. slurm.conf was attached.

Kind regards,
-- Andy
Comment 4 Michael Hinton 2019-11-27 09:47:39 MST
(In reply to hpc-admin from comment #0)
> So there is no GPU whatsoever. The documentation seems to suggest I need to
> set AccountingStorageTRES,
Correct.

> but since GPU is no longer a GRES, but  a TRES, I
> am slightly confused.
That's not correct. GPUs are still GRES. They are also tracked as TRES.

> Do I still need to set 
> AccountingStorageTRES=gres/gpu?
Yes. Also add gres/gpu:<type> for each specific GPU type to get full accounting coverage for the GPUs.

> We have set these:
> 
> SelectType=select/cons_tres
> SelectTypeParameters=CR_Core_Memory
> 
> But this has to do with scheduling, if I understand correctly. It is
> unclear if this also means that the new TRES will be used for accounting.
The accounting part for GPUs is specified only by AccountingStorageTRES. Note that some default TRES can't be turned off, which is why your sreport output includes TRES (such as cpu, mem, energy, and billing) that you didn't specify.
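
As a minimal sketch of the answers above, a slurm.conf fragment might look like the following. The GPU type names v100 and a100 are hypothetical placeholders, not taken from the attached slurm.conf; substitute the type names your nodes actually define.

```
# Track GPUs as a whole, plus each GPU type.
# The type names below (v100, a100) are assumed examples only.
AccountingStorageTRES=gres/gpu,gres/gpu:v100,gres/gpu:a100
```

With such a line in place, the original report could be narrowed to GPUs by passing the TRES name to -T, for example: sreport --cluster=joltik -T gres/gpu cluster Utilization start=01/09/19 -t Hours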
Comment 5 Douglas Wightman 2019-12-03 08:31:07 MST
Closing with answers provided to questions.