Ticket 4165

Summary: sreport cluster UserUtilizationByAccount versus sreport job SizesByAccount: inconsistencies
Product: Slurm
Reporter: mail
Component: User Commands
Assignee: Jacob Jenson <jacob>
Status: RESOLVED INVALID
QA Contact:
Severity: 6 - No support contract
Priority: ---
Version: 16.05.2
Hardware: Linux
OS: Linux
Site: -Other-

Description mail 2017-09-14 15:25:02 MDT
Hello,

I am trying to monitor the CPU usage of users on a cluster running Slurm. I found three ways to do this, but they give inconsistent results.


If I use 'sreport cluster UserUtilizationByAccount' (see actual command below), the 'Used' column reports a value for every account, including 'grpdel', an account that is no longer associated with my user.

If I use 'sreport job SizesByAccount' (see actual command below):
  - the account 'grpdel' does not appear this time
  - I obtain the same value for grp001 (183102536)
  - I obtain a smaller value for grp002 (5406909 instead of 6134353)
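
To put a number on the grp002 gap, here is a minimal sketch using the two values copied from the sreport outputs further down. The variable names are only illustrative.

```shell
#!/bin/sh
# Values copied from the two sreport outputs in this ticket (TRES seconds).
used_by_account=6134353     # sreport cluster UserUtilizationByAccount, grp002
sizes_by_account=5406909    # sreport job SizesByAccount, grp002

# Difference between the two reports, in seconds.
gap=$((used_by_account - sizes_by_account))
echo "grp002 gap: ${gap} seconds"
```

So the two reports differ by 727444 CPU seconds for grp002, while they agree exactly for grp001.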

So the same binary reports different values depending on which report is requested. Am I missing something here?

I then checked those values with sacct (see actual commands below) and obtained the same numbers as with 'sreport job SizesByAccount'.



My question is: which values are correct? That is, which values are checked against the limits imposed by QOS (or by partitions or accounts)?


Thanks for your time,
Cyril



 $ sreport -t Seconds cluster UserUtilizationByAccount Users=username start=2001-01-01 end=2100-01-01
--------------------------------------------------------------------------------
Cluster/User/Account Utilization 2001-01-01T00:00:00 - 2017-09-14T22:59:59 (527119200 secs)
Use reported in TRES Seconds
--------------------------------------------------------------------------------
  Cluster     Login     Proper Name         Account       Used   Energy 
--------- --------- --------------- --------------- ---------- -------- 
  cluster  username   XXXXX XXXXXXX         grpdel 4798042480        0 
  cluster  username   XXXXX XXXXXXX         grp001  183102536        0 
  cluster  username   XXXXX XXXXXXX         grp002    6134353        0 





 $ sreport -t Seconds job SizesByAccount Users=username start=2001-01-01 end=2100-01-01 grouping=1000
--------------------------------------------------------------------------------
Job Sizes 2001-01-01T00:00:00 - 2017-09-14T22:59:59 (527119200 secs)
Time reported in Seconds
--------------------------------------------------------------------------------
  Cluster   Account    0-999 CPUs  >= 1000 CPUs % of cluster 
--------- --------- ------------- ------------- ------------ 
  cluster    grp002       5406909             0        2.87% 
  cluster    grp001     183102536             0       97.13% 




 $ sacct -X -S 2001-01-01 -E 2100-01-01 -A grp001 --noheader -o CPUTimeRaw | awk '{sum+=$1} END {print sum}'
183102536



 $ sacct -X -S 2001-01-01 -E 2100-01-01 -A grp002 --noheader -o CPUTimeRaw | awk '{sum+=$1} END {print sum}'
5406909
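

The two per-account sacct sums above could probably be collected in a single pass instead of one command per account. This is a minimal sketch, not a tested recipe for this cluster: `sum_by_account` is a hypothetical helper name, and it assumes sacct is asked for pipe-separated `Account|CPUTimeRaw` lines (`-P` with `-o Account,CPUTimeRaw`) and filtered by user with `-u`.

```shell
#!/bin/sh
# sum_by_account: hypothetical helper, not part of Slurm.
# Reads "Account|CPUTimeRaw" lines on stdin and prints one
# "account total" line per account, sorted by account name.
sum_by_account() {
    awk -F'|' '
        NF >= 2 { total[$1] += $2 }          # accumulate CPU seconds per account
        END {
            # %.0f avoids exponent notation for large totals in some awk builds
            for (acct in total) printf "%s %.0f\n", acct, total[acct]
        }
    ' | sort
}

# Intended use on the cluster (assumes sacct and accounting data are available):
#   sacct -X -S 2001-01-01 -E 2100-01-01 -u username --noheader -P \
#         -o Account,CPUTimeRaw | sum_by_account
```

If the assumptions hold, the output should list every account the user ran jobs under in one table, which makes it easier to compare against both sreport outputs side by side.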