Ticket 4165 - sreport cluster UserUtilizationByaccount versus sreport job SizesByAccount : inconsistencies
Status: RESOLVED INVALID
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 16.05.2
Hardware: Linux Linux
: 6 - No support contract
Assignee: Jacob Jenson
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2017-09-14 15:25 MDT by mail
Modified: 2017-09-14 15:25 MDT (History)


Description mail 2017-09-14 15:25:02 MDT
Hello,

I am trying to monitor per-user CPU usage on a cluster running Slurm. I found three ways to do this, but their results are inconsistent.


If I use 'sreport cluster UserUtilizationByAccount' (see the actual command below), the 'Used' column reports a value for every account, even for 'grpdel', an account that is no longer associated with my user.

If I use 'sreport job SizesByAccount' (see the actual command below):
  - the account 'grpdel' does not appear this time
  - I obtain the same value for grp001 (183102536)
  - I obtain a smaller value for grp002 (5406909 instead of 6134353)

So the same binary reports different values depending on which report is requested. Am I missing something here?

I then checked those values with sacct (see the actual commands below) and obtained the same numbers as with 'sreport job SizesByAccount'.



My question is: which values are correct, that is, which values are checked against the limits imposed by the QOS (or partition or account)?


Thanks for your time,
Cyril



 $ sreport -t Seconds cluster UserUtilizationByAccount Users=username start=2001-01-01 end=2100-01-01
--------------------------------------------------------------------------------
Cluster/User/Account Utilization 2001-01-01T00:00:00 - 2017-09-14T22:59:59 (527119200 secs)
Use reported in TRES Seconds
--------------------------------------------------------------------------------
  Cluster     Login     Proper Name         Account       Used   Energy 
--------- --------- --------------- --------------- ---------- -------- 
  cluster  username   XXXXX XXXXXXX         grpdel 4798042480        0 
  cluster  username   XXXXX XXXXXXX         grp001  183102536        0 
  cluster  username   XXXXX XXXXXXX         grp002    6134353        0 





 $ sreport -t Seconds job SizesByAccount Users=username start=2001-01-01 end=2100-01-01 grouping=1000
--------------------------------------------------------------------------------
Job Sizes 2001-01-01T00:00:00 - 2017-09-14T22:59:59 (527119200 secs)
Time reported in Seconds
--------------------------------------------------------------------------------
  Cluster   Account    0-999 CPUs  >= 1000 CPUs % of cluster 
--------- --------- ------------- ------------- ------------ 
  cluster    grp002       5406909             0        2.87% 
  cluster    grp001     183102536             0       97.13% 
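
To put a number on the discrepancy, here is a minimal sketch that only does arithmetic on the values printed in the two reports above. It does not query Slurm, and the variable names and the `gap` helper are hypothetical, introduced for illustration.

```shell
#!/bin/sh
# Values copied by hand from the two sreport outputs above (seconds).
util_grp001=183102536    # sreport cluster UserUtilizationByAccount, "Used"
util_grp002=6134353
sizes_grp001=183102536   # sreport job SizesByAccount, "0-999 CPUs"
sizes_grp002=5406909

# Hypothetical helper: difference between two reported values.
gap() {
    echo $(( $1 - $2 ))
}

echo "grp001 gap: $(gap "$util_grp001" "$sizes_grp001") s"   # prints "grp001 gap: 0 s"
echo "grp002 gap: $(gap "$util_grp002" "$sizes_grp002") s"   # prints "grp002 gap: 727444 s"
```

So the two reports agree for grp001 and differ by 727444 seconds for grp002, while grpdel (4798042480 seconds) appears only in the UserUtilizationByAccount output.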




 $ sacct -X -S 2001-01-01 -E 2100-01-01 -A grp001 --noheader -o CPUTimeRaw | awk '{sum+=$1} END {print sum}'
183102536



 $ sacct -X -S 2001-01-01 -E 2100-01-01 -A grp002 --noheader -o CPUTimeRaw | awk '{sum+=$1} END {print sum}'
5406909
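
The two sacct pipelines above repeat the same awk summation. A small sketch of a reusable version follows, assuming only a POSIX shell and awk. The `sum_col` name is hypothetical. It prints `0` when sacct returns no rows (the original `print sum` would print an empty line in that case) and uses `printf "%.0f"`, since some awk implementations may print very large sums in exponent notation.

```shell
#!/bin/sh
# Hypothetical helper: sum the first whitespace-separated column of stdin.
sum_col() {
    awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Usage with the same sacct options as in the commands above:
#   sacct -X -S 2001-01-01 -E 2100-01-01 -A grp001 --noheader -o CPUTimeRaw | sum_col
#   sacct -X -S 2001-01-01 -E 2100-01-01 -A grp002 --noheader -o CPUTimeRaw | sum_col
```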