Ticket 24227

Summary: Doubts Regarding Definition of Metrics
Product: Slurm Reporter: Manuel Giménez de Castro Marciani <manuel.gimenez>
Component: Accounting Assignee: Jacob Jenson <jacob>
Status: RESOLVED MOVED QA Contact:
Severity: 6 - No support contract    
Priority: --- CC: armengod
Version: - Unsupported Older Versions   
Hardware: Linux   
OS: Linux   
Site: -Other- Slinky Site: ---

Description Manuel Giménez de Castro Marciani 2025-12-01 03:17:04 MST
I am using the metrics that Slurm (version 23.02.7) collects through jobacct_gather/linux to analyze the performance of an application.

I have read the documentation on these metrics (https://slurm.schedmd.com/sacct.html) thoroughly, but I still find the Ave* metrics confusing, AveRSS and AveDiskWrite in particular.
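For reference, this is roughly how the fields in question can be pulled for one job. It is a minimal sketch, assuming sacct is on the PATH and that the pipe-delimited output of --parsable2 is acceptable. The helper names (build_sacct_command, parse_sacct_output, query_job) are made up for illustration and are not part of Slurm. Values are kept as the strings sacct prints, since the units and suffixes depend on the site configuration.

```python
import subprocess

# Field names as documented on the sacct man page.
FIELDS = ("JobID", "NTasks", "AveRSS", "MaxRSS", "AveDiskWrite", "MaxDiskWrite")


def build_sacct_command(job_id, fields=FIELDS):
    """Build the sacct argument list for one job (hypothetical helper)."""
    return [
        "sacct",
        "-j", str(job_id),
        "--format=" + ",".join(fields),
        "--parsable2",   # pipe-delimited, no trailing delimiter
        "--noheader",    # rows only, no header line
    ]


def parse_sacct_output(text, fields=FIELDS):
    """Turn pipe-delimited sacct rows into dicts; values stay as strings."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append(dict(zip(fields, line.split("|"))))
    return rows


def query_job(job_id):
    """Run sacct for one job and return its parsed rows (needs a Slurm cluster)."""
    result = subprocess.run(
        build_sacct_command(job_id),
        capture_output=True, text=True, check=True,
    )
    return parse_sacct_output(result.stdout)
```

sacct typically prints one row per job step, so comparing the NTasks column with the Ave* columns of the same row is a quick way to see what an average might be taken over.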

AveDiskWrite is defined as "Average number of bytes written by all tasks in job." So if a workload produces an AveDiskWrite of x and I double the workload, I should observe 2x, and so far that is what I observe. But if I then double the resources while maintaining the workload, I observe x again, and not 2x.

So my suspicion is that the metric is the total number of bytes written over the job's runtime, divided by the number of nodes.

With AveRSS, however, which is defined as "Average resident set size of all tasks in job," I observe what I had expected from AveDiskWrite: the metric scales with the workload irrespective of the resources available to the job.
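To make the AveDiskWrite suspicion concrete, here it is as arithmetic. This is only a sketch of my hypothesis, not Slurm's implementation. It assumes that "maintaining the workload" means every unit (node or task, I do not know which) keeps writing the same amount, and the function name ave_over_units and the value of x are invented for illustration.

```python
def ave_over_units(bytes_written_per_unit):
    """Hypothesised AveDiskWrite: total bytes written divided by the number of units.

    A "unit" is a node or a task; which one Slurm uses is exactly the open question.
    """
    return sum(bytes_written_per_unit) / len(bytes_written_per_unit)


x = 100  # hypothetical bytes written by one unit for the baseline workload

baseline = ave_over_units([x, x])              # 2 units, baseline work each
double_work = ave_over_units([2 * x, 2 * x])   # same 2 units, twice the work each
double_units = ave_over_units([x, x, x, x])    # 4 units, baseline work each

print(baseline, double_work, double_units)  # prints: 100.0 200.0 100.0
```

Under this hypothesis the value doubles with the workload but stays at x when the resources double, which matches what I see for AveDiskWrite.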

I would be thankful if you could clarify this behavior, and even more grateful if you could point me to where in the code these metrics are aggregated and processed before being stored in the database.

Thanks!
Comment 3 Manuel Giménez de Castro Marciani 2025-12-03 03:30:33 MST
The support team at my HPC center contacted me and said that I should discuss this with them.