The AcctGatherEnergy RAPL plugin uses the same energy unit for all CPU and DRAM packages: https://github.com/SchedMD/slurm/blob/master/src/plugins/acct_gather_energy/rapl/acct_gather_energy_rapl.c#L326 However, on many modern server architectures (Haswell, Skylake X/SP, CascadeLake SP), the DRAM energy unit is distinct from the package energy unit stored in the MSR_RAPL_POWER_UNIT register. Instead, it has a fixed value of 15.3 µJ (stored as 15300 in the kernel driver). The (gloomy) situation becomes clear when looking at the Linux powercap driver code, which gives correct measurements: https://github.com/torvalds/linux/blob/master/drivers/powercap/intel_rapl_common.c#L964 https://github.com/torvalds/linux/blob/master/drivers/powercap/intel_rapl_common.c#L1017 So apparently the only viable solution is to check the CPU model and set the DRAM energy unit accordingly. As a result of this bug, AcctGatherEnergy reports incorrect power and energy values; in my experiments they were usually inflated by 30%-50%.
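To make the proposed check concrete, here is a minimal sketch of per-domain unit selection. It is not the plugin's code: the helper names and the constant name are hypothetical, and the model list is an illustrative, non-exhaustive subset (family 6, models 0x3f for Haswell server and 0x55 for Skylake X/SP and CascadeLake SP). The package unit is decoded from the Energy Status Units field, bits 12:8 of MSR_RAPL_POWER_UNIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed DRAM energy unit in joules (15.3 uJ); the constant name is hypothetical. */
#define RAPL_DRAM_FIXED_UNIT_J 15.3e-6

/* Package energy unit: bits 12:8 of MSR_RAPL_POWER_UNIT hold ESU,
 * and one counter tick equals 1 / 2^ESU joules. */
static inline double rapl_pkg_energy_unit(uint64_t power_unit_msr)
{
	unsigned esu = (unsigned)((power_unit_msr >> 8) & 0x1f);
	return 1.0 / (double)(1ULL << esu);
}

/* Hypothetical helper: true for CPU models (family 6) assumed to use the
 * fixed DRAM unit. Illustrative subset only, not a complete list. */
static inline bool cpu_has_fixed_dram_unit(unsigned model)
{
	switch (model) {
	case 0x3f: /* Haswell server */
	case 0x55: /* Skylake X/SP, CascadeLake SP */
		return true;
	default:
		return false;
	}
}

/* DRAM energy unit: fixed on the models above, else same as the package. */
static inline double rapl_dram_energy_unit(unsigned model,
					   uint64_t power_unit_msr)
{
	if (cpu_has_fixed_dram_unit(model))
		return RAPL_DRAM_FIXED_UNIT_J;
	return rapl_pkg_energy_unit(power_unit_msr);
}

/* Convert a raw energy counter delta to joules with the domain's own unit. */
static inline double rapl_ticks_to_joules(uint64_t ticks, double unit_j)
{
	assert(unit_j > 0.0);
	return (double)ticks * unit_j;
}
```

For example, when ESU is 14 the package unit is 1/16384 J (about 61 µJ), roughly four times the fixed 15.3 µJ DRAM unit, so applying the package unit to DRAM counters overstates DRAM energy by about that factor.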
Created attachment 16196: proposed patch

This patch fixes multiple bugs/issues in power computation:
- CurrentWatts: using the CPU energy unit for the DRAM domain resulted in wrong values on many systems (Intel Haswell/Skylake/CascadeLake)
- CurrentWatts: the same energy unit was used for all packages; this might work for now, but could break at any time
- AveWatts: incorrect value due to missing normalization by the polling interval
- AveWatts: inaccurate value due to using an integer type to compute the running average (at some point the contribution of the current measurement becomes <1.0 and AveWatts is frozen)
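The two AveWatts points can be illustrated with a small sketch. This is not the code from the attachment: the type and function names are hypothetical, and the integer variant is only an assumed form of the kind of update that freezes. Power for an interval is the energy delta divided by the polling interval, and the running mean is kept in a double so that small contributions are not truncated away.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator: running mean kept in floating point. */
typedef struct {
	double ave_watts;
	uint64_t samples;
} power_avg_t;

/* Normalize an energy delta (joules) by the polling interval (seconds). */
static inline double interval_watts(double delta_joules, double interval_s)
{
	assert(interval_s > 0.0);
	return delta_joules / interval_s;
}

/* Incremental mean: ave += (x - ave) / n, with no integer truncation. */
static inline void power_avg_add(power_avg_t *avg, double watts)
{
	avg->samples++;
	avg->ave_watts += (watts - avg->ave_watts) / (double)avg->samples;
}

/* Integer variant, for contrast only: once |watts - ave| < n the quotient
 * truncates to zero and the average stops moving. n counts the samples
 * including the current one. */
static inline uint32_t int_avg_add(uint32_t ave, uint32_t watts, uint32_t n)
{
	int64_t step = ((int64_t)watts - (int64_t)ave) / (int64_t)n;
	return (uint32_t)((int64_t)ave + step);
}
```

With 99 earlier samples averaging 100 W, a new 150 W sample leaves the integer average at 100, while the floating-point average moves to 100.5.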
Hello, Is there any reason why this issue never got attention? The bug still exists in the RAPL plugin, and because of it the energy consumption reported by SLURM is significantly higher than the actual values. Here is a short [report](https://gist.github.com/mahendrapaipuri/bcd357747d32073e3cb4622940db408b) on the bug.
Hello Mahendra, I am looking at how best to integrate this patch into the current Slurm version. Your report is very useful, many thanks.
Created attachment 39397: Slurm patch for fixing the energy gathering in the DRAM modules

Hello Mahendra, We have taken Alexey's patch and adapted it to the current Slurm codebase. Seeing that you have already tested rapl-read on some of the affected CPUs, would you be so kind as to test this patch too? Please test it on a reduced set of nodes if possible. It should not happen, but if a segfault occurs we don't want to impact production. Many thanks in advance.
Hello Oriol, Sorry for the late response; I have been caught up with a lot of work leading up to SC24. Unfortunately I am not the one who manages the SLURM cluster at our center, so I cannot really test it on our production machines. I will see what I can do with my sysadmin team. I also have access to hardware where I can quickly spin up a SLURM cluster with the patch and check whether the bug has been fixed. Thanks for the patch. Regards, Mahendra