Created attachment 15529: acct_gather_energy_ipmi_add_dcmi_support.patch
We recently got a new system with Zen2 CPUs, and RAPL does not work on those. The IPMI interface of these systems does report power consumption, but it does not expose it as a sensor; the current power consumption has to be queried via DCMI instead. The attached patch adds support for the special IPMI sensor ID "DCMI". When this ID is specified, the plugin takes a separate code path that bypasses normal sensor querying and reads the current power consumption via DCMI. The patch is written against current master, but it also applies cleanly against 20.02, and I have deployed it on our systems. So far it works as expected. The patch is also available on my GitHub fork: https://github.com/TimoRoth/slurm/tree/ipmi_dcmi
I tested this patch against 20.2.5.1 on my cluster. It seems to work. Thanks!
Created attachment 24881: 0001-IPMI-Implement-support-for-power-reading-via-DCMI.patch
Updated version of this patch that applies against the latest Slurm versions. We have been using it on our cluster for years now without any issues.
Hi Timo, it's been a while since you opened this bug, but after some modifications and a review process we have finally added commits 040b903c9d..7eddee7877 to master. This means your contribution will be available in the upcoming Slurm 23.02 release. You will be able to set EnergyIPMIPowerSensors=Node=DCMI in acct_gather.conf to enable this sensor. It will be documented in the acct_gather.conf man page. If you have the chance, feel free to test it in your environment and report anything you find. Thanks so much for your time and contribution.
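For anyone landing on this bug later, the setting described above might look roughly like the fragment below. This is a minimal sketch, not a tested configuration: it assumes the parameter is spelled with the Energy prefix (EnergyIPMIPowerSensors), as the other IPMI options in acct_gather.conf are, and that the IPMI energy plugin is selected in slurm.conf. Check the man pages of your Slurm version before relying on it.

```ini
# slurm.conf (assumption: the IPMI energy plugin must be selected here)
AcctGatherEnergyType=acct_gather_energy/ipmi

# acct_gather.conf
# Use the special sensor ID "DCMI" for the node power reading
# instead of a numeric IPMI sensor ID.
EnergyIPMIPowerSensors=Node=DCMI
```

After changing these files the slurmd daemons will most likely need to be restarted for the plugin to pick up the new sensor setting.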