Ticket 9629 - Add DCMI support to acct_gather_energy/ipmi
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 21.08.x
Hardware: Linux Linux
Severity: 5 - Enhancement
Assignee: Felip Moll
Reported: 2020-08-20 10:44 MDT by Timo R.
Modified: 2023-03-21 10:45 MDT
Site: -Other-
Version Fixed: 23.02.0-0pre1


Attachments
acct_gather_energy_ipmi_add_dcmi_support.patch (9.88 KB, application/mbox)
2020-08-20 10:44 MDT, Timo R.
0001-IPMI-Implement-support-for-power-reading-via-DCMI.patch (8.89 KB, patch)
2022-05-06 06:14 MDT, Timo R.

Description Timo R. 2020-08-20 10:44:18 MDT
Created attachment 15529
acct_gather_energy_ipmi_add_dcmi_support.patch

We recently got a new system with Zen2 CPUs, and RAPL does not work on those.
The IPMI of those systems does report power consumption, but it does not expose it as a sensor. You need to query it via DCMI to get the current power consumption.

The attached patch adds support for specifying the special IPMI sensor ID "DCMI". This makes the plugin take a special code path that bypasses normal sensor querying and instead reads the current power consumption via DCMI.
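
To illustrate what "reading power via DCMI" involves, here is a minimal Python sketch that decodes a raw DCMI Get Power Reading response. It is not the patch's code (the plugin is written in C), and the byte layout is an assumption based on my reading of the DCMI specification: completion code, group extension ID 0xDC, then current/minimum/maximum/average power as little-endian 16-bit watt values, a 4-byte timestamp, a 4-byte reporting period, and a state byte whose bit 6 marks the measurement as active. The names `PowerReading`, `parse_dcmi_power_reading` and `is_dcmi_sensor` are hypothetical.

```python
import struct
from typing import NamedTuple

DCMI_GROUP_EXTENSION_ID = 0xDC  # group extension ID used by DCMI commands
POWER_READING_ACTIVE = 0x40     # bit 6 of the power reading state byte
RESPONSE_LENGTH = 19            # assumed length, completion code included


class PowerReading(NamedTuple):
    current_watts: int
    minimum_watts: int
    maximum_watts: int
    average_watts: int
    active: bool


def is_dcmi_sensor(sensor_id: str) -> bool:
    """Return True if the configured sensor ID selects the DCMI path."""
    return sensor_id.strip().upper() == "DCMI"


def parse_dcmi_power_reading(response: bytes) -> PowerReading:
    """Decode a raw Get Power Reading response (assumed layout, see above)."""
    if len(response) < RESPONSE_LENGTH:
        raise ValueError("response too short for a DCMI power reading")
    if response[0] != 0x00:
        raise ValueError(f"IPMI completion code 0x{response[0]:02x}")
    if response[1] != DCMI_GROUP_EXTENSION_ID:
        raise ValueError("not a DCMI response")
    # Four consecutive little-endian uint16 values, starting at offset 2.
    current, minimum, maximum, average = struct.unpack_from("<4H", response, 2)
    # Offsets 10..17 hold the timestamp and reporting period (ignored here);
    # offset 18 is the power reading state byte.
    active = bool(response[18] & POWER_READING_ACTIVE)
    return PowerReading(current, minimum, maximum, average, active)
```

A caller would first check `is_dcmi_sensor()` on the configured sensor ID and only then issue the DCMI request instead of a normal sensor read; a reading with `active` set to False should probably be discarded rather than accounted.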

This patch is written against current master, but it also applies cleanly against 20.02, and I have deployed it on our systems. So far it works as expected.

You can also find this patch on my GitHub fork: https://github.com/TimoRoth/slurm/tree/ipmi_dcmi
Comment 1 Michael DiDomenico 2020-12-11 07:22:24 MST
I tested this patch against 20.2.5.1 on my cluster. It seems to work. Thanks!
Comment 2 Timo R. 2022-05-06 06:14:00 MDT
Created attachment 24881
0001-IPMI-Implement-support-for-power-reading-via-DCMI.patch

Updated version of this patch that applies against the latest Slurm versions.
We have been using it for years now on our cluster without any issues.
Comment 15 Felip Moll 2023-01-16 03:40:05 MST
Hi Timo!

It's been a while since you opened this bug, but after some modifications and a review process, we have finally added commits 040b903c9d..7eddee7877 to master. This means your contribution will be available in the next release, Slurm 23.02.

You will be able to set EnergyIPMIPowerSensors=Node=DCMI in acct_gather.conf to enable this sensor. It will be documented in the man page of acct_gather.conf.
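
A minimal configuration sketch for the setting described above. In the released acct_gather.conf man page the option carries an Energy prefix (EnergyIPMIPowerSensors). The slurm.conf line that selects the IPMI energy plugin is an assumption about the rest of the setup and is not stated in this ticket.

```
# slurm.conf (assumed: the IPMI energy plugin must be selected)
AcctGatherEnergyType=acct_gather_energy/ipmi

# acct_gather.conf
EnergyIPMIPowerSensors=Node=DCMI
```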

If you have the chance, feel free to test it in your environment and report anything you find.

Thanks so much for your time and contribution.