Ticket 1258 - Enhance energy, power and time data in slurm.h
Summary: Enhance energy, power and time data in slurm.h
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting (show other tickets)
Version: 15.08.x
Hardware: All All
: 5 - Enhancement
Assignee: Unassigned Developer
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2014-11-12 22:29 MST by Robert Schoene
Modified: 2017-03-15 13:44 MDT (History)
3 users (show)

See Also:
Site: Universitat Dresden (Germany)
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments
Change type of energy variables from uint32 to uint64 (25.55 KB, patch)
2015-06-18 23:40 MDT, Thomas Cadeau
Details | Diff

Note You need to log in before you can comment on or make changes to this ticket.
Description Robert Schoene 2014-11-12 22:29:24 MST
This feature request concerns the insufficient data types used in struct acct_gather_energy and struct ext_sensors_data.

We have currently certain issues with the structs, as they are defined in 14.11 s slurm.h. 

Issues:
1) integral data types restrict the precision of measurements unnecessarily (currently 1W / 1J granularity)

2) one second time granularity restricts measurement accuracy, e.g. correlation of profile samples to application progress
E.g., if the power is read every 3 seconds with a certain delay (e.g. 330 ms), then the timestamps for the power reading would be
1,
4, (+3)
7, (+3)
11, (+4)
14, (+3)
17, (+3)
21, (+4)
...
Thus, every third power reading will have a higher priority in calulating the total energy of the job. The other measurements will have a lower priority.
This scenario is not theoretical but could be observed at a system in Dresden.

3) consumed_energy might overflow in realistic job scenarios
E.g., 1,000 nodes with 500 W each add 500kJ to the job energy per second. After approx 8600 seconds (~2.5 hrs), the 4.3GJ that can be stored in an uint32_t will overflow in slurmctld's call to jobacctinfo_aggregate(...).

Suggested fix:
- Replace uint32_t with double (for energy, power and temperature)
- Replace time_t with alternative:
a) uint64_t unix timestamp in nanoseconds
b) struct timespec (used by clock_gettime)
Depends on what is more convenient for slurm internal usage. Usually a) should be easier to handle.

Affected definitions:

typedef struct acct_gather_energy {
  uint32_t base_consumed_energy;
  uint32_t base_watts; /* lowest power consump of node, in watts */
  uint32_t consumed_energy; /* total energy consumed by node, in joules */
  uint32_t current_watts; /* current power consump of node, in watts */
  uint32_t previous_consumed_energy;
  time_t poll_time; /* When information was last retrieved */
} acct_gather_energy_t;

typedef struct ext_sensors_data {
  uint32_t consumed_energy; /* total energy consumed, in joules */
  uint32_t temperature; /* temperature, in celsius */
  time_t energy_update_time; /* last update time for consumed_energy */
  uint32_t current_watts; /* current power consumption, in watts */
} ext_sensors_data_t;
Comment 1 Thomas Cadeau 2015-06-18 23:40:58 MDT
Created attachment 1980 [details]
Change type of energy variables from uint32 to uint64

Changes are made in:
  -slurm.h structures
  -acct_gather_energy plugins
  -ext_sensors plugins
  -pack/unpack functions with new version checks

There is very few changes in accounting parts since energy are already saved as uint64 or double.

I update also inside src/common/slurm_topology.h in strucure switch_record.
But I don't see code using this variable.
Comment 2 Danny Auble 2015-06-18 23:56:39 MDT
Thomas, does it makes sense to change the watts variables as well? 



On June 19, 2015 4:40:58 AM PDT, bugs@schedmd.com wrote:
>http://bugs.schedmd.com/show_bug.cgi?id=1258
>
>Thomas Cadeau <thomas.cadeau@bull.net> changed:
>
>           What    |Removed                     |Added
>----------------------------------------------------------------------------
>                 CC|                            |thomas.cadeau@bull.net
>
>--- Comment #1 from Thomas Cadeau <thomas.cadeau@bull.net> ---
>Created attachment 1980 [details]
>  --> http://bugs.schedmd.com/attachment.cgi?id=1980&action=edit
>Change type of energy variables from uint32 to uint64
>
>Changes are made in:
>  -slurm.h structures
>  -acct_gather_energy plugins
>  -ext_sensors plugins
>  -pack/unpack functions with new version checks
>
>There is very few changes in accounting parts since energy are already
>saved as
>uint64 or double.
>
>I update also inside src/common/slurm_topology.h in strucure
>switch_record.
>But I don't see code using this variable.
>
>-- 
>You are receiving this mail because:
>You are on the CC list for the bug.
>You are the assignee for the bug.
Comment 3 Thomas Cadeau 2015-06-19 01:24:31 MDT
I don't think so, there is no sum on watts for several nodes, and I don't think we will see nodes, or other equpments with more than 4.29 GJ.
Or maybe on a complete cluster if we want to sum all powers.
Comment 4 Danny Auble 2015-06-19 09:22:33 MDT
That makes since.  The only reason I bring it up is because that is the 
way it is stored in the HDF5 code.

On 06/19/15 06:24, bugs@schedmd.com wrote:
>
> *Comment # 3 <http://bugs.schedmd.com/show_bug.cgi?id=1258#c3> on bug 
> 1258 <http://bugs.schedmd.com/show_bug.cgi?id=1258> from Thomas Cadeau 
> <mailto:thomas.cadeau@bull.net> *
> I don't think so, there is no sum on watts for several nodes, and I don't think
> we will see nodes, or other equpments with more than 4.29 GJ.
> Or maybe on a complete cluster if we want to sum all powers.
> ------------------------------------------------------------------------
> You are receiving this mail because:
>
>   * You are on the CC list for the bug.
>   * You are the assignee for the bug.
>
Comment 5 Danny Auble 2015-06-30 09:30:46 MDT
This has been added.  Is the time_t still an issue here?  If not I'll close the bug.
Comment 6 Danny Auble 2015-07-07 02:06:55 MDT
Thomas, Robert, any input?
Comment 7 Robert Schoene 2015-07-27 00:17:19 MDT
Sorry for the late reply, I've been at paternity leave.

As I see it, the initial problem is still there. There is no way to represents time stamps with a finer granularity than 1 second, which means that bug #2 (initial post) is still there. There is also no way to represent a more fine grained power or energy measurement. The only bug that is tackled is the overflow.
Comment 8 Brian Christiansen 2016-04-11 08:53:33 MDT
I have a couple of questions regarding the request for finer grained timestamps.

1. What does having finer grained timestamps buy? The total energy will still be the same. And the hd5 profile will just show a little more energy for that time interval.

2. The hdf5 profile timestamps are in seconds. How would you expect this to be presented if it was in ms?

Thanks,
Brian
Comment 9 Robert Schoene 2016-04-13 18:56:30 MDT
If I remember correcty, the energy value in SLURM acct_gather_energy.ipmi is calculated as power times time. As I described in issue (2) a 1-second granularity leads to a lower precision on energy values. I have no problem to store the power information at a per second scale in HDF. It's just about the energy values.

Regarding question 1:

The energy will only be the same for workflows which provide a constant power consumption over time. When the power consumption varies, the result is unpredictable.

Regarding question 2:
It's ok for me that SLURM-written hdf files use seconds. Thats the external representation. I only want the INTERNAL resolution to be higher.
Still, I would like to have a higher resolution in the external representation as well, but that might cause some issues with tools depending on the current data format, so I understand if that's not feasible.