This feature request concerns the insufficient data types used in struct acct_gather_energy and struct ext_sensors_data. We have currently certain issues with the structs, as they are defined in 14.11 s slurm.h. Issues: 1) integral data types restrict the precision of measurements unnecessarily (currently 1W / 1J granularity) 2) one second time granularity restricts measurement accuracy, e.g. correlation of profile samples to application progress E.g., if the power is read every 3 seconds with a certain delay (e.g. 330 ms), then the timestamps for the power reading would be 1, 4, (+3) 7, (+3) 11, (+4) 14, (+3) 17, (+3) 21, (+4) ... Thus, every third power reading will have a higher priority in calulating the total energy of the job. The other measurements will have a lower priority. This scenario is not theoretical but could be observed at a system in Dresden. 3) consumed_energy might overflow in realistic job scenarios E.g., 1,000 nodes with 500 W each add 500kJ to the job energy per second. After approx 8600 seconds (~2.5 hrs), the 4.3GJ that can be stored in an uint32_t will overflow in slurmctld's call to jobacctinfo_aggregate(...). Suggested fix: - Replace uint32_t with double (for energy, power and temperature) - Replace time_t with alternative: a) uint64_t unix timestamp in nanoseconds b) struct timespec (used by clock_gettime) Depends on what is more convenient for slurm internal usage. Usually a) should be easier to handle. Affected definitions: typedef struct acct_gather_energy { uint32_t base_consumed_energy; uint32_t base_watts; /* lowest power consump of node, in watts */ uint32_t consumed_energy; /* total energy consumed by node, in joules */ uint32_t current_watts; /* current power consump of node, in watts */ uint32_t previous_consumed_energy; time_t poll_time; /* When information was last retrieved */ } acct_gather_energy_t; typedef struct ext_sensors_data { uint32_t consumed_energy; /* total energy consumed, in joules */ uint32_t temperature; /* temperature, in celsius */ time_t energy_update_time; /* last update time for consumed_energy */ uint32_t current_watts; /* current power consumption, in watts */ } ext_sensors_data_t;
Created attachment 1980 [details] Change type of energy variables from uint32 to uint64 Changes are made in: -slurm.h structures -acct_gather_energy plugins -ext_sensors plugins -pack/unpack functions with new version checks There is very few changes in accounting parts since energy are already saved as uint64 or double. I update also inside src/common/slurm_topology.h in strucure switch_record. But I don't see code using this variable.
Thomas, does it makes sense to change the watts variables as well? On June 19, 2015 4:40:58 AM PDT, bugs@schedmd.com wrote: >http://bugs.schedmd.com/show_bug.cgi?id=1258 > >Thomas Cadeau <thomas.cadeau@bull.net> changed: > > What |Removed |Added >---------------------------------------------------------------------------- > CC| |thomas.cadeau@bull.net > >--- Comment #1 from Thomas Cadeau <thomas.cadeau@bull.net> --- >Created attachment 1980 [details] > --> http://bugs.schedmd.com/attachment.cgi?id=1980&action=edit >Change type of energy variables from uint32 to uint64 > >Changes are made in: > -slurm.h structures > -acct_gather_energy plugins > -ext_sensors plugins > -pack/unpack functions with new version checks > >There is very few changes in accounting parts since energy are already >saved as >uint64 or double. > >I update also inside src/common/slurm_topology.h in strucure >switch_record. >But I don't see code using this variable. > >-- >You are receiving this mail because: >You are on the CC list for the bug. >You are the assignee for the bug.
I don't think so, there is no sum on watts for several nodes, and I don't think we will see nodes, or other equpments with more than 4.29 GJ. Or maybe on a complete cluster if we want to sum all powers.
That makes since. The only reason I bring it up is because that is the way it is stored in the HDF5 code. On 06/19/15 06:24, bugs@schedmd.com wrote: > > *Comment # 3 <http://bugs.schedmd.com/show_bug.cgi?id=1258#c3> on bug > 1258 <http://bugs.schedmd.com/show_bug.cgi?id=1258> from Thomas Cadeau > <mailto:thomas.cadeau@bull.net> * > I don't think so, there is no sum on watts for several nodes, and I don't think > we will see nodes, or other equpments with more than 4.29 GJ. > Or maybe on a complete cluster if we want to sum all powers. > ------------------------------------------------------------------------ > You are receiving this mail because: > > * You are on the CC list for the bug. > * You are the assignee for the bug. >
This has been added. Is the time_t still an issue here? If not I'll close the bug.
Thomas, Robert, any input?
Sorry for the late reply, I've been at paternity leave. As I see it, the initial problem is still there. There is no way to represents time stamps with a finer granularity than 1 second, which means that bug #2 (initial post) is still there. There is also no way to represent a more fine grained power or energy measurement. The only bug that is tackled is the overflow.
I have a couple of questions regarding the request for finer grained timestamps. 1. What does having finer grained timestamps buy? The total energy will still be the same. And the hd5 profile will just show a little more energy for that time interval. 2. The hdf5 profile timestamps are in seconds. How would you expect this to be presented if it was in ms? Thanks, Brian
If I remember correcty, the energy value in SLURM acct_gather_energy.ipmi is calculated as power times time. As I described in issue (2) a 1-second granularity leads to a lower precision on energy values. I have no problem to store the power information at a per second scale in HDF. It's just about the energy values. Regarding question 1: The energy will only be the same for workflows which provide a constant power consumption over time. When the power consumption varies, the result is unpredictable. Regarding question 2: It's ok for me that SLURM-written hdf files use seconds. Thats the external representation. I only want the INTERNAL resolution to be higher. Still, I would like to have a higher resolution in the external representation as well, but that might cause some issues with tools depending on the current data format, so I understand if that's not feasible.