Created attachment 2357 (energy2.patch)

Hello,

I'm trying to get collection of power data working for accounting purposes. I've configured:

dmj@cori01:~> scontrol show config | grep -i acct
AcctGatherEnergyType    = acct_gather_energy/cray
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInfinibandType = acct_gather_infiniband/none
AcctGatherNodeFreq      = 0 sec
AcctGatherProfileType   = acct_gather_profile/none
JobAcctGatherFrequency  = 30
JobAcctGatherType       = jobacct_gather/linux
JobAcctGatherParams     = (null)

The errors I'm getting:

+ srun sleep 1
slurmstepd: acct_gather_energy_p_get_data: unknown enum 7
slurmstepd: acct_gather_energy_p_get_data: unknown enum 7
slurmstepd: acct_gather_energy_p_get_data: unknown enum 7

There are similar errors in the slurmd logs. I couldn't find much in the way of documentation for this plugin, so I'd appreciate any advice you can give.

Thanks,
Doug
Created attachment 2354 (patch for Cray energy)

Doug, does this patch fix the issue? It looks like this case was missed when the enum was changed. If it fixes the issue, I'll check it in. I haven't been able to test it on a real Cray yet; if you can, that would be great ;).
Hi Danny,

The patch doesn't include a definition for sensor_cnt, so it doesn't compile. Given:

    acct_gather_energy_t *energy = (acct_gather_energy_t *)data;
    time_t *last_poll = (time_t *)data;

and:

    *last_poll = local_energy->poll_time;
    ...
    *sensor_cnt = 1;

I assume sensor_cnt should be some cast of data, but of what type?

Thanks,
Doug
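The sensor_cnt question comes down to the void *data convention used by the get_data call: each case of the switch reinterprets data as the type that particular request expects, and a request value the switch has no case for falls through to the default branch, which is the kind of path that prints "unknown enum". A minimal sketch of that pattern follows. The enum, struct, and function names are hypothetical simplifications rather than Slurm's actual definitions, and using uint16_t for the sensor count is an assumption, not something confirmed in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical, simplified request types; the real Slurm enum differs. */
enum demo_energy_type {
	DEMO_ENERGY_STRUCT,     /* data points to struct demo_energy */
	DEMO_ENERGY_LAST_POLL,  /* data points to time_t */
	DEMO_ENERGY_SENSOR_CNT, /* data points to uint16_t (assumed width) */
};

/* Hypothetical stand-in for the plugin's energy record. */
struct demo_energy {
	uint64_t consumed_energy; /* joules */
	time_t poll_time;
};

/* Plugin-local state; accessed through a pointer, like local_energy in the thread. */
static struct demo_energy local_state = { 801, 0 }; /* arbitrary sample values */
static struct demo_energy *local_energy = &local_state;

/* Returns 0 if the request was handled, -1 for an unknown request type. */
static int demo_get_data(enum demo_energy_type type, void *data)
{
	assert(data != NULL); /* every request in this sketch writes through data */

	switch (type) {
	case DEMO_ENERGY_STRUCT: {
		struct demo_energy *energy = (struct demo_energy *)data;
		*energy = *local_energy;
		break;
	}
	case DEMO_ENERGY_LAST_POLL: {
		time_t *last_poll = (time_t *)data;
		*last_poll = local_energy->poll_time;
		break;
	}
	case DEMO_ENERGY_SENSOR_CNT: {
		/* Each case declares its own typed view of data. */
		uint16_t *sensor_cnt = (uint16_t *)data;
		*sensor_cnt = 1; /* one node-level counter in this sketch */
		break;
	}
	default:
		/* A value added to the enum but not handled here ends up in this branch. */
		fprintf(stderr, "demo_get_data: unknown enum %d\n", (int)type);
		return -1;
	}
	return 0;
}
```

Scoping each typed pointer inside its own case block keeps the casts from leaking into other cases. Because the switch has a default label, GCC's -Wswitch will not flag enum values that lack a case, so a missed value shows up only at run time.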
Put this on top of the other; sorry for missing it.
Created attachment 2358 (patch working on cori)
Thanks for sending it out; I ended up coming to the same solution while the messages were in flight. We're collecting data now. What are the units for ConsumedEnergy in sacct?

Are all the steps orthogonal in terms of usage? E.g., a trivial job:

dmj@cori03:~> sacct -j 6990 --format=user,job,ConsumedEnergy,ConsumedEnergyRaw
     User        JobID ConsumedEnergy ConsumedEnergyRaw
--------- ------------ -------------- -----------------
      dmj 6990
          6990.0                  801        801.000000
          6990.1                   11         11.000000
dmj@cori03:~>

Thanks,
Doug
I believe the units are joules.
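If ConsumedEnergyRaw is in joules, turning it into more familiar quantities is simple arithmetic: 1 kWh is 1000 W × 3600 s = 3,600,000 J, and dividing energy by elapsed seconds gives average power in watts. The sketch below assumes joules (as suggested above, not verified here) and, for the job total, assumes that steps do not overlap, which is the open orthogonality question. The function names are hypothetical.

```c
#include <assert.h>

/* Assumes ConsumedEnergyRaw values are in joules. */
#define JOULES_PER_KWH 3600000.0 /* 1 kWh = 1000 W * 3600 s */

/* Convert joules to kilowatt-hours. */
static double joules_to_kwh(double joules)
{
	return joules / JOULES_PER_KWH;
}

/* Average power in watts over an elapsed wall-clock time in seconds. */
static double average_watts(double joules, double elapsed_sec)
{
	assert(elapsed_sec > 0.0);
	return joules / elapsed_sec;
}

/* Job total as a plain sum of step energies; only valid if steps are disjoint. */
static double sum_steps(const double *step_joules, int n)
{
	double total = 0.0;
	for (int i = 0; i < n; i++)
		total += step_joules[i];
	return total;
}
```

With the two step values from the sacct output above (801 and 11), sum_steps gives 812 J, roughly 0.000226 kWh. Whether that sum is a meaningful job total depends on whether step energies are counted independently.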
Now that this is working, I have a follow-up question: if we aren't using HDF5 profiling, is there any benefit to setting AcctGatherFilesystemType to the Lustre plugin? Will data be gathered for total job read/write/size? It's unclear from the documentation.

Thank you,
Doug
Currently it only matters with profiling. The same holds true for acct_gather_infiniband (just in case you wanted to ask that question as well ;)). Essentially, whatever is held in struct jobacctinfo (defined in src/common/slurm_jobacct_gather.h) is stored in the database; if it isn't there, it isn't stored.

Let me know if you have any other questions. The patch is now in 15.08, commit fe9cc7426c0cb2e. Do you have anything else on this one?
I think this is great -- thanks again. -Doug