Following the discussion in pull request #113, the goal of this request is to make the CPU time and CPU utilization fields floating-point so that usage is reported more accurately. The problem is that with high-rate profiling, CPU usage effectively becomes a binary field: as the output below shows, CPU utilization is either 100 or 0, and CPU time is either 1 or 0. Converting the cputime field to a float/double value would make both counters more accurate and give users better information.

I'm thinking about writing another profiling plugin that may be very useful in our environment, and probably also in other environments where compute nodes are shared.

Regards,
Carlos

$ cat extract_10948.csv
Job,Step,Node,Series,Date Time,ElapsedTime,CPU Frequency,CPU Time,CPU Utilization,rss,VM Size,Pages,Read_bytes,Write_bytes
10948,0,compute1,Task_0,2015-06-03 17:28:23,0,2400000,0,0.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:24,1,2400000,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:25,2,2400000,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:26,3,2400000,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:27,4,2400000,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:28,5,2400000,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:29,6,2400000,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:30,7,2400000,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:31,8,2400000,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:32,9,2400000,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:33,10,2400000,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:34,11,2399999,0,0.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:35,12,2399999,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:36,13,2399999,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:37,14,2399999,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:38,15,2399999,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:39,16,2399999,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:40,17,2399999,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:41,18,2399999,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:42,19,2399999,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:43,20,2399999,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:44,21,2399999,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:45,22,2399999,1,100.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:46,23,2399999,0,0.000,1276,9340,0,0.000,0.000
10948,0,compute1,Task_0,2015-06-03 17:28:47,24,2399999,1,100.000,1276,9340,0,0.000,0.000
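
To illustrate why the values above collapse to 0 or 100, here is a minimal sketch in C. It assumes a 1-second sampling interval and a CPU time counter that is truncated to whole seconds before utilization is computed. The function names and the microsecond input are hypothetical and are not taken from the Slurm source; the sketch only shows the truncation effect, not how the profiling plugin actually derives its numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: utilization (percent) when CPU time is kept as
 * whole seconds. The sub-second part is discarded by the integer division. */
double util_from_int_seconds(uint64_t cpu_usec, uint32_t interval_sec)
{
    assert(interval_sec > 0);
    uint32_t cpu_sec = (uint32_t)(cpu_usec / 1000000); /* fraction lost here */
    return 100.0 * (double)cpu_sec / (double)interval_sec;
}

/* Hypothetical helper: same calculation with CPU time kept as a double,
 * so the fractional seconds survive. */
double util_from_double_seconds(uint64_t cpu_usec, uint32_t interval_sec)
{
    assert(interval_sec > 0);
    double cpu_sec = (double)cpu_usec / 1e6;
    return 100.0 * cpu_sec / (double)interval_sec;
}
```

With a 1-second interval, a task that used 250 ms of CPU reports 0% through the integer path and 25% through the double path, which is the kind of difference this request is after.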
Enhancement request to use double type to represent cpu time instead of uint32_t. David
Any news on this request? Should I implement it?
If you would like to go for it, please do. I don't think we will have time to look at it before 15.08.
Created attachment 2112
tot_cpu from int to double patch

I've patched the code to change the tot_cpu field from int to double. I compiled with the --enable-developer flag and everything seems to work fine. Please have a look at it.
Carlos, it appears this patch is a reverse patch, but I was able to figure it out ;). In any case, it is committed in 78c0bf9a58036. Thanks!
Important fix to original code here, prevents divide by zero: https://github.com/SchedMD/slurm/commit/94b11ac40bc569002b7376150883784ee57b2423
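
The contents of that commit are not reproduced in this thread, so the following is only a generic sketch of guarding a utilization calculation against a zero denominator, not the actual fix. The function name and the choice to return 0.0 for an empty interval are assumptions made for illustration. One plausible case where the denominator is zero is the first sample of a series, where ElapsedTime is 0.

```c
#include <assert.h>

/* Hypothetical helper: CPU utilization in percent for one sample.
 * Returns 0.0 when no time has elapsed, instead of dividing by zero. */
double cpu_util_percent(double cpu_sec, double elapsed_sec)
{
    assert(cpu_sec >= 0.0);     /* this sketch assumes a non-negative CPU time */
    if (elapsed_sec <= 0.0)     /* e.g. first sample, nothing elapsed yet */
        return 0.0;
    return 100.0 * cpu_sec / elapsed_sec;
}
```

With doubles, an unguarded division by zero would not crash but would yield inf or NaN, which would then end up in the profile output, so an explicit check is still worthwhile.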