| Summary: | Understanding sreport | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Jimmy Hui <jhui> |
| Component: | Accounting | Assignee: | Ben Roberts <ben> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | | |
| Priority: | --- | CC: | scott, tripiana |
| Version: | 21.08.x | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Roche/PHCIX | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Hi Jimmy,
I think the most likely reason the numbers aren't lining up is that you're looking at the Elapsed Time in sacct rather than the amount of CPU Time. For example, if a job ran for 1 minute on 10 CPUs, it would have an elapsed time of 1 minute, but it would have used 10 CPU minutes.
Here's a job I found from a few days ago on my system that demonstrates this. The job ran for 2 minutes and 4 seconds, but it was on 360 CPUs, so the CPUTime for the job is 12 hours and 24 minutes.
$ sacct -X -j 25843 -ojobid,start,end,elapsed,cputime
JobID Start End Elapsed CPUTime
------------ ------------------- ------------------- ---------- ----------
25843 2021-03-18T09:48:59 2021-03-18T09:51:03 00:02:04 12:24:00
Let me know if looking at the CPUTime in sacct doesn't make the numbers line up better with what you're seeing in sreport.
Thanks,
Ben
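[Editorial note: the relationship Ben describes is simply CPUTime = Elapsed × AllocCPUS. A minimal shell sketch, using the numbers from job 25843 above, shows the arithmetic:]

```shell
# CPUTime is the elapsed time multiplied by the allocated CPU count.
# Job 25843 above ran for 2 min 4 s (124 s) on 360 CPUs.
elapsed=124   # 00:02:04 in seconds
cpus=360
cputime=$((elapsed * cpus))
# Convert back to hours and minutes; should match the CPUTime column.
echo "$((cputime / 3600))h $((cputime % 3600 / 60))m"   # 12h 24m
```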
Jimmy,

First of all, sacct will by default only show one job id. In the case where a job was requeued, there may be more than one job with the same id. Using -D will show all duplicate jobs. Also, jobs don't just start at the beginning of the day and end at the end of the day. sacct will give us the full accounting of all jobs that ran even partially in the time frame specified. To truncate the jobs to their portion of the time frame, add -T.

>sacct -DTX -u xxxx --starttime=2021-03-01 --endtime=2021-03-31 -o start,end,elapsed

Now for the calculation: the sreport output is in CPU minutes. For sacct, you will have to multiply the number of CPUs by the raw elapsed time (seconds) and then convert to minutes to get a similar answer. So let's add AllocTRES to your sacct search and change elapsed to elapsedRaw.

>sacct -X -u xxxx --starttime=2021-03-01 --endtime=2021-03-31 -o start,end,AllocTRES,elapsedRaw

To calculate it automatically we can add -P and some bash tools:

>sacct -DTXP -u xxxx --starttime=2021-03-01 --endtime=2021-03-31 -o start,end,AllocTRES,elapsedRaw | sed 's/.*cpu=//' | sed 's/,.*|/,/' | awk -F, 'NF==2 {sum+=$1 * $2} END {printf "%.2f\n", sum/60}'

Let me know if this works for you, or if you have any questions.
-Scott

Hi Ben & Scott,
Thanks for the clarification. Is there a way to generate a report based on elapsed time with sreport? We are trying to calculate elapsed time usage based on the instance type. We have set up WCKeys, and each time a user requests an instance we associate the instance type with the WCKey.

Hi Jimmy,
I'm afraid there isn't a way to have sreport use the elapsed time rather than the total CPU time for a report. If your nodes are all the same size and the jobs always get a node exclusively, then you could derive that number by dividing the value from the report by the number of CPUs per node. Otherwise the best way to get that data is by looking at the job records with sacct and computing it from there.
Can I ask if there's a reason you're wanting a report with just the elapsed time instead of the CPU time?
Thanks,
Ben

Hi Ben,
My colleague had submitted this ticket https://bugs.schedmd.com/show_bug.cgi?id=10857 and was told that WCKeys would work for us. We have a business need to track and charge back AWS usage. These partitions each map to a specific instance type and have an AWS cost associated with them. So we were trying to use sreport WCKeyUtilizationByUser to get that information.

Hi Jimmy,
Thanks for the explanation, that makes sense. It's possible that sreport could still work, since these WCKeys are unique to the type of node. If the jobs consume all the resources on the nodes they run on, then you can take the number of CPU minutes for a certain WCKey and divide by the number of CPUs for that instance type, which should give you the uptime of the nodes. If your jobs don't have exclusive access to the nodes they run on, then it will be a lot trickier to get the uptime of the nodes.

So, do jobs typically use a full node when they cause it to be spun up? Or are you interested in enforcing a rule that jobs be charged for the full use of a node? It was me who suggested using WCKeys to track usage of different instance types, but I wasn't thinking about reporting on the uptime of the nodes. I apologize for that oversight, but if jobs do have exclusive access to the nodes they use, then we should be able to get the information you want.
Thanks,
Ben

Hi Ben,
The numbers don't really add up here. Here is the sreport for this user: 1965495 CPU minutes on an instance that has 64 vCPUs.
CPU minutes / cores = node minutes; node minutes / 60 = node hours; node hours * instance cost = total cost
1965495 / 64 = 30710.859375
30710.859375 / 60 = 511.847
511.847 * 24.48 = 12530.03

When I check AWS Cost Explorer, this user used ~$23K.

Instance Size   vCPUs   On-Demand Price/hr*
p3.16xlarge     64      $24.48

--------------------------------------------------------------------------------
Cluster/User/WCKey Utilization 2021-03-01T00:00:00 - 2021-03-24T18:59:59 (2055600 secs)
Usage reported in CPU Minutes
--------------------------------------------------------------------------------
  Cluster     Login     Proper Name           WCKey      Used
--------- --------- --------------- --------------- ---------
slurm-ma+      xxxx            xxxx     p3.16xlarge   1965495

Is the number of CPU minutes coming from the sacct query you used before? If so, can I have you run it again with the following columns in the output?

sacct -X -u xxxx --starttime=2021-03-01 --endtime=2021-03-31 -ojobid,elapsed,cputime,alloccpus,allocnodes

Or if it's a different set of jobs, then please adjust the start and end times accordingly. I would like to see if all the CPUs on the nodes are being charged for these jobs.
Thanks,
Ben

Hi Ben,
The CPU minutes are from sreport. Here is the report from sacct. I have added ElapsedRaw and CPUTimeRaw.
sacct -X -u petrovy1 --starttime=2021-03-01 --endtime=2021-03-31 -ojobid,elapsed,elapsedRaw,cputime,cputimeRAW,alloccpus,allocnodes
JobID Elapsed ElapsedRaw CPUTime CPUTimeRAW AllocCPUS AllocNodes
------------ ---------- ---------- ---------- ---------- ---------- ----------
3841 1-13:44:32 135872 100-15:30:08 8695808 64 1
4508 00:01:08 68 01:12:32 4352 64 1
4510 00:01:02 62 01:06:08 3968 64 1
4511 08:30:54 30654 22-16:57:36 1961856 64 1
4531 00:00:58 58 01:01:52 3712 64 1
4532 00:01:41 101 01:47:44 6464 64 1
4535 23:56:16 86176 63-20:01:04 5515264 64 1
4538 1-00:48:57 89337 66-04:12:48 5717568 64 1
4539 1-00:31:54 88314 65-10:01:36 5652096 64 1
4540 1-00:31:57 88317 65-10:04:48 5652288 64 1
4545 21:00:52 75652 56-00:55:28 4841728 64 1
4546 21:00:44 75644 56-00:46:56 4841216 64 1
4547 21:00:25 75625 56-00:26:40 4840000 64 1
4791 1-10:29:55 124195 91-23:54:40 7948480 64 1
4792 1-10:49:11 125351 92-20:27:44 8022464 64 1
4793 1-10:48:19 125299 92-19:32:16 8019136 64 1
6096 00:00:18 18 00:07:12 432 24 1
6097 00:06:44 404 02:41:36 9696 24 1
6098 00:04:17 257 01:42:48 6168 24 1
6099 00:01:13 73 00:29:12 1752 24 1
6100 00:01:14 74 00:29:36 1776 24 1
6101 00:01:13 73 00:29:12 1752 24 1
6102 00:03:43 223 01:29:12 5352 24 1
6103 00:03:56 236 01:34:24 5664 24 1
6104 00:01:56 116 00:46:24 2784 24 1
6106 00:01:24 84 00:33:36 2016 24 1
6107 00:00:42 42 00:16:48 1008 24 1
6108 00:19:57 1197 07:58:48 28728 24 1
6109 00:20:53 1253 08:21:12 30072 24 1
6110 01:51:20 6680 1-20:32:00 160320 24 1
6111 00:01:56 116 00:46:24 2784 24 1
6112 2-07:06:01 198361 55-02:24:24 4760664 24 1
6114 22:29:51 80991 22-11:56:24 1943784 24 1
6120 16:24:30 59070 43-18:08:00 3780480 64 1
6121 2-07:51:49 201109 55-20:43:36 4826616 24 1
10588 2-08:25:57 203157 56-10:22:48 4875768 24 1
6123 1-01:03:56 90236 25-01:34:24 2165664 24 1
6122 08:47:02 31622 8-18:48:48 758928 24 1
11316 1-00:43:51 89031 65-22:46:24 5697984 64 1
11983 1-01:25:17 91517 67-18:58:08 5857088 64 1
12712 18:58:14 68294 50-14:06:56 4370816 64 1
12728 06:48:04 24484 18-03:16:16 1566976 64 1
12733 1-19:41:46 157306 116-12:33:04 10067584 64 1
13097 00:01:12 72 01:16:48 4608 64 1
13098 00:43:53 2633 1-22:48:32 168512 64 1
13096 18:45:55 67555 50-00:58:40 4323520 64 1
13191 1-08:02:11 115331 85-10:19:44 7381184 64 1
14210 00:09:51 591 10:30:24 37824 64 1
14221 3-13:22:02 307322 227-15:30:08 19668608 64 1
16809 3-00:42:29 261749 193-21:18:56 16751936 64 1
14347 1-12:41:10 132070 97-19:54:40 8452480 64 1
19571 00:07:54 474 08:25:36 30336 64 1
19572 00:00:03 3 00:03:12 192 64 1
19618 00:04:03 243 04:19:12 15552 64 1
19620 00:03:05 185 03:17:20 11840 64 1
19637 00:00:00 0 00:00:00 0 24 0
19645 14:12:20 51140 37-21:09:20 3272960 64 1
19646 05:49:23 20963 15-12:40:32 1341632 64 1
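[Editorial note: the table above is internally consistent — each row's CPUTimeRAW is exactly ElapsedRaw × AllocCPUS. A quick sketch checking a few rows (values copied from jobs 3841, 4511, and 6112 above):]

```shell
# Sanity check: CPUTimeRAW should equal ElapsedRaw * AllocCPUS per row.
# Fields: jobid elapsedraw alloccpus cputimeraw (sampled from the table).
printf '%s\n' '3841 135872 64 8695808' '4511 30654 64 1961856' '6112 198361 24 4760664' |
awk '{ res = ($2 * $3 == $4) ? "ok" : "MISMATCH"; print $1, res }'
```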
Hi Jimmy,
Thanks for sending that information. I can see that there are several jobs that were only billed for 24 CPUs rather than the full 64. If you use the CPUTimeRaw field to do this calculation it doesn't work out, because that value doesn't take into account the fact that a full node is up and being charged even though the job doesn't use all the available CPUs.

I calculated the full number of billable CPU seconds by multiplying the ElapsedRaw value for each job by 64 and then summing those values. With that I got:

216773120 Billable CPU Seconds = 3612885.333 Billable CPU Minutes

Then I plugged this into the equation you sent earlier and it comes out to ~$23k, as you said you expected.

3612885.333 / 64 = 56451.333
56451.333 / 60 = 940.856
940.856 * 24.48 = 23032.14

There are a couple of options you can take to make this behave as you are expecting. You can make a similar adjustment to your calculation, multiplying the ElapsedRaw value by the number of CPUs corresponding to the AWS instance type. Or you can force the jobs to get exclusive access to the nodes so that their calculated CPUTimeRaw value matches what is being charged by AWS. You can do that with the OverSubscribe parameter, setting it to EXCLUSIVE for the partitions. You can read more about that parameter here:
https://slurm.schedmd.com/slurm.conf.html#OPT_OverSubscribe

I set up an example of how this would look. I configured two partitions, 'debug' and 'high', with the 'high' partition using OverSubscribe=EXCLUSIVE.

$ scontrol show partition | egrep 'PartitionName|OverSubscribe'
PartitionName=debug PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
PartitionName=high PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=EXCLUSIVE

I submitted a short job to each of the partitions, just requesting a single processor on each of my 24-processor nodes.
$ sbatch -n1 -pdebug --wrap='srun sleep 10'
Submitted batch job 25902
$ sbatch -n1 -phigh --wrap='srun sleep 10'
Submitted batch job 25903

In the report for these jobs you can see that the job that went to the 'debug' partition (25902) has a CPUTimeRaw of 10. The job that went to the 'high' partition (25903) was charged for the full node.

$ sacct -j 25902,25903 -X -ojobid,elapsedRaw,cputimeRAW,alloccpus
       JobID ElapsedRaw CPUTimeRAW  AllocCPUS
------------ ---------- ---------- ----------
25902                10         10          1
25903                10        240         24

Hopefully this helps clarify where these numbers are coming from. Please let me know if you have any questions about this.
Thanks,
Ben

Hi Ben,
Thanks for the information; this seems to work. Should I be using -X or -DTPX? They return different numbers.
J
sacct -DTPX -u petrovy1 --starttime=2021-03-01 --endtime=2021-03-31 -o elapsedRaw |awk '{total = total + $1}END{print total}'
4417426
sacct -X -u petrovy1 --starttime=2021-03-01 --endtime=2021-03-31 -o elapsedRaw |awk '{total = total + $1}END{print total}'
3121358
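[Editorial note: the gap between the two totals can be localized per JobID with a short awk program. A sketch, with sample '|'-separated records standing in for the real output of `sacct -DTPX ... -o jobid,elapsedRaw`; any JobID with more than one record is a requeued job inflating the -D total:]

```shell
# Tally record count and total seconds per JobID in the duplicates output.
# Sample lines stand in for real sacct -DTPX output (values from this ticket).
printf '%s\n' '19645|84364' '19646|53520' '19646|22776' '19646|17' |
awk -F'|' '{ n[$1]++; s[$1] += $2 }
END { for (id in n) printf "%s: %d record(s), %d s\n", id, n[id], s[id] }' |
sort
```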
The '-D' flag tells sacct to show duplicates. The '-T' flag tells sacct to truncate the time a job ran that might be before the start time or after the end time. The '-P' flag makes the output parsable. The '-X' flag shows just the job allocation and doesn't include the steps.

The truncate flag would only be able to reduce the time, but you're seeing an increase with these flags. The only flag that could result in more time would be the duplicate flag. A requeued job can cause there to be duplicate entries for jobs, but this doesn't sound likely with your workflow. Do you happen to have jobs that are being requeued? I would be interested in seeing the output of that query before it's processed by awk:

sacct -DTPX -u petrovy1 --starttime=2021-03-01 --endtime=2021-03-31 -o jobid,elapsedRaw
Thanks,
Ben

Hi Ben,
Here is the output.

sacct -DTPX -u xxx --starttime=2021-03-01 --endtime=2021-03-31 -o jobid,elapsedRaw
JobID|ElapsedRaw
3841|89411
4508|68
4510|62
4511|95454
4511|30654
4531|58
4532|101
4535|24364
4535|604
4535|11601
4535|22
4535|21
4535|23
4535|17
4535|86176
4538|89337
4539|23
4540|18
4539|23
4540|22
4540|24
4539|23
4539|18
4540|18
4539|88314
4540|88317
4545|75652
4546|75644
4547|75625
4791|124195
4792|125351
4793|125299
6096|18
6097|404
6098|257
6099|73
6100|74
6101|73
6102|223
6103|236
6104|116
6106|84
6107|42
6108|1197
6109|1253
6110|4296
6110|20
6110|604
6110|23
6110|13684
6110|23
6110|603
6110|6680
6111|116
6112|4982
6112|602
6112|22
6112|22
6112|21
6112|17
6112|198361
6114|22
6114|1254
6114|21
6114|603
6114|24
6114|21
6114|22
6114|18
6114|80991
6120|23
6120|24
6120|22
6120|18
6120|59070
6121|201109
6122|23
6122|90921
6123|421
6123|22
6123|30889
6122|25277
6123|29056
6122|24477
6123|30971
6122|603
6122|14276
6123|601
6123|21065
6122|11922
6122|3986
6122|5579
6123|5453
10588|22
10588|23
10588|23
6123|23
10588|17
6122|23
6123|22
10588|203157
6123|22
6122|21
6122|23
6123|17
6123|90236
6122|17
6122|31622
11316|21
11316|19218
11316|23
11316|603
11316|2639
11316|21
11316|22
11316|23
11316|18
11316|89031
11983|91517
12712|24
12712|6826
12712|604
12712|20
12712|22
12712|23
12712|18
12712|68294
12728|22
12728|23
12728|22
12728|17
12728|24484
12733|21
12733|22
12733|21
12733|17
12733|157306
13096|22
13097|17
13096|31856
13097|72
13098|23
13098|2633
13191|25853
13191|21
13191|21411
13096|19478
13096|21
13096|601
13096|6839
13191|602
13191|5219
13096|21
13191|22
13096|603
13191|603
13096|22
13191|22
13096|22
13191|22
13096|22
13191|22
13096|17
13191|17
13096|67555
13191|115331
14210|591
14221|307322
14347|206342
16809|17
16809|296791
14347|17
14347|132070
19571|22
19571|474
19572|21
19572|3
19618|243
19620|185
19637|0
19645|84364
19646|53520
19646|22776
19645|17
19645|19
19645|19
19645|17
19645|55146
21447|17
16809|33398
21447|20286
19646|13187
16809|0
22214|0
22219|0
22261|359
22277|1421
22361|22
22364|18
22371|17
19646|17
19646|6186
22517|22

It does look like you're getting duplicates that would be causing the difference in values, but I'm not sure exactly why that would be. Do you have jobs that get requeued? I would like to investigate one of these jobs, like 19646. Can you send the output of 'sacct -l -j 19646' along with the slurmctld logs that cover the time this job was on the system?
Thanks,
Ben

I wanted to follow up and see if you're able to provide some additional detail about one of these jobs that has multiple entries. Can you send the output of 'sacct -l -j 19646' along with the slurmctld logs that cover the time this job was on the system?
Thanks,
Ben

Hi Ben,
Is this enough information?
sacct -l -j 19646
JobID JobIDRaw JobName Partition MaxVMSize MaxVMSizeNode MaxVMSizeTask AveVMSize MaxRSS MaxRSSNode MaxRSSTask AveRSS MaxPages MaxPagesNode MaxPagesTask AvePages MinCPU MinCPUNode MinCPUTask AveCPU NTasks AllocCPUS Elapsed State ExitCode AveCPUFreq ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov ReqMem ConsumedEnergy MaxDiskRead MaxDiskReadNode MaxDiskReadTask AveDiskRead MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask AveDiskWrite AllocGRES ReqGRES ReqTRES AllocTRES TRESUsageInAve TRESUsageInMax TRESUsageInMaxNode TRESUsageInMaxTask TRESUsageInMin TRESUsageInMinNode TRESUsageInMinTask TRESUsageInTot TRESUsageOutMax TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsageOutAve TRESUsageOutTot
------------ ------------ ---------- ---------- ---------- -------------- -------------- ---------- ---------- ---------- ---------- ---------- -------- ------------ -------------- ---------- ---------- ---------- ---------- ---------- -------- ---------- ---------- ---------- -------- ---------- ------------- ------------- ------------- ---------- -------------- ------------ --------------- --------------- -------------- ------------ ---------------- ---------------- -------------- ------------ ------------ ---------- ---------- -------------- -------------- ------------------ ------------------ -------------- ------------------ ------------------ -------------- --------------- ------------------- ------------------- --------------- ---------------
19646 19646 spleen_mx+ G-8GPU-64+ 64 1-03:15:39 CANCELLED+ 0:0 Unknown Unknown Unknown 7368Mc gpu:8 PER_NODE:gp+ billing=2+ billing=6+
19646.batch 19646.batch batch 204714644K spcd-usw2-007+ 0 204714644K 16584876K spcd-usw2+ 0 16584876K 51682 spcd-usw2-0+ 0 51682 2-18:14:06 spcd-usw2+ 0 2-18:14:06 1 64 1-03:15:42 CANCELLED 0:15 1K 0 0 0 7368Mc 0 2086067.52M spcd-usw2-00756 0 2086067.52M 1267.40M spcd-usw2-00756 0 1267.40M gpu:8 PER_NODE:gp+ cpu=64,me+ cpu=2-18:14:0+ cpu=2-18:14:0+ cpu=spcd-usw2-007+ cpu=0,fs/disk=0,m+ cpu=2-18:14:0+ cpu=spcd-usw2-007+ cpu=0,fs/disk=0,m+ cpu=2-18:14:0+ energy=0,fs/di+ energy=spcd-usw2-0+ fs/disk=0 energy=0,fs/di+ energy=0,fs/di+
grep 19646 slurmctld.log
[2021-03-24T07:14:38.927] _slurm_rpc_submit_batch_job: JobId=19646 InitPrio=4294900443 usec=393
[2021-03-24T15:37:21.935] backfill: Started JobId=19646 in G-8GPU-64Cpu-471GB on spcd-usw2-00758
[2021-03-24T15:37:35.973] prolog_running_decr: Configuration for JobId=19646 is complete
[2021-03-25T06:29:21.209] requeue job JobId=19646 due to failure of node spcd-usw2-00758
[2021-03-25T06:29:21.212] Requeuing JobId=19646
[2021-03-25T06:31:25.285] backfill: Started JobId=19646 in G-8GPU-64Cpu-471GB on spcd-usw2-01277
[2021-03-25T06:38:20.770] job_time_limit: Configuration for JobId=19646 complete
[2021-03-25T06:38:20.770] Resetting JobId=19646 start time for node power up
[2021-03-25T12:57:56.792] requeue job JobId=19646 due to failure of node spcd-usw2-01277
[2021-03-25T12:57:56.794] Requeuing JobId=19646
[2021-03-25T16:36:27.696] backfill: Started JobId=19646 in G-8GPU-64Cpu-471GB on spcd-usw2-01277
[2021-03-25T16:42:50.174] job_time_limit: Configuration for JobId=19646 complete
[2021-03-25T16:42:50.174] Resetting JobId=19646 start time for node power up
[2021-03-25T20:22:37.346] requeue job JobId=19646 due to failure of node spcd-usw2-01277
[2021-03-25T20:22:37.348] Requeuing JobId=19646
[2021-03-25T20:24:42.464] sched: Allocate JobId=19646 NodeList=spcd-usw2-01780 #CPUs=64 Partition=G-8GPU-64Cpu-471GB
[2021-03-25T20:24:59.747] requeue job JobId=19646 due to failure of node spcd-usw2-01780
[2021-03-25T20:24:59.750] Requeuing JobId=19646
[2021-03-25T20:25:00.569] _slurm_rpc_requeue: 19646: Job is pending execution
[2021-03-25T20:25:00.593] sched: _update_job: setting features to us-west-2a for JobId=19646
[2021-03-25T20:25:00.594] _slurm_rpc_update_job: complete JobId=19646 uid=0 usec=12045
[2021-03-25T20:27:28.342] backfill: Started JobId=19646 in G-8GPU-64Cpu-471GB on spcd-usw2-00756
[2021-03-25T20:34:20.950] job_time_limit: Configuration for JobId=19646 complete
[2021-03-25T20:34:20.950] Resetting JobId=19646 start time for node power up
[2021-03-26T04:36:16.439] requeue job JobId=19646 due to failure of node spcd-usw2-00756
[2021-03-26T04:36:16.442] Requeuing JobId=19646
[2021-03-26T04:38:18.862] backfill: Started JobId=19646 in G-8GPU-64Cpu-471GB on spcd-usw2-00760
[2021-03-26T04:44:51.970] job_time_limit: Configuration for JobId=19646 complete
[2021-03-26T04:44:51.970] Resetting JobId=19646 start time for node power up
[2021-03-26T15:02:48.368] requeue job JobId=19646 due to failure of node spcd-usw2-00760
[2021-03-26T15:02:48.371] Requeuing JobId=19646
[2021-03-26T15:04:49.874] backfill: Started JobId=19646 in G-8GPU-64Cpu-471GB on spcd-usw2-00756
[2021-03-26T15:11:21.952] job_time_limit: Configuration for JobId=19646 complete
[2021-03-26T15:11:21.952] Resetting JobId=19646 start time for node power up
[2021-03-27T18:27:00.784] _slurm_rpc_kill_job: REQUEST_KILL_JOB JobId=19646 uid 93422
Thanks for sending that information. It looks like what is happening is that this job tried to start several times, but after approximately 6 or 7 minutes, if the node hadn't come up correctly, the job was requeued and it tried to start again later. Then at 2021-03-26T15:11:21 the job was successfully started and ran until it was cancelled. The previous attempts to start resulted in usage time being recorded in the database, but since the node didn't come up, it didn't result in anything being billed by AWS, and the record that actually had usage is what shows up when duplicate records aren't displayed. Please let me know if you have questions about this.
Thanks,
Ben

Hi Jimmy,
I wanted to follow up and make sure the information about the duplicate jobs I sent made sense. Let me know if you have any additional questions about this.
Thanks,
Ben

Hi Ben,
I think we have a path forward with sacct. Thanks for the clarification.

I'm glad to hear that this helped explain what was happening. I'll go ahead and close this ticket. Let us know if there's anything else we can do to help.
Thanks,
Ben
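[Editorial note: the full charge-back calculation worked out in this thread — charge every job for the whole node by using ElapsedRaw, then price by node hours — can be sketched in one pipeline. Sample ElapsedRaw values from the ticket stand in for real `sacct -XP --noheader ... -o elapsedraw` output, and the p3.16xlarge figures above (64 vCPUs, $24.48/hr) are used:]

```shell
# Billable usage: node seconds = sum of ElapsedRaw; CPU minutes are what
# sreport would show for exclusive jobs; dollars = node hours * hourly rate.
node_cpus=64
price_per_hour=24.48   # p3.16xlarge on-demand rate quoted in this ticket
printf '%s\n' 135872 68 62 |
awk -v c="$node_cpus" -v p="$price_per_hour" '
  { secs += $1 }                     # total node seconds across jobs
  END {
    cpu_min  = secs * c / 60         # billable CPU minutes (sreport units)
    node_hrs = secs / 3600           # node hours billed by the cloud provider
    printf "%.2f CPU minutes, %.2f node hours, $%.2f\n", cpu_min, node_hrs, node_hrs * p
  }'
```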
Hi,
We have enabled WCKeys to track usage, and when we run the following report it shows that user xxxx has used 680712 minutes (472.72 days) in 19 days. Are we reading this report correctly? When I execute sacct on the same user for the same time frame, the sum of the elapsed time is about 39472 minutes. Can someone please take a look and let me know if I am reading the sreport wrong?

sreport -t minutes cluster WCKeyUtilizationByUser start=3/1/21 end=3/30/21
--------------------------------------------------------------------------------
Cluster/WCKey/User Utilization 2021-03-01T00:00:00 - 2021-03-19T22:59:59 (1638000 secs)
Usage reported in CPU Minutes
--------------------------------------------------------------------------------
  Cluster           WCKey     Login     Proper Name     Used
--------- --------------- --------- --------------- --------
slurm-ma+     p3.16xlarge      xxxx            xxxx   680712

sacct -X -u xxxx --starttime=2021-03-01 --endtime=2021-03-31 -o start,end,elapsed
              Start                 End    Elapsed
------------------- ------------------- ----------
2021-02-28T11:05:39 2021-03-02T00:50:11 1-13:44:32
2021-03-03T04:58:18 2021-03-03T04:59:26   00:01:08
2021-03-03T05:38:18 2021-03-03T05:39:20   00:01:02
2021-03-04T08:24:50 2021-03-04T16:55:44   08:30:54
2021-03-05T04:13:22 2021-03-05T04:14:20   00:00:58
2021-03-05T04:21:11 2021-03-05T04:22:52   00:01:41
2021-03-05T17:28:22 2021-03-06T17:24:38   23:56:16
2021-03-06T17:24:49 2021-03-07T18:13:46 1-00:48:57
2021-03-06T17:41:56 2021-03-07T18:13:50 1-00:31:54
2021-03-06T17:41:56 2021-03-07T18:13:53 1-00:31:57
2021-03-07T18:14:02 2021-03-08T15:14:54   21:00:52
2021-03-07T18:14:13 2021-03-08T15:14:57   21:00:44
2021-03-07T18:14:38 2021-03-08T15:15:03   21:00:25
2021-03-08T15:15:09 2021-03-10T01:45:04 1-10:29:55
2021-03-08T15:15:21 2021-03-10T02:04:32 1-10:49:11
2021-03-08T15:15:27 2021-03-10T02:03:46 1-10:48:19
2021-03-11T05:48:57 2021-03-11T05:49:15   00:00:18
2021-03-11T05:49:54 2021-03-11T05:56:38   00:06:44
2021-03-11T06:14:44 2021-03-11T06:19:01   00:04:17
2021-03-11T06:25:23 2021-03-11T06:26:36   00:01:13
2021-03-11T06:28:27 2021-03-11T06:29:41   00:01:14
2021-03-11T06:37:23 2021-03-11T06:38:36   00:01:13
2021-03-11T06:40:39 2021-03-11T06:44:22   00:03:43
2021-03-11T06:52:09 2021-03-11T06:56:05   00:03:56
2021-03-11T07:23:38 2021-03-11T07:25:34   00:01:56
2021-03-11T18:04:08 2021-03-11T18:05:32   00:01:24
2021-03-11T18:13:46 2021-03-11T18:14:28   00:00:42
2021-03-11T18:23:27 2021-03-11T18:43:24   00:19:57
2021-03-11T20:23:38 2021-03-11T20:44:31   00:20:53
2021-03-12T02:45:39 2021-03-12T04:36:59   01:51:20
2021-03-12T04:37:20 2021-03-12T04:39:16   00:01:56
2021-03-12T08:34:41 2021-03-14T15:40:42 2-07:06:01
2021-03-12T20:38:14 2021-03-13T19:08:05   22:29:51
2021-03-13T02:09:45 2021-03-13T18:34:15   16:24:30
2021-03-13T19:09:27 2021-03-16T03:01:16 2-07:51:49
2021-03-16T06:38:59 2021-03-18T15:04:56 2-08:25:57
2021-03-16T06:44:59 2021-03-17T07:48:55 1-01:03:56
2021-03-16T06:46:59 2021-03-16T15:34:01   08:47:02
2021-03-17T00:59:32 2021-03-18T01:43:23 1-00:43:51
2021-03-17T07:49:19 2021-03-18T09:14:36 1-01:25:17
2021-03-18T06:36:06 2021-03-19T01:34:20   18:58:14
2021-03-18T18:46:07 2021-03-19T01:34:11   06:48:04
2021-03-18T21:00:07             Unknown 1-01:51:58
2021-03-19T04:39:38 2021-03-19T04:40:50   00:01:12
2021-03-19T05:30:08 2021-03-19T06:14:01   00:43:53
2021-03-19T21:56:09             Unknown   00:55:56
2021-03-19T21:56:09             Unknown   00:55:56