Ticket 2076

Summary: sacct reports jobs outside of specified time range
Product: Slurm
Reporter: Steven Shortino <steven.shortino>
Component: Accounting
Assignee: Danny Auble <da>
Status: RESOLVED FIXED
Severity: 4 - Minor Issue
Priority: ---
CC: alex, brian, da, jamal.uddin, tim
Version: 14.11.6   
Hardware: Linux   
OS: Linux   
Site: DANA
Version Fixed: 15.08.3 16.05.0-pre1

Description Steven Shortino 2015-10-28 06:43:47 MDT
We are trying to generate a usage report from the output of an sacct command, and found odd behaviour: sacct reported jobs that are clearly outside the time range we requested.

The command we used: 

sacct -X -P -a -n -o jobid,submit,start,end,cputimeraw -S 2015-09-15 -E 2015-09-20

Most of the records drawn from the database appear perfectly correct, but there are seven records at the end that are obviously not:


2068|2015-10-09T09:51:52|2015-10-09T09:55:22|2015-10-09T09:55:22|0
2069|2015-10-09T09:52:53|2015-10-09T09:55:13|2015-10-09T09:55:13|0
2070|2015-10-09T09:53:13|2015-10-09T09:55:06|2015-10-09T09:55:06|0
2079|2015-10-09T11:26:34|2015-10-09T12:32:56|2015-10-09T12:32:56|0
2082|2015-10-09T12:34:07|2015-10-09T15:52:55|2015-10-09T15:52:55|0
2083|2015-10-09T12:34:33|2015-10-09T15:52:47|2015-10-09T15:52:47|0
2161|2015-10-13T17:29:30|2015-10-14T07:56:46|2015-10-14T07:56:46|0

All of these jobs show as being cancelled.  Why is sacct pulling them?
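A quick way to confirm which records fall outside the requested window is to parse the pipe-delimited output and compare each job's start and end against the -S/-E bounds. The following is only a sketch: the helper name outside_window is made up for illustration, and it assumes every start and end field is a well-formed timestamp (real sacct output can contain values such as "Unknown").

```python
from datetime import datetime

# Requested window from the sacct command above (-S 2015-09-15 -E 2015-09-20).
WINDOW_START = datetime(2015, 9, 15)
WINDOW_END = datetime(2015, 9, 20)

# The seven suspicious records, copied from the sacct output above.
SACCT_OUTPUT = """\
2068|2015-10-09T09:51:52|2015-10-09T09:55:22|2015-10-09T09:55:22|0
2069|2015-10-09T09:52:53|2015-10-09T09:55:13|2015-10-09T09:55:13|0
2070|2015-10-09T09:53:13|2015-10-09T09:55:06|2015-10-09T09:55:06|0
2079|2015-10-09T11:26:34|2015-10-09T12:32:56|2015-10-09T12:32:56|0
2082|2015-10-09T12:34:07|2015-10-09T15:52:55|2015-10-09T15:52:55|0
2083|2015-10-09T12:34:33|2015-10-09T15:52:47|2015-10-09T15:52:47|0
2161|2015-10-13T17:29:30|2015-10-14T07:56:46|2015-10-14T07:56:46|0
"""


def outside_window(line):
    """Return True if the job's [start, end] span does not overlap the window."""
    jobid, submit, start, end, cputimeraw = line.split("|")
    start_t = datetime.fromisoformat(start)
    end_t = datetime.fromisoformat(end)
    return start_t >= WINDOW_END or end_t < WINDOW_START


# Job IDs whose start/end span lies entirely outside the requested window.
suspicious = [
    line.split("|")[0]
    for line in SACCT_OUTPUT.splitlines()
    if outside_window(line)
]
print(suspicious)
```

All seven records start and end in October, so each one is flagged.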

Best regards,

Steve Shortino
Comment 1 Danny Auble 2015-10-28 06:58:51 MDT
Those do appear to be out of the range.  Could you send me the output of

select t1.job_db_inx, t1.id_job, t1.state, t1.time_submit, t1.time_eligible, t1.time_start, t1.time_end
from snowflake_job_table as t1
left join snowflake_assoc_table as t2 on t1.id_assoc=t2.id_assoc
left join snowflake_resv_table as t3 on t1.id_resv=t3.id_resv
where ((t1.time_eligible < 1442732400 && (t1.time_end >= 1442300400 || t1.time_end = 0)))
group by id_job, time_submit desc;

and...

select * from snowflake_job_table where id_job=2161;

You will need to replace "snowflake" with the name of your cluster.
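The two integer literals in the first query are Unix timestamps for the -S and -E bounds of the sacct command. As a rough sketch, they can be reproduced by taking local midnight on each date at a UTC-7 offset. That offset is inferred from the values themselves and is not stated anywhere in this ticket; on a cluster in another timezone the numbers would differ. The helper name to_epoch is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the bounds were computed at a UTC-7 offset (inferred from the
# literals in the query, not stated in the ticket).
UTC_MINUS_7 = timezone(timedelta(hours=-7))


def to_epoch(date_str, tz=UTC_MINUS_7):
    """Convert a YYYY-MM-DD date to the Unix timestamp of midnight in tz."""
    midnight = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=tz)
    return int(midnight.timestamp())


window_start = to_epoch("2015-09-15")  # lower bound, compared with time_end
window_end = to_epoch("2015-09-20")    # upper bound, compared with time_eligible
print(window_start, window_end)
```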

On a related point, I would suggest using sreport instead of sacct to generate reports, as it uses rolled-up data and should be much faster.  If you must use sacct, you will probably want it to truncate each job's time so that only time within the requested period is counted.  This can be done with the -T option.
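To illustrate what truncation means for a usage report, here is a small sketch of the clipping arithmetic, assuming -T clamps a job's start and end to the requested window. The function name truncated_elapsed is hypothetical, the window bounds are the literals from the query above, and the job times are those of id_job 1462 from the table later in this ticket, which starts inside the window and ends after it.

```python
# Window bounds, taken from the literals in the query above.
WINDOW_START = 1442300400
WINDOW_END = 1442732400


def truncated_elapsed(time_start, time_end):
    """Seconds of the job's run time that fall inside the window."""
    clipped_start = max(time_start, WINDOW_START)
    clipped_end = min(time_end, WINDOW_END)
    # A job entirely outside the window contributes nothing.
    return max(0, clipped_end - clipped_start)


# time_start and time_end of id_job 1462 from the database output.
job_start, job_end = 1442604067, 1442851759

full = job_end - job_start                       # untruncated run time
clipped = truncated_elapsed(job_start, job_end)  # run time inside the window
print(full, clipped)
```

Without truncation, the whole run of such a job would be charged to the reporting period even though part of it happened afterwards.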
Comment 2 Steven Shortino 2015-10-28 07:43:27 MDT
(In reply to Danny Auble from comment #1)

Here's the requested output:

mysql> select * from slurm_cluster_job_table where id_job=2161;
+------------+----------+---------+---------+----------------+-----------------+--------------------+----------+------------+------------+------------+-----------+----------------------+----------+--------------+---------------+----------+--------+--------+---------+----------+---------+----------+-------------+---------+---------------+-------------+----------+-----------+----------+-------+------------+-------------+---------------+------------+------------+----------------+----------+------------+-----------+-------+-------------+
| job_db_inx | mod_time | deleted | account | array_task_str | array_max_tasks | array_task_pending | cpus_req | cpus_alloc | derived_ec | derived_es | exit_code | job_name             | id_assoc | id_array_job | id_array_task | id_block | id_job | id_qos | id_resv | id_wckey | id_user | id_group | kill_requid | mem_req | nodelist      | nodes_alloc | node_inx | partition | priority | state | timelimit  | time_submit | time_eligible | time_start | time_end   | time_suspended | gres_req | gres_alloc | gres_used | wckey | track_steps |
+------------+----------+---------+---------+----------------+-----------------+--------------------+----------+------------+------------+------------+-----------+----------------------+----------+--------------+---------------+----------+--------+--------+---------+----------+---------+----------+-------------+---------+---------------+-------------+----------+-----------+----------+-------+------------+-------------+---------------+------------+------------+----------------+----------+------------+-----------+-------+-------------+
|       3210 |        0 |       0 | NULL    | NULL           |               0 |                  0 |        5 |        120 |          0 | NULL       |         0 | SR-19532_cvj-3-fixed |        0 |            0 |    4294967294 | NULL     |   2161 |      1 |       0 |        0 |   34308 |    32772 |       34308 |       0 | None assigned |           0 | NULL     | defq      |     5125 |     4 | 4294967294 |  1444771770 |             0 | 1444823806 | 1444823806 |              0 |          |            |           |       |           0 |
+------------+----------+---------+---------+----------------+-----------------+--------------------+----------+------------+------------+------------+-----------+----------------------+----------+--------------+---------------+----------+--------+--------+---------+----------+---------+----------+-------------+---------+---------------+-------------+----------+-----------+----------+-------+------------+-------------+---------------+------------+------------+----------------+----------+------------+-----------+-------+-------------+
1 row in set (0.00 sec)


+------------+--------+-------+-------------+---------------+------------+------------+
| job_db_inx | id_job | state | time_submit | time_eligible | time_start | time_end   |
+------------+--------+-------+-------------+---------------+------------+------------+
|       2037 |   1353 |     3 |  1442316120 |    1442316120 | 1442316121 | 1442316771 |
|       2038 |   1354 |     4 |  1442323370 |    1442323370 | 1442323371 | 1442327753 |
|       2039 |   1355 |     3 |  1442323542 |    1442323542 | 1442323543 | 1442356211 |
|       2041 |   1356 |     3 |  1442324693 |    1442324693 | 1442324694 | 1442356185 |
|       2043 |   1357 |     3 |  1442326240 |    1442326240 | 1442326241 | 1442358072 |
|       2044 |   1358 |     5 |  1442327925 |    1442327925 | 1442327925 | 1442327927 |
|       2046 |   1359 |     5 |  1442329583 |    1442329583 | 1442329583 | 1442329585 |
|       2048 |   1360 |     5 |  1442329675 |    1442329675 | 1442329675 | 1442329677 |
|       2049 |   1361 |     4 |  1442330118 |    1442330118 | 1442330119 | 1442491506 |
|       2051 |   1362 |     3 |  1442335230 |    1442335230 | 1442335230 | 1442339808 |
|       2053 |   1363 |     4 |  1442346845 |    1442346845 | 1442346846 | 1442346853 |
|       2055 |   1364 |     3 |  1442346858 |    1442346858 | 1442346858 | 1442346866 |
|       2057 |   1365 |     3 |  1442346928 |    1442346928 | 1442346929 | 1442351697 |
|       2059 |   1366 |     3 |  1442348636 |    1442348636 | 1442348636 | 1442359771 |
|       2060 |   1367 |     3 |  1442377574 |    1442377574 | 1442377574 | 1442395214 |
|       2062 |   1368 |     3 |  1442405728 |    1442405728 | 1442405729 | 1442406243 |
|       2063 |   1369 |     3 |  1442407528 |    1442407528 | 1442407529 | 1442412400 |
|       2064 |   1370 |     3 |  1442408419 |    1442408419 | 1442408419 | 1442420148 |
|       2065 |   1371 |     3 |  1442413024 |    1442413024 | 1442413025 | 1442417558 |
|       2067 |   1372 |     3 |  1442417925 |    1442417925 | 1442417926 | 1442422248 |
|       2069 |   1373 |     3 |  1442422875 |    1442422875 | 1442422876 | 1442427117 |
|       2071 |   1374 |     5 |  1442425961 |    1442425961 | 1442425961 | 1442425979 |
|       2073 |   1375 |     5 |  1442426460 |    1442426460 | 1442426460 | 1442426477 |
|       2075 |   1376 |     5 |  1442427335 |    1442427335 | 1442427335 | 1442427352 |
|       2076 |   1377 |     3 |  1442431887 |    1442431887 | 1442431887 | 1442458201 |
|       2078 |   1378 |     3 |  1442431918 |    1442431918 | 1442431918 | 1442456909 |
|       2080 |   1379 |     3 |  1442431947 |    1442431947 | 1442431948 | 1442457011 |
|       2082 |   1380 |     3 |  1442431990 |    1442431990 | 1442431991 | 1442457103 |
|       2083 |   1381 |     4 |  1442432901 |    1442432901 | 1442432901 | 1442433021 |
|       2085 |   1382 |     3 |  1442434371 |    1442434371 | 1442434372 | 1442464087 |
|       2087 |   1383 |     5 |  1442434383 |    1442434383 | 1442434383 | 1442434393 |
|       2089 |   1384 |     3 |  1442434403 |    1442434403 | 1442434404 | 1442469048 |
|       2091 |   1385 |     3 |  1442434445 |    1442434445 | 1442456909 | 1442489542 |
|       2092 |   1386 |     3 |  1442434449 |    1442434449 | 1442457011 | 1442459848 |
|       2093 |   1387 |     3 |  1442434655 |    1442434655 | 1442457103 | 1442488347 |
|       2094 |   1388 |     3 |  1442434727 |    1442434727 | 1442458201 | 1442495499 |
|       2095 |   1389 |     4 |  1442434758 |    1442434758 | 1442434843 | 1442434843 |
|       2096 |   1390 |     5 |  1442434860 |    1442434860 | 1442459848 | 1442459987 |
|       2097 |   1391 |     3 |  1442434998 |    1442434998 | 1442459987 | 1442493032 |
|       2098 |   1392 |     3 |  1442435021 |    1442435021 | 1442464087 | 1442495477 |
|       2099 |   1393 |     4 |  1442435387 |    1442435387 | 1442469048 | 1442489150 |
|       2100 |   1394 |     3 |  1442435523 |    1442435523 | 1442488347 | 1442519802 |
|       2101 |   1395 |     3 |  1442462433 |    1442462433 | 1442489324 | 1442489798 |
|       2102 |   1396 |     3 |  1442462770 |    1442462770 | 1442489324 | 1442489827 |
|       2103 |   1397 |     3 |  1442463101 |    1442463101 | 1442489324 | 1442489863 |
|       2104 |   1398 |     3 |  1442487823 |    1442487823 | 1442489180 | 1442489324 |
|       2105 |   1399 |     3 |  1442489204 |    1442489204 | 1442489542 | 1442523796 |
|       2106 |   1400 |     3 |  1442489709 |    1442489709 | 1442489863 | 1442519642 |
|       2107 |   1401 |     3 |  1442489921 |    1442489921 | 1442491508 | 1442492159 |
|       2108 |   1402 |     3 |  1442490626 |    1442490626 | 1442491508 | 1442494926 |
|       2109 |   1403 |     3 |  1442498376 |    1442498376 | 1442498377 | 1442516503 |
|       2110 |   1404 |     3 |  1442504021 |    1442504021 | 1442504022 | 1442507886 |
|       2111 |   1405 |     3 |  1442505180 |    1442505180 | 1442505181 | 1442505183 |
|       2113 |   1406 |     3 |  1442505349 |    1442505349 | 1442505349 | 1442509207 |
|       2115 |   1407 |     3 |  1442509302 |    1442509302 | 1442509302 | 1442512410 |
|       2117 |   1408 |     3 |  1442514640 |    1442514640 | 1442514642 | 1442514734 |
|       2119 |   1409 |     3 |  1442514760 |    1442514760 | 1442514762 | 1442517890 |
|       2121 |   1410 |     3 |  1442515691 |    1442515691 | 1442515691 | 1442517538 |
|       2123 |   1411 |     4 |  1442517423 |    1442517423 | 1442517424 | 1442517619 |
|       2124 |   1412 |     4 |  1442517664 |    1442517664 | 1442517665 | 1442517778 |
|       2126 |   1413 |     4 |  1442517799 |    1442517799 | 1442517799 | 1442517894 |
|       2127 |   1414 |     4 |  1442517918 |    1442517918 | 1442517918 | 1442578117 |
|       2128 |   1415 |     3 |  1442518647 |    1442518647 | 1442518647 | 1442523839 |
|       2130 |   1416 |     3 |  1442518650 |    1442518650 | 1442518650 | 1442524901 |
|       2132 |   1417 |     3 |  1442518908 |    1442518908 | 1442518909 | 1442552361 |
|       2133 |   1418 |     4 |  1442519032 |    1442519032 | 1442519204 | 1442519204 |
|       2134 |   1419 |     4 |  1442519035 |    1442519035 | 1442519206 | 1442519206 |
|       2135 |   1420 |     3 |  1442519491 |    1442519491 | 1442519642 | 1442533257 |
|       2136 |   1421 |     3 |  1442519494 |    1442519494 | 1442519802 | 1442535818 |
|       2137 |   1422 |     3 |  1442519740 |    1442519740 | 1442523796 | 1442537875 |
|       2138 |   1423 |     3 |  1442519745 |    1442519745 | 1442523839 | 1442574089 |
|       2139 |   1424 |     5 |  1442519933 |    1442519933 | 1442524901 | 1442525423 |
|       2140 |   1425 |     3 |  1442519976 |    1442519976 | 1442525424 | 1442526045 |
|       2141 |   1426 |     3 |  1442519980 |    1442519980 | 1442526045 | 1442527209 |
|       2142 |   1427 |     3 |  1442523781 |    1442523781 | 1442527210 | 1442548663 |
|       2143 |   1428 |     3 |  1442523786 |    1442523786 | 1442533257 | 1442557567 |
|       2144 |   1429 |     3 |  1442523791 |    1442523791 | 1442535818 | 1442542572 |
|       2145 |   1430 |     3 |  1442578803 |    1442578803 | 1442578803 | 1442606434 |
|       2147 |   1431 |     3 |  1442579394 |    1442579394 | 1442579394 | 1442579413 |
|       2149 |   1432 |     3 |  1442579495 |    1442579495 | 1442579495 | 1442580404 |
|       2151 |   1433 |     4 |  1442579588 |    1442579588 | 1442579589 | 1442586830 |
|       2153 |   1434 |     3 |  1442583039 |    1442583039 | 1442583039 | 1442583060 |
|       2155 |   1435 |     3 |  1442586051 |    1442586051 | 1442586052 | 1442589233 |
|       2156 |   1436 |     3 |  1442586146 |    1442586146 | 1442586146 | 1442586377 |
|       2158 |   1437 |     3 |  1442586402 |    1442586402 | 1442586404 | 1442588603 |
|       2159 |   1438 |     3 |  1442587082 |    1442587082 | 1442587083 | 1442587969 |
|       2161 |   1439 |     5 |  1442589959 |    1442589959 | 1442589960 | 1442589963 |
|       2163 |   1440 |     4 |  1442590134 |    1442590134 | 1442590134 | 1442604886 |
|       2165 |   1441 |     5 |  1442594407 |    1442594407 | 1442594407 | 1442594410 |
|       2166 |   1442 |     4 |  1442594630 |    1442594630 | 1442594631 | 1442594766 |
|       2168 |   1443 |     3 |  1442594763 |    1442594763 | 1442594764 | 1442596075 |
|       2170 |   1444 |     4 |  1442597706 |    1442597706 | 1442597707 | 1442597784 |
|       2171 |   1445 |     4 |  1442598023 |    1442598023 | 1442598023 | 1442598049 |
|       2173 |   1446 |     4 |  1442598098 |    1442598098 | 1442598098 | 1442598152 |
|       2175 |   1447 |     4 |  1442598125 |    1442598125 | 1442598126 | 1442598600 |
|       2177 |   1448 |     3 |  1442598220 |    1442598220 | 1442598220 | 1442598990 |
|       2179 |   1449 |     4 |  1442598860 |    1442598860 | 1442598860 | 1442599055 |
|       2181 |   1450 |     4 |  1442599291 |    1442599291 | 1442599291 | 1442599807 |
|       2183 |   1451 |     4 |  1442599915 |    1442599915 | 1442599915 | 1442600383 |
|       2185 |   1452 |     4 |  1442599958 |    1442599958 | 1442599959 | 1442600595 |
|       2187 |   1453 |     4 |  1442600434 |    1442600434 | 1442600435 | 1442601235 |
|       2189 |   1454 |     4 |  1442600685 |    1442600685 | 1442600686 | 1442601273 |
|       2191 |   1455 |     3 |  1442601334 |    1442601334 | 1442601335 | 1442602104 |
|       2193 |   1456 |     4 |  1442601394 |    1442601394 | 1442601395 | 1442601819 |
|       2195 |   1457 |     4 |  1442601883 |    1442601883 | 1442601884 | 1442602023 |
|       2196 |   1458 |     3 |  1442602979 |    1442602979 | 1442602980 | 1442603762 |
|       2198 |   1459 |     3 |  1442603091 |    1442603091 | 1442603091 | 1442603184 |
|       2200 |   1460 |     5 |  1442603314 |    1442603314 | 1442603314 | 1442606691 |
|       2202 |   1461 |     5 |  1442604014 |    1442604014 | 1442604015 | 1442604017 |
|       2204 |   1462 |     4 |  1442604067 |    1442604067 | 1442604067 | 1442851759 |
|       2206 |   1463 |     5 |  1442604484 |    1442604484 | 1442604484 | 1442605084 |
|       2208 |   1464 |     4 |  1442604992 |    1442604992 | 1442604992 | 1442851739 |
|       2209 |   1465 |     3 |  1442605203 |    1442605203 | 1442605203 | 1442605979 |
|       2210 |   1466 |     4 |  1442605478 |    1442605478 | 1442605479 | 1442851774 |
|       2211 |   1467 |     3 |  1442606051 |    1442606051 | 1442606052 | 1442608663 |
|       2213 |   1468 |     3 |  1442606914 |    1442606914 | 1442606914 | 1442608218 |
|       2214 |   1469 |     3 |  1442611928 |    1442611928 | 1442611928 | 1442612542 |
|       2215 |   1470 |     3 |  1442615879 |    1442615879 | 1442615880 | 1442618037 |
|       2216 |   1471 |     3 |  1442625816 |    1442625816 | 1442625816 | 1442646561 |
|       2218 |   1472 |     3 |  1442672231 |    1442672231 | 1442672232 | 1442695840 |
|       2220 |   1473 |     3 |  1442672261 |    1442672261 | 1442672264 | 1442690345 |
|       2221 |   1474 |     3 |  1442684139 |    1442684139 | 1442684139 | 1442684789 |
|       3074 |   2068 |     4 |  1444398712 |             0 | 1444398922 | 1444398922 |
|       3075 |   2069 |     4 |  1444398773 |             0 | 1444398913 | 1444398913 |
|       3076 |   2070 |     4 |  1444398793 |             0 | 1444398906 | 1444398906 |
|       3087 |   2079 |     4 |  1444404394 |             0 | 1444408376 | 1444408376 |
|       3092 |   2082 |     4 |  1444408447 |             0 | 1444420375 | 1444420375 |
|       3093 |   2083 |     4 |  1444408473 |             0 | 1444420367 | 1444420367 |
|       3210 |   2161 |     4 |  1444771770 |             0 | 1444823806 | 1444823806 |
+------------+--------+-------+-------------+---------------+------------+------------+
Comment 3 Danny Auble 2015-10-28 11:34:07 MDT
Thanks Steven, I think I see the issue here.  It looks like the jobs in question never became eligible (their time_eligible is 0 in the output above), probably because of a dependency or a future start time, and were then cancelled before they became eligible.  To work around this, you might want to add the state option -sr, which will give you only jobs that were running during that time.

See if 

sacct -X -P -a -n -o jobid,submit,start,end,cputimeraw -S 2015-09-15 -E 2015-09-20 -T -sr

gives you something better.  I'll see what can be done for this to happen by default.
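The effect can be reproduced outside the database by applying the WHERE clause of the query above to a few rows from the output. This is a sketch only: matches_query mirrors that WHERE clause, while matches_with_guard is a hypothetical tightening added purely for illustration and is not taken from the Slurm source.

```python
# Window bounds, taken from the literals in the query above.
WINDOW_START = 1442300400
WINDOW_END = 1442732400


def matches_query(time_eligible, time_end):
    """Mirror of the WHERE clause in the query above."""
    return time_eligible < WINDOW_END and (
        time_end >= WINDOW_START or time_end == 0
    )


def matches_with_guard(time_eligible, time_end):
    """Hypothetical variant that skips jobs that never became eligible."""
    return time_eligible != 0 and matches_query(time_eligible, time_end)


# id_job -> (time_eligible, time_end), copied from the database output.
rows = {
    1353: (1442316120, 1442316771),  # ran inside the window
    2068: (0, 1444398922),           # cancelled in October, never eligible
    2161: (0, 1444823806),           # cancelled in October, never eligible
}

selected = [job for job, (elig, end) in rows.items() if matches_query(elig, end)]
guarded = [job for job, (elig, end) in rows.items() if matches_with_guard(elig, end)]
print(selected, guarded)
```

Because a time_eligible of 0 is smaller than any window end, and the October time_end values are larger than the window start, the never-eligible jobs pass both halves of the condition.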
Comment 4 Steven Shortino 2015-10-29 03:56:10 MDT
This has resolved our problem.  Thanks!
Comment 5 Danny Auble 2015-10-29 10:33:17 MDT
Steven, as of commit d6d26b0213f, sacct will no longer display these jobs by default when a window of time is given.

I am glad the other fix worked for you, though; it is probably what you wanted in the first place :).

Let us know if there is anything else you need on this.