| Summary: | sql query results in "BIGINT UNSIGNED value is out of range" | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Dirk Petersen <petersen> |
| Component: | Database | Assignee: | Felip Moll <felip.moll> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | felip.moll |
| Version: | 17.11.5 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | FHCRC - Fred Hutchinson Cancer Research Center | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Hi Dirk,

This doesn't seem to be a Slurm issue but a MySQL one, seen for example when upgrading MySQL from 5.0.x to 5.5.x. When dealing with unix time, MySQL can throw this kind of error:

https://stackoverflow.com/questions/5605085/bigint-unsigned-value-is-out-of-range
https://stackoverflow.com/questions/11698613/bigint-unsigned-value-is-out-of-range-my-sql/11704174

"As of MySQL 5.5.5, overflow during numeric expression evaluation results in an error."

Try applying the suggested cast and see if it works. On the Slurm side, the code just asks for the times and does the calculations internally, so it does not rely on these MySQL functions.

Note that you can also get these results with sreport:

sreport cluster UserUtilizationByAccount -P -t hours start=2017-01-01 tree

Thanks Felip, the cast workaround does it for me. I tried sreport, which seems to be better, but unfortunately "sreport cluster UserUtilizationByAccount" does not support filtering by partition. We support a condo model and only report on the public resources.

Dirk

This works just fine:

SELECT
    beagle_job_table.account,
    sum((cast(beagle_job_table.time_end as signed) - cast(beagle_job_table.time_start as signed)) * cast(beagle_job_table.cpus_req as signed) / 3600) AS CPUh
FROM
    beagle_job_table
GROUP BY
    beagle_job_table.account
ORDER BY
    CPUh DESC

Felip, one more question, and I am not sure if this is fully related:
When I run the query below against our accounting DB with 22M jobs in it, it takes many hours to finish and sometimes does not finish at all. mysqld and slurmdbd use quite a bit of memory, and mysqld is running at 100% on one CPU. Could there be better DB indices?
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20880 slurm 20 0 4551176 3.439g 4196 S 0.0 5.5 146:09.34 slurmdbd
14507 mysql 20 0 3768996 1.419g 19268 S 100.0 2.3 1150:45 mysqld
SELECT
gizmo_job_table.account,
gizmo_job_table.id_user,
Sum((CAST(gizmo_job_table.time_end AS signed) - CAST(gizmo_job_table.time_start AS signed)) *
CAST(gizmo_job_table.cpus_req AS signed) / 3600) AS CPUh
FROM
gizmo_job_table
WHERE
gizmo_job_table.time_end > 0 AND
gizmo_job_table.time_start > 0 AND
gizmo_job_table.time_end > Unix_Timestamp('2017-07-01 00:00:00') AND
(gizmo_job_table.partition = 'campus' OR
gizmo_job_table.partition = 'largenode')
GROUP BY
gizmo_job_table.account,
gizmo_job_table.id_user
ORDER BY
CPUh DESC
(In reply to Dirk Petersen from comment #3)
> when I run the query below against our account DB with 22M jobs in it it
> takes many hours to finish and sometimes does not finish at all. [...] Could
> there be better DB indices ?

22M jobs is quite a lot; I assume this corresponds to all the jobs you have had since the cluster was created. Do you really need *all* the information for each job since the beginning? If you can, I recommend purging/archiving the jobs and running the queries against the usage tables instead of the job tables. This will improve slurmdbd performance and is what I highly recommend. Otherwise, I think this is more a SQL issue of querying a table with millions of rows.
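The purge/archive behaviour Felip refers to is configured in slurmdbd.conf. A minimal sketch, assuming a one-year retention policy; the retention periods and the archive directory below are illustrative values, not recommendations from this ticket:

```conf
# slurmdbd.conf -- illustrative purge/archive settings (values are examples)
ArchiveJobs=yes
ArchiveSteps=yes
ArchiveDir=/var/spool/slurm/archive   # assumed path, adjust to your site
PurgeJobAfter=12months                # keep roughly one year of job records
PurgeStepAfter=12months
PurgeEventAfter=6months
PurgeSuspendAfter=6months
```

Purged records are written as archive files in ArchiveDir rather than deleted outright, so the detailed job data remains recoverable offline while the live job tables stay small.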
Doing an EXPLAIN over the query shows me:

+------+-------------+--------------------+------+---------------+------+---------+------+------+----------------------------------------------+
| id   | select_type | table              | type | possible_keys | key  | key_len | ref  | rows | Extra                                        |
+------+-------------+--------------------+------+---------------+------+---------+------+------+----------------------------------------------+
|    1 | SIMPLE      | llagosti_job_table | ALL  | rollup2       | NULL | NULL    | NULL |    2 | Using where; Using temporary; Using filesort |
+------+-------------+--------------------+------+---------------+------+---------+------+------+----------------------------------------------+

"Using filesort" means that the sort can't be performed with an index, since you are ordering by a computed field (CPUh). Temporary tables can be created when there is an ORDER BY clause and a different GROUP BY clause, so using an index here won't entirely solve the problem.

If you run the query without the ORDER BY, what times do you get? And without the GROUP BY? I think you would need to read through some documentation on how to optimize the queries or tune the database to get better results, e.g.:

https://www.percona.com/blog/2009/03/05/what-does-using-filesort-mean-in-mysql/
http://dev.housetrip.com/2013/04/19/mysql-order-by-null/

Yes, that's about one year's worth of jobs, which we would like to keep. Our job IDs are now in the 60 millions. I tried removing the sort, but that did not speed things up significantly. I wonder if there is a better way to get CPU time / CPU hours per job. Couldn't this data just be added to a new column in the table? Avoiding a computed field would greatly accelerate the query, wouldn't it?

Thanks

Dirk

(In reply to Dirk Petersen from comment #5)
> Yes, That's about 1 year worth of jobs which we like to keep. [...] I tried
> removing the sort but that did not accelerate significantly.

I tried the same in a database with 3 million entries.
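Since the WHERE clause filters on partition and time_end, one thing worth trying is a composite index over those columns. This is a sketch, not advice from the ticket: the index name and column order are assumptions, whether the optimizer picks it up depends on the selectivity of the predicates, and, as noted above, the filesort caused by ordering on the computed CPUh field will remain either way:

```sql
-- Hypothetical composite index covering the WHERE predicates.
-- `partition` is backquoted because it is a reserved word in newer MySQL.
ALTER TABLE gizmo_job_table
    ADD INDEX part_time_end_idx (`partition`, time_end);

-- Re-run EXPLAIN on the original query afterwards to see whether the
-- access type changes from ALL (full table scan) to range/ref.
```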
I removed the calculation and the ORDER BY, and it hasn't improved much. I think the time-consuming operation is the GROUP BY. How long does it take for you to just select the fields from the job table?

SELECT
    gizmo_job_table.account,
    gizmo_job_table.id_user,
    gizmo_job_table.time_end,
    gizmo_job_table.time_start,
    gizmo_job_table.cpus_req
FROM
    gizmo_job_table
WHERE
    gizmo_job_table.time_end > 0 AND
    gizmo_job_table.time_start > 0 AND
    gizmo_job_table.time_end > Unix_Timestamp('2010-07-01 00:00:00')
GROUP BY
    gizmo_job_table.account,
    gizmo_job_table.id_user;

For me it takes exactly as long as when doing the calculations.

> I wonder if there was a better way to get cpu time / cpu hours per job.

Well, the usage tables have this information, but as you said there is no partition information in there, nor detailed job info. When you archive a job, you still have this same information, but in a text file.

> Couldn't this data be just added to a new column in the table ? Avoiding a
> computed field would greatly accelerate the query?

In many cases people just archive the jobs. Dealing with more than 20 million rows in the Slurm database will probably lead to problems in the near future. I think you need to use a job completion plugin and store the information in another location to do this kind of analysis. A common solution is to use the jobcomp/elasticsearch plugin and store this detailed information in an Elasticsearch server. Then, with Kibana, you can build graphics and analyze this information while unloading the jobs from the Slurm database. Job completion plugins can store much more information than what is saved in the general usage tables.

Would this be a solution for you?

Hi Dirk, did my comment 6 help you find a solution to the issue?

Hi Dirk, I am closing this issue as info given. If you have more issues or still haven't found a reliable solution, just mark the bug as open again or raise a new one.
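Enabling the plugin Felip mentions is a slurm.conf change. A minimal sketch, where the Elasticsearch endpoint is an assumption (point it at your own server):

```conf
# slurm.conf -- ship completed-job records to Elasticsearch (illustrative)
JobCompType=jobcomp/elasticsearch
JobCompLoc=http://elasticsearch.example.org:9200   # assumed endpoint
```

With this in place, each completed job is posted to the external server, so detailed per-job reporting can be done in Kibana while the Slurm database itself is kept small via purging.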
Best regards,
Felip M

Thanks Felip, that sounds reasonable. I will have to see which solution is best for me.
I would like to run this query to get the CPU hours used per account. This has worked in the past, but does not now. We upgraded to the latest Slurm a few months ago, but I am not sure whether the upgrade was the root cause. Please see below.

mysql> SELECT
    ->     beagle_job_table.account,
    ->     sum((beagle_job_table.time_end - beagle_job_table.time_start) * beagle_job_table.cpus_req / 3600) AS CPUh
    -> FROM
    ->     beagle_job_table
    -> WHERE
    ->     (beagle_job_table.time_start > 0 AND
    ->     beagle_job_table.time_end > 0 AND
    ->     (beagle_job_table.time_end > unix_timestamp('2017-07-01 00:00:00') AND
    ->     beagle_job_table.time_end < unix_timestamp('2018-07-01 00:00:00')))
    -> GROUP BY
    ->     beagle_job_table.account
    -> ORDER BY
    ->     CPUh DESC
    -> ;
ERROR 1690 (22003): BIGINT UNSIGNED value is out of range in '(`slurm_acct_db`.`beagle_job_table`.`time_end` - `slurm_acct_db`.`beagle_job_table`.`time_start`)'

The same error occurs even without the WHERE clause:

mysql> SELECT
    ->     beagle_job_table.account,
    ->     sum((beagle_job_table.time_end - beagle_job_table.time_start) * beagle_job_table.cpus_req / 3600) AS CPUh
    -> FROM
    ->     beagle_job_table
    -> GROUP BY
    ->     beagle_job_table.account
    -> ORDER BY
    ->     CPUh DESC
    -> ;
ERROR 1690 (22003): BIGINT UNSIGNED value is out of range in '(`slurm_acct_db`.`beagle_job_table`.`time_end` - `slurm_acct_db`.`beagle_job_table`.`time_start`)'