Ticket 4153

Summary: sacct -o ReqMem output wrong when running slurm-17-02 and outputting data collected with slurm-16-05
Product: Slurm
Reporter: Josh Samuelson <josh>
Component: Accounting
Assignee: Danny Auble <da>
Status: RESOLVED FIXED
Severity: 4 - Minor Issue
Version: 17.02.6
Hardware: Linux
OS: Linux
Site: University of Nebraska–Lincoln
Version Fixed: 17.02.8 17.11.0-pre3

Description Josh Samuelson 2017-09-12 15:36:43 MDT
Greetings,

Once or twice a year, we'll query the Slurm accounting database to collect various stats on how our cluster is used for administrative reporting purposes.  It was brought to my attention by one of our staff that many jobs' historic memory requests were wildly wrong.  After we looked at the date-sorted records for a while, we realized that the problem seemed to correct itself the day we updated from slurm-16-05 to slurm-17-02.

I did some digging and I think the once-good historic data that now reports incorrectly is related to this work:

https://github.com/SchedMD/slurm/commit/acc75cd1897269dac648c94b0d633aac26a164b4

In prior versions of Slurm, if a job used --mem-per-cpu, Slurm would add the MEM_PER_CPU flag, aka 0x80000000, to that value.  If the 2^31 bit was set in the job's memory record, Slurm would know it was a mem-per-cpu request and not a mem-per-node request.  After the update, the 2^63 bit (the new value of MEM_PER_CPU) isn't set on the old records, so Slurm assumes the 2^31 bit is part of a legitimate per-node memory request.
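To make the failure mode concrete, here is a small sketch (not Slurm source; the helper names and the 4000 MB example value are mine) of how the flag bit moved when the memory field widened from 32 to 64 bits, and why a 16.05-era record gets misread by 17.02-era decoding:

```python
# Illustrative sketch only: how the MEM_PER_CPU flag bit moved when the
# memory field widened from 32 to 64 bits between slurm-16-05 and slurm-17-02.

OLD_MEM_PER_CPU = 0x80000000          # bit 31, 32-bit mem_req field (16.05)
NEW_MEM_PER_CPU = 0x8000000000000000  # bit 63, 64-bit mem_req field (17.02)

def decode_old(mem_req):
    """Decode a 16.05-era record: bit 31 marks a --mem-per-cpu request."""
    if mem_req & OLD_MEM_PER_CPU:
        return ("per-cpu", mem_req & ~OLD_MEM_PER_CPU)
    return ("per-node", mem_req)

def decode_new(mem_req):
    """Decode with 17.02-era logic: bit 63 marks a --mem-per-cpu request."""
    if mem_req & NEW_MEM_PER_CPU:
        return ("per-cpu", mem_req & ~NEW_MEM_PER_CPU)
    return ("per-node", mem_req)

# A hypothetical 4000 MB --mem-per-cpu job stored by 16.05:
old_record = 4000 | OLD_MEM_PER_CPU

print(decode_old(old_record))  # ('per-cpu', 4000)        -- correct
print(decode_new(old_record))  # ('per-node', 2147487648) -- the bogus value
```

The new code never sees bit 63 set on old records, so 0x80000000 + 4000 is reported as a literal (huge) per-node request, which matches the "off the chart" numbers described above.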

It's possible I missed something in the RELEASE_NOTES when I did the upgrade to handle the accounting database.  I had assumed slurmdbd would handle any schema and data transformations if needed.

I searched to see if anyone else has run into this but didn't find any matches, so perhaps it was something I did.

So my questions:

1) Obvious one, did I miss an update step?

2) In a test VM with a copy of our database, I ran the following which seemed to correct the data.  Does this appear safe to run on our production database?

update clustername_job_table set mem_req = 0x8000000000000000 | (mem_req ^ 0x80000000) where (mem_req & 0xffffffff80000000) = 0x80000000;
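The bit logic in that UPDATE can be checked in isolation.  Here is a sketch of the same expression in Python (the function name and test values are mine; the masks mirror the SQL above):

```python
# Sketch of the UPDATE statement's bit logic, checked outside the database.
OLD_FLAG = 0x80000000                 # 16.05 MEM_PER_CPU flag (bit 31)
NEW_FLAG = 0x8000000000000000         # 17.02 MEM_PER_CPU flag (bit 63)
SELECT_MASK = 0xffffffff80000000      # matches only: bit 31 set, bits 32-63 clear

def migrate(mem_req):
    """Apply the same transform as the SQL UPDATE to a single value."""
    if (mem_req & SELECT_MASK) == OLD_FLAG:      # old-style per-cpu record
        return NEW_FLAG | (mem_req ^ OLD_FLAG)   # clear bit 31, set bit 63
    return mem_req                               # everything else untouched

assert migrate(4000 | OLD_FLAG) == 4000 | NEW_FLAG  # old per-cpu record fixed
assert migrate(4000) == 4000                        # plain per-node record untouched
assert migrate(4000 | NEW_FLAG) == 4000 | NEW_FLAG  # already-converted record untouched
```

The WHERE mask is what keeps the statement safe: it only matches rows where bit 31 is set and all higher bits are clear, i.e. old-format per-CPU records, and skips records already written in the new format.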

Thanks much,
-Josh
Comment 1 Tim Wickberg 2017-09-12 15:40:37 MDT
(In reply to Josh Samuelson from comment #0)
> Greetings,
> 
> Once or twice a year, we'll query the Slurm accounting database to collect
> various stats on how are cluster is used for administrative reporting
> purposes.  It was brought to my attention from one of our staff that many
> jobs historic memory requests were off the chart, wrong.  After we looked at
> the date sorted records for a while, we realized that it seemed to correct
> itself the day we updated from slurm-16-05 to slurm-17-02.
> 
> I did some digging and I think the once good historic data that is now
> reporting bad is related to this work:
> 
> https://github.com/SchedMD/slurm/commit/
> acc75cd1897269dac648c94b0d633aac26a164b4
> 
> In prior versions of Slurm, if a job used --mem-per-cpu, Slurm would add to
> that value MEM_PER_CPU, aka 0x80000000.  If the 2^31 bit is set in the job's
> memory record, it would know it's a mem-per-cpu request and not a mem per
> node request.  After the update, the 2^63 bit isn't set (new value of
> MEM_PER_CPU), so it assumes the 2^31st bit is a legit node memory request.
> 
> It's possible I missed something from the RELEASE_NOTES when I did the
> upgrade to handle the accounting database.  I guess I kinda assumed the
> slurmdbd would handle any schema and data transformations if needed.
> 
> I searched to see if anyone else has ran into this but wasn't successful in
> matches, so perhaps it was something I did.
> 
> So my questions:
> 
> 1) Obvious one, did I miss an update step?

No - looks like I did when doing the conversion work. I obviously missed the flag being used in the accounting database, and should have worked out a conversion process to automatically run on the upgrade.

> 2) In a test VM with a copy of our database, I ran the following which
> seemed to correct the data.  Does this appear safe to run on our production
> database?
> 
> update clustername_job_table set mem_req = 0x8000000000000000 | (mem_req ^
> 0x80000000) where (mem_req & 0xffffffff80000000) = 0x80000000;

Assuming you have no systems with > 2TB of RAM, that should be fine.
Comment 2 Danny Auble 2017-09-27 15:47:37 MDT
This is cosmetically fixed in 17.02 with commit 7bf6ade891f3.  It was completely fixed in 17.11 in commit 989a92827bc17.

Thanks for the SQL, we used it when fixing this correctly.

Please reopen if you notice anything I missed.