Ticket 6946 - Percentage of memory used during a job
Summary: Percentage of memory used during a job
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 17.11.8
Hardware: Linux
Severity: 4 - Minor Issue
Assignee: Ben Roberts
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2019-05-01 14:07 MDT by Steve Shortino
Modified: 2019-07-08 11:42 MDT

See Also:
Site: FRB Kansas
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Description Steve Shortino 2019-05-01 14:07:12 MDT
This might be an odd request, and I suspect it is something I need to code up myself anyhow.  My site has a need to educate users based on their RAM usage (or other resource usage) for their jobs, so I was wondering if there was a quick way to get the percentage of requested RAM that was actually used.

I'm currently parsing out sacct output and using maxVMsize/reqMem to get a percentage number to pass to my users, and was curious if there was already functionality like this that I could leverage instead.

I'm also interested in knowing if I am giving them an accurate number with that division above!

Anyway, please let me know if such a way exists!

Best regards,

Steve
Comment 1 Ben Roberts 2019-05-02 10:37:08 MDT
Hi Steve,

The way you're doing the calculation, with maxVMsize/reqMem, is the correct way to show users their used vs requested RAM.  One thing to note is that if you're using jobacct_gather/linux the usage information collected for a job is pretty good, but can be a little off if the job gets interrupted.  If you use jobacct_gather/cgroup the collected data is much more accurate in cases where something goes wrong.  

I looked into whether there is an existing tool that shows a percentage like that, but there's not.  The correct way to get that information would be to pull the data from sacct and calculate it, like you're doing.  

Let me know if you have additional questions about this.

Thanks,
Ben
Comment 2 Steve Shortino 2019-05-09 12:19:26 MDT
Hello Ben,

Thank you for the info!  I actually ran into an issue calculating this out that has revealed some problems, maybe with how I am reading the memory usage.  Can you help me figure out why I am seeing this?


JobID|ReqMem|MaxVMSize|ReqCPUS|State|JobName
34233|5Gn||14|FAILED|Model1_Random_Forest
34233.batch|5Gn|189816K|14|FAILED|batch
34233.0|5Gn|2048908K|14|CANCELLED by 3577|Rscript
34234|18Gn||14|FAILED|Model1_Random_Forest
34234.batch|18Gn|189816K|14|FAILED|batch
34234.0|18Gn|2067148K|14|CANCELLED by 3577|Rscript
34235|21Gn||14|FAILED|Model1_Random_Forest
34235.batch|21Gn|189816K|14|FAILED|batch
34235.0|21Gn|2073420K|14|CANCELLED by 3577|Rscript
34236|23Gn||14|FAILED|Model1_Random_Forest
34236.batch|23Gn|189816K|14|FAILED|batch
34236.0|23Gn|2073304K|14|CANCELLED by 3577|Rscript
34237|25Gn||14|COMPLETED|Model1_Random_Forest
34237.batch|25Gn|189816K|14|COMPLETED|batch
34237.0|25Gn|2079604K|14|COMPLETED|Rscript

Note that while the job only completed when given 25G, it reports a MaxVMSize of only about 2G.  I know that Slurm is seeing the rest of the memory usage, since my output logs have these errors in them:
Model1_Random_Forest.2019-05-09T125236.34235.qlog:slurmstepd: error: Step 34235.0 exceeded memory limit (22640340 > 22020096), being killed
Model1_Random_Forest.2019-05-09T125708.34236.qlog:slurmstepd: error: Step 34236.0 exceeded memory limit (25315208 > 24117248), being killed


Please let me know what you think, and if there is anything else I can provide to help clear up this process!  Right now the percentage I am calculating does not appear at all accurate.

Best regards,

Steve
Comment 3 Ben Roberts 2019-05-10 11:21:51 MDT
Hi Steve,

I've got to amend my previous statement that doing MaxVMSize/ReqMem was the proper way to calculate the percentage of the memory used.  The MaxVMSize is the amount of swap the job used.  You can find the amount of RAM the job used by looking at MaxRSS.  

If you're using cgroups you can control the percentage of RAM vs swap that jobs are allowed to use with AllowedRAMSpace and AllowedSwapSpace in the cgroup.conf file.  There's more information about these parameters, and others, in the cgroup.conf documentation:
https://slurm.schedmd.com/cgroup.conf.html
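
As a point of reference, a minimal cgroup.conf fragment using those parameters might look like the following.  The values shown are just the documented defaults, used here for illustration, not a recommendation:

```ini
# cgroup.conf (fragment) - illustrative values only
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
AllowedRAMSpace=100     # percent of requested memory usable as RAM
AllowedSwapSpace=0      # additional percent usable as swap
```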

With that in mind you can decide whether you want to include the swap in the calculation of the amount of memory used vs requested.  You could do:
MaxRSS/ReqMem

or
(MaxRSS + MaxVMSize)/ReqMem
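
A minimal sketch of that calculation, assuming sacct's usual memory notation (a K/M/G/T suffix, with a trailing 'n' for per-node or 'c' for per-CPU on ReqMem values such as '5Gn').  The 189816K figure is borrowed from the batch-step MaxVMSize in the output above purely as an illustrative input:

```python
# Sketch: percentage of requested memory actually used (MaxRSS/ReqMem),
# assuming sacct-style memory strings.  Not an official Slurm tool.

UNITS = {"K": 1, "M": 1024, "G": 1024 ** 2, "T": 1024 ** 3}

def to_kib(mem, ncpus=1):
    """Convert a Slurm memory string (e.g. '5Gn', '189816K') to KiB.

    A trailing 'n' means per-node; a trailing 'c' means per-CPU,
    in which case the value is multiplied by the CPU count.
    """
    mem = mem.strip()
    per_cpu = mem.endswith("c")
    if mem and mem[-1] in "nc":
        mem = mem[:-1]
    if mem and mem[-1] in UNITS:
        kib = float(mem[:-1]) * UNITS[mem[-1]]
    else:
        kib = float(mem)  # no suffix: assume the value is already KiB
    return kib * (ncpus if per_cpu else 1)

def pct_used(max_rss, req_mem, ncpus=1):
    """Percentage of requested memory that was resident (MaxRSS/ReqMem)."""
    return 100.0 * to_kib(max_rss) / to_kib(req_mem, ncpus)

print(f"{pct_used('189816K', '25Gn'):.1f}%")
```

Swap in `(MaxRSS + MaxVMSize)` for the numerator if you decide to count virtual size as well.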

My apologies that I didn't catch that the first time; I was more concerned about whether there was a tool that already did that calculation.  Let me know if you have any additional questions.

Thanks,
Ben
Comment 4 Steve Shortino 2019-05-13 11:27:11 MDT
NONCONFIDENTIAL // EXTERNAL
Hello Ben,

I was out of the office on Friday.  Thank you for the advice; using MaxRSS is producing numbers that look better.  I am using cgroups for control and resource tracking.  However, this does bring up another question (which may or may not be relevant): my compute nodes are all diskless with no swap configured, so what is MaxVMSize capturing?  I will do some more testing, but right now MaxRSS/ReqMem is providing the numbers I expect from the jobs I've run so far.

Thanks and regards,

Steve

Comment 6 Ben Roberts 2019-05-14 16:15:15 MDT
Hi Steve,

I've been looking into where the MaxVMSize comes from.  This comes back to the way Linux calculates the Virtual Size (VSZ), which isn't just the amount of swap space used.  As an example, on my system you can see that I have 0 swap in use currently:

$ swapon -s
Filename                                Type            Size    Used    Priority
/swapfile                               file            2097148 0       -2


But if I look at 'ps' for the processes with the most VSZ used, there are some processes with quite a bit:

$ ps aux --sort -vsz | head -n5
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
ben       5863  0.0  0.2 268771560 41716 ?     Sl   09:14   0:03 /usr/lib/x86_64-linux-gnu/libexec/baloorunner
ben       2450  0.0  0.4 268706772 73644 ?     SNl  08:59   0:08 /usr/bin/baloo_file
ben       2454  0.3  3.8 6909856 619080 ?      SLl  08:59   1:55 /usr/bin/plasmashell
ben       2448  5.9  1.5 3170260 254136 ?      Sl   08:59  29:01 /usr/bin/kwin_x11 -session 1012012111e93000154688027200000015320004_1557786765_781859


There are some good descriptions I found of what's included in the vsz reported:
----------------------------------
"VSZ is the Virtual Memory Size. It includes all memory that the process can access, including memory that is swapped out, memory that is allocated, but not used, and memory that is from shared libraries."

https://stackoverflow.com/questions/7880784/what-is-rss-and-vsz-in-linux-memory-management

"VSZ is virtual memory which a process can use while RSS is physical memory actually allocated at the moment."

https://stackoverflow.com/questions/31867856/vsz-vs-rss-memory-and-swap-space

----------------------------------
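
To put the distinction in numbers, here is a small sketch that pulls the VSZ and RSS columns (both reported by `ps aux` in KiB) out of a ps output line and shows how far apart they can be.  The sample line is the plasmashell row from the output above:

```python
# Sketch: compare the VSZ and RSS columns from a `ps aux` output line.
# Column positions follow the standard `ps aux` layout (VSZ is field 5,
# RSS is field 6, both in KiB).

def vsz_rss(ps_line):
    """Return (vsz_kib, rss_kib, ratio) for one `ps aux` output line."""
    fields = ps_line.split()
    vsz, rss = int(fields[4]), int(fields[5])
    return vsz, rss, vsz / rss

line = ("ben       2454  0.3  3.8 6909856 619080 ?      SLl  08:59   "
        "1:55 /usr/bin/plasmashell")
vsz, rss, ratio = vsz_rss(line)
print(f"VSZ={vsz} KiB, RSS={rss} KiB; VSZ is {ratio:.1f}x RSS")
```

For that process the virtual size is more than ten times the resident set, with no swap in use at all, which is why MaxVMSize can look large on diskless, swapless nodes.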

I assume that if you look at one of your diskless nodes you'll see similar behavior, where there is usage reported in the VSZ column of 'ps'.  Let me know if that's not the case or if you have additional questions about this.

Thanks,
Ben
Comment 7 Ben Roberts 2019-06-05 15:36:32 MDT
Hi Steve,

I wanted to follow up and make sure the information I sent about the Virtual Size made sense and that you were able to get the information you needed from the report.  Let me know if you have any additional questions about this.

Thanks,
Ben
Comment 8 Ben Roberts 2019-07-08 11:42:22 MDT
Hi Steve,

The information I sent should have helped clarify what the reported Virtual Size field means, and I haven't heard a follow-up question.  I'll close this ticket as 'InfoGiven', but feel free to update the ticket if you do have additional questions about this.

Thanks,
Ben