Ticket 1095

Summary: Disable swap usage with cgroups
Product: Slurm Reporter: Kilian Cavalotti <kilian>
Component: Limits Assignee: David Bigagli <david>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: --- CC: da
Version: 14.03.7   
Hardware: Linux   
OS: Linux   
Site: Stanford Alineos Sites: ---
Atos/Eviden Sites: --- Confidential Site: ---
Coreweave sites: --- Cray Sites: ---
DS9 clusters: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Linux Distro: ---
Machine Name: CLE Version:
Version Fixed: Target Release: ---
DevPrio: --- Emory-Cloud Sites: ---

Description Kilian Cavalotti 2014-09-09 11:14:38 MDT
Hi,

I'm trying to set up cgroups support to disallow user jobs from using any swap at all. It mostly works, but not completely.

I have the following /etc/slurm/cgroup.conf:
# general
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
# task/cgroup plugin
TaskAffinity=yes # require hwloc
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
# prevent jobs from using swap space
AllowedRAMSpace=100  # in %
AllowedSwapSpace=0   # in %

and in slurm.conf:
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
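As a sanity check (not part of the original ticket), the limits Slurm should derive from this configuration can be computed by hand. This is a sketch assuming the limits are simply the job's RAM allocation scaled by the AllowedRAMSpace and AllowedSwapSpace percentages from cgroup.conf, which matches the slurmd log for a 20000 MB allocation below:

```shell
# Expected cgroup limits for a 20000 MB job allocation, assuming
# mem.limit  = alloc * AllowedRAMSpace / 100
# memsw.limit = mem.limit + alloc * AllowedSwapSpace / 100
alloc_mb=20000
allowed_ram_pct=100
allowed_swap_pct=0
alloc_bytes=$(( alloc_mb * 1024 * 1024 ))
mem_limit=$(( alloc_bytes * allowed_ram_pct / 100 ))
memsw_limit=$(( mem_limit + alloc_bytes * allowed_swap_pct / 100 ))
echo "memory.limit_in_bytes:       $mem_limit"
echo "memory.memsw.limit_in_bytes: $memsw_limit"
# both print 20971520000, matching the values read from the cgroup
```

With AllowedSwapSpace=0 the two limits come out identical, which is what makes a combined (memory+swap) limit equal to the memory limit.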


The memory limitation part seems to work pretty well overall, but I can still find jobs that make use of swap space:

# pwd 
/cgroup/memory/slurm/uid_15248/job_322107/step_4294967294
# grep 322107 /var/log/slurm/slurmd.log 
[2014-09-09T12:44:01.613] Launching batch job 322107 for UID 15248
[2014-09-09T12:44:01.630] [322107] checkpoint/blcr init
[2014-09-09T12:44:01.663] [322107] task/cgroup: /slurm/uid_15248/job_322107: alloc=20000MB mem.limit=20000MB memsw.limit=20000MB
[2014-09-09T12:44:01.663] [322107] task/cgroup: /slurm/uid_15248/job_322107/step_4294967294: alloc=20000MB mem.limit=20000MB memsw.limit=20000MB

And then I have:
memory.limit_in_bytes            20971520000
memory.usage_in_bytes            20851781632
memory.max_usage_in_bytes        20853907456
memory.failcnt                   0

memory.memsw.limit_in_bytes      20971520000
memory.memsw.usage_in_bytes      20970475520
memory.memsw.max_usage_in_bytes  20971520000
memory.memsw.failcnt             103751

So it looks like the memsw limit has been hit a few times, and yet the process is still running.

# cat cgroup.procs
14812
14827
14831
14834
14838
# grep VmSwap /proc/14834/status
VmSwap:   119328 kB

This PID definitely uses some swap.
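The cgroup-wide swap usage can be derived from the two usage counters read above: in cgroup v1, memory.memsw.usage_in_bytes counts memory plus swap, so the difference is the swap in use. A quick calculation with the values from this job (note it is a per-cgroup figure, so it need not match any single PID's VmSwap exactly):

```shell
# swap in use by the cgroup = combined (mem+swap) usage - RAM usage
memsw_usage=20970475520   # memory.memsw.usage_in_bytes
mem_usage=20851781632     # memory.usage_in_bytes
swap_bytes=$(( memsw_usage - mem_usage ))
echo "swap in use: $(( swap_bytes / 1024 )) kB"
# prints: swap in use: 115912 kB
```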

So, I was wondering if this is all normal, or if there is a way to really prevent a user process from using any swap at all. There's this memory.swappiness control file in the cgroup, but I don't think it can be set from Slurm.
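One workaround (a hypothetical sketch, not something Slurm provides) would be to write memory.swappiness from a Prolog script after Slurm creates the job's cgroup. On cgroup v1, swappiness 0 discourages the kernel from swapping that cgroup's pages, though on older kernels it may reduce rather than fully eliminate swapping under memory pressure. The cgroup path here is an assumption modeled on the one shown above:

```shell
# Hypothetical Prolog helper: set memory.swappiness=0 in a job's memory
# cgroup. The real path would be something like
# /cgroup/memory/slurm/uid_$SLURM_JOB_UID/job_$SLURM_JOB_ID
set_swappiness() {
    local cgroup_dir=$1
    # only write if the control file exists and is writable
    if [ -w "$cgroup_dir/memory.swappiness" ]; then
        echo 0 > "$cgroup_dir/memory.swappiness"
    fi
}
```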

Thanks.
Comment 1 David Bigagli 2014-09-10 05:07:00 MDT
Hi Kilian,
          I am working on this and will update you later on.

David
Comment 2 David Bigagli 2014-09-10 07:50:17 MDT
Hi Kilian,
         Slurm sets the limits correctly. Since memory.memsw.limit_in_bytes
indicates the combined memory and swap limit, if you set AllowedSwapSpace=0
then the two values should indeed be equal. From Slurm's perspective
things are all right.

It is more difficult for me to tell you why your kernel allowed some swap space to be used regardless... I am running a heavy memory benchmark with settings like yours, but I don't see swap being used. I see the memory limit being hit a few times

>cat memory.failcnt
3191527

but not the swap. 

I am running on:

>cat /etc/redhat-release 
CentOS release 6.5 (Final)
>uname -a
Linux prometeo 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

David
Comment 3 Kilian Cavalotti 2014-09-10 08:25:32 MDT
Hi David, 

Thanks for looking into it.

(In reply to David Bigagli from comment #2)
>          Slurm sets the limits correctly. Since memory.memsw.limit_in_bytes
> indicates the combined memory and swap limit, if you set AllowedSwapSpace=0
> then the two values should indeed be equal. From Slurm's perspective
> things are all right.

Yes, memory.limit_in_bytes and memory.memsw.limit_in_bytes are the same value, which is good.

> It is more difficult for me to tell you why your kernel allowed some swap
> space to be used regardless... I am running a heavy memory benchmark with
> settings like yours, but I don't see swap being used. I see the memory
> limit being hit a few times
> 
> >cat memory.failcnt
> 3191527
> 
> but not the swap. 

I think what happens here is that some memory pages get swapped out to disk while the overall usage is still under the limit, maybe under pressure from other jobs in different cgroups.
So we have memsw.usage < limit and memory.usage < limit, but memsw.usage > memory.usage.

I guess I'm looking for a way to ensure that memsw.usage stays equal to memory.usage at all times, but I'm not sure that's even possible.


> 
> I am running on:
> 
> >cat /etc/redhat-release 
> CentOS release 6.5 (Final)
> >uname -a
> Linux prometeo 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014
> x86_64 x86_64 x86_64 GNU/Linux
> 
> David
Comment 4 David Bigagli 2014-09-10 08:27:15 MDT
What OS and kernel version do you have?

David
Comment 5 Kilian Cavalotti 2014-09-10 08:29:37 MDT
(In reply to David Bigagli from comment #4)
> What OS and kernel version do you have?

Oh sorry, forgot about that:

Red Hat Enterprise Linux Server release 6.5 (Santiago)
Linux 2.6.32-431.23.3.el6.x86_64 #1 SMP Wed Jul 16 06:12:23 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
Comment 6 David Bigagli 2014-09-15 06:07:42 MDT
Thanks for the info. I tested on CentOS 6.5, which is equivalent.
I suggest we close this ticket, as it is not a Slurm problem.

David
Comment 7 Kilian Cavalotti 2014-09-15 06:34:03 MDT
(In reply to David Bigagli from comment #6)
> Thanks for the info. I used CentOS 6.5 which is equivalent.
> I suggest we close this ticket as not a Slurm problem.

That sounds OK; it does indeed look more like a kernel/OS issue.

Thanks.