Hello,

We have a subset of applications, primarily genome assembly but also assorted annotation and alignment tools, which require large amounts of memory, but we have relatively few large-memory machines. To accommodate these codes, we've enabled zram as swap on our nodes. For many of them this works great: we trade a little CPU, which we weren't able to fully utilize anyway, for fast swapping, and in some cases see zram compressing 10x or better. We get to run relatively large memory jobs on relatively small memory nodes. Yay for us.

Except some data doesn't compress well in zram and immediately spills over onto disk. Before we started using zram, we used cgroups to enforce swap limits, restricting apps to swap equal to 10% of the requested RAM. That caused anything eating a lot of swap to fall over quickly. But now that we've had to raise that limit to accommodate zram's effective usage, these non-swap-friendly apps send the nodes into swap-thrash-death.

Is it possible, or could it be made possible, to have a parameter like --swap/--swap-per-cpu so that jobs can select the amount of swap they want to attempt to use? This would allow us to set a low default, preventing swap-thrash-death, while letting jobs that can effectively use zram/swap request a much higher amount.

Thanks,
jbh
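A minimal sketch of how a "swap capped at 10% of requested RAM" policy can be expressed with cgroup v2 (the cgroup path and job id here are hypothetical, and in practice Slurm's cgroup plugin writes these files itself; this only illustrates the mechanism):

```shell
#!/bin/bash
# Sketch: cap a job's swap at 10% of its requested memory via cgroup v2.
# The /sys/fs/cgroup/slurm/job_<id> path is an illustrative assumption.

swap_cap_bytes() {                 # 10% of the requested memory, in bytes
    echo $(( $1 / 10 ))
}

limit_job_swap() {                 # run as root against the job's cgroup
    local jobid="$1" mem_bytes="$2"
    local cg="/sys/fs/cgroup/slurm/job_${jobid}"
    echo "$mem_bytes"                      > "${cg}/memory.max"
    echo "$(swap_cap_bytes "$mem_bytes")"  > "${cg}/memory.swap.max"
}
```

With a cap this tight, a job that falls out of RAM and cannot fit its working set into 10% extra swap is OOM-killed quickly instead of thrashing the node.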
Hi,

As you know, Slurm currently does not have this feature. Development will evaluate it and get back to you.

David
Perhaps GRES (Generic RESources) could be used for this purpose. You can define a GRES count per node, and jobs requesting it would consume those resources.

Node configurations would look something like this:

    NodeName=nid[00000-01000] Gres=swap:1g ....

gres.conf would include:

    Name=swap Count=1g

Job requests would look something like this:

    sbatch --gres=swap:100m ...

A job submit plugin could set default swap values if desired. More information about GRES is available here: http://slurm.schedmd.com/gres.html

Let me know if this addresses your needs.
What we wound up doing was similar to your suggestion, except we applied it to zram. On the nodes where we allow this we added a "zram" feature, then wrote a submit plugin which checks for that feature and, if found, sets --mem=0 and --exclusive. Prolog and epilog scripts then enable zram for the job and disable it once the job is complete. Now we can simply lower the available disk-based swap to some general-purpose amount, and people running large jobs can activate zram as needed.

My original idea was that selectable swap amounts would let jobs with different swap limits share a node, but upon further pondering I realized that almost all jobs want either all the swap they can get or no swap at all. We still let all jobs that set a memory amount go over it by 10% into swap, and that seems to be a fairly good boundary.

I think you can close this request; yeah, it would be neat, but it's really unnecessary. If it turns out we do want to allow selectable swap in the future, I will probably follow the same approach and have a prolog/epilog add and remove a swap zvol for the duration of the job.

Thank you,
jbh
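The prolog/epilog approach described above can be sketched roughly as follows (a minimal illustration, not the site's actual scripts; the 150% sizing factor is an assumed starting point, and a real prolog would record which zram device it claimed rather than hard-coding /dev/zram0 in the epilog):

```shell
#!/bin/bash
# Sketch: enable a zram swap device in a Slurm Prolog and tear it down
# in the Epilog. Requires root; uses util-linux zramctl.

zram_size_kib() {                  # size the device at 150% of physical RAM
    local ram_kib="$1"             # (an assumed factor; tune per workload)
    echo $(( ram_kib * 3 / 2 ))
}

enable_zram() {                    # Prolog side
    modprobe zram
    local ram_kib size_kib dev
    ram_kib=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
    size_kib=$(zram_size_kib "$ram_kib")
    dev=$(zramctl --find --size "${size_kib}KiB")   # claims a free /dev/zramN
    mkswap "$dev"
    swapon --priority 100 "$dev"   # prefer zram over any disk-based swap
}

disable_zram() {                   # Epilog side; a real script would use the
    swapoff /dev/zram0             # device path recorded by the prolog
    zramctl --reset /dev/zram0
}
```

Giving the zram device a higher swap priority than disk swap means pages only spill to disk once zram is exhausted, which matches the "general-purpose disk swap, opt-in zram" split described above.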
Resolved using GRES.