Hello,

We have a subset of applications, primarily genome assembly but also assorted annotation and alignment tools, which require large amounts of memory, but we have relatively few large-memory machines. To accommodate these codes, we've enabled zram as swap on our nodes. For many of them this works great: we trade a little CPU, which we weren't able to fully utilize anyway, for fast swapping, and in some cases see zram compressing 10x or better. We get to run relatively large memory jobs on relatively small memory nodes. Yay for us.

Except some data doesn't compress well in zram and immediately spills over onto disk. Before we started using zram, we used cgroups to enforce swap limits, restricting apps to swap equal to 10% of the requested RAM. That caused anything eating a lot of swap to fall over quickly. But now that we've had to raise that limit to accommodate zram's effective usage, these non-swap-friendly apps send the nodes into swap-thrash-death.

Is it possible, or could it be made possible, to have a parameter like --swap/--swap-per-cpu so that jobs can select the amount of swap they want to attempt to use? This would allow us to set a low default, preventing swap-thrash-death, while letting jobs that can effectively use zram/swap request a much higher amount.

Thanks,
jbh
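A minimal sketch of how a "swap capped at 10% of requested RAM" policy can be expressed with cgroup v2 (the cgroup path and job id here are hypothetical, and in practice Slurm's cgroup plugin writes these files itself; this only illustrates the mechanism):

```shell
#!/bin/bash
# Sketch: cap a job's swap at 10% of its requested memory via cgroup v2.
# The /sys/fs/cgroup/slurm/job_<id> path is an illustrative assumption.

swap_cap_bytes() {                 # 10% of the requested memory, in bytes
    echo $(( $1 / 10 ))
}

limit_job_swap() {                 # run as root against the job's cgroup
    local jobid="$1" mem_bytes="$2"
    local cg="/sys/fs/cgroup/slurm/job_${jobid}"
    echo "$mem_bytes"                      > "${cg}/memory.max"
    echo "$(swap_cap_bytes "$mem_bytes")"  > "${cg}/memory.swap.max"
}
```

With a cap this tight, a job that falls out of RAM and cannot fit its working set into 10% extra swap is OOM-killed quickly instead of thrashing the node.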
Hi,

As you know, Slurm currently does not have this feature. Development will evaluate it and get back to you.

David
Perhaps GRES (Generic RESources) could be used for this purpose. You can define a GRES count per node, and jobs requesting it would consume those resources.

Node configurations would look something like this:

    NodeName=nid[00000-01000] Gres=swap:1g ....

gres.conf would include:

    Name=swap Count=1g

Job requests would look something like this:

    sbatch --gres=swap:100m ...

A job submit plugin could set default swap values if desired. More information about GRES is available here: http://slurm.schedmd.com/gres.html

Let me know if this addresses your needs.
What we wound up doing was similar to your suggestion, except we applied it to zram. On the nodes where we allow this we added a "zram" feature, then wrote a submit plugin which checks for that feature and, if found, sets --mem=0 and --exclusive. Prolog and epilog scripts then enable zram for the job and disable it once the job is complete. Now we can simply lower the available disk-based swap to some general-purpose amount, and people running large jobs can activate zram as needed.

My original idea was that selectable swap amounts would let jobs with different swap limits share a node, but upon further pondering I realized that almost all jobs want either all the swap they can get or no swap at all. We still let all jobs that set a memory amount go over it by 10% into swap, and that seems to be a fairly good boundary.

I think you can close this request; yeah, it would be neat, but it's really unnecessary. If it turns out we do want to allow selectable swap in the future, I will probably follow the same approach and have a prolog/epilog add and remove a swap zvol for the duration of the job.

Thank you,
jbh
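The prolog/epilog approach described above can be sketched roughly as follows (a minimal illustration, not the site's actual scripts; the 150% sizing factor is an assumed starting point, and a real prolog would record which zram device it claimed rather than hard-coding /dev/zram0 in the epilog):

```shell
#!/bin/bash
# Sketch: enable a zram swap device in a Slurm Prolog and tear it down
# in the Epilog. Requires root; uses util-linux zramctl.

zram_size_kib() {                  # size the device at 150% of physical RAM
    local ram_kib="$1"             # (an assumed factor; tune per workload)
    echo $(( ram_kib * 3 / 2 ))
}

enable_zram() {                    # Prolog side
    modprobe zram
    local ram_kib size_kib dev
    ram_kib=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
    size_kib=$(zram_size_kib "$ram_kib")
    dev=$(zramctl --find --size "${size_kib}KiB")   # claims a free /dev/zramN
    mkswap "$dev"
    swapon --priority 100 "$dev"   # prefer zram over any disk-based swap
}

disable_zram() {                   # Epilog side; a real script would use the
    swapoff /dev/zram0             # device path recorded by the prolog
    zramctl --reset /dev/zram0
}
```

Giving the zram device a higher swap priority than disk swap means pages only spill to disk once zram is exhausted, which matches the "general-purpose disk swap, opt-in zram" split described above.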
Resolved using GRES.