Ticket 13595 - Changes to "srun --overlap" coming in Slurm 22.05, new "--overlap=force" option
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 22.05.x
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Marshall Garey
Reported: 2022-03-09 17:41 MST by Tim Wickberg
Modified: 2022-04-28 11:36 MDT
CC List: 3 users

Site: SchedMD


Description Tim Wickberg 2022-03-09 17:41:47 MST
Hey Kevin -

Apologies if you're not the best contact point, but this is an extension of an older discussion in bug 12880 that you were involved in.

For the upcoming Slurm 22.05 release in May, we unfortunately need to adjust the job step launch semantics again, in response to some inconsistencies in the current --overlap behavior.

Going forward, a new --overlap=force option should be used to launch job steps that are explicitly meant to be ignored by Slurm's job step scheduling mechanisms. This means those steps will not be assigned any dedicated resources. From what I can tell, the DDT processes best fit this model, and their launch scripts would need to be adjusted accordingly.

Without this, unless the job step being debugged was also launched with --overlap, a subsequent --overlap launch _will not_ be permitted to share its resources.

If there's something we can do to ease this transition please let me know.
- Tim
Comment 1 Kevin Mooney 2022-03-10 03:59:15 MST
Hi Tim,

Thank you for the forewarning, it's much appreciated.

We should be able to support this in Arm Forge (DDT & MAP) before May. Will there be a preview that we could test against?

Kevin
Comment 2 Marshall Garey 2022-03-10 06:59:29 MST
Hi Kevin,

The changes are already upstream on GitHub in the Slurm master branch:

https://github.com/SchedMD/slurm/

These are the relevant commits:

fe9f416ec2 Add --overlap=force option to srun
751b1b4288 Steps may only overlap with steps that also used --overlap
84d602dd7f Pack/unpack cpus_overlap
cfbd78601b Add a way to track overlapped cpus in a job (--overlap)


Please let us know if you have any problems testing this change.

I'd like to elaborate on the motivation behind the change to --overlap:


The behavior change to --overlap: Steps that specify --overlap cannot overlap with steps that do not specify --overlap.

In Slurm 21.08 and 20.11 (--overlap did not exist prior to 20.11), the following two srun steps will run in parallel:

> $ sbatch -N1 -c2 --mem 1G --wrap "srun sleep 300"
> Submitted batch job 72
> $ srun   -N1 -c2 --mem 0 --overlap --pty --jobid=72 /bin/bash


However, the following two steps will *not* run in parallel:

> $ sbatch -N1 -c2 --mem 1G --wrap "srun --overlap sleep 300"
> Submitted batch job 72
> $ srun   -N1 -c2 --mem 0 --pty --jobid=72 /bin/bash # Not started in parallel


Why does it work this way in 21.08? Steps that don't specify --overlap (and therefore have exclusive access to their resources) won't use CPUs that are already in use by another step. They can't tell whether those CPUs are being used by a step with --overlap or not.

This is confusing and inconsistent. Therefore, we decided to change --overlap so that steps using it may only overlap other steps that also specify --overlap.

This change breaks some users' use of --overlap, and breaks some current debugging tools such as Arm Forge (DDT and MAP) that rely on --overlap to create a "debugging" step. Hence this ticket, to communicate the change.

A final motivation for adding --overlap=force is that --overlap only causes CPUs to be shared, not other resources (memory, GRES). So the 20.11/21.08 behavior of --overlap didn't really work for a "debugging" or "zero-allocation" step.
Comment 3 Marshall Garey 2022-03-11 14:12:39 MST
Kevin,

We made a proposal to the site sponsoring this change to swap the "--overlap" and "--overlap=force" behaviors, like so:


* --overlap=force becomes --overlap. Therefore, using --overlap will get this new overlap behavior of overlapping on all resources (by not being counted against the job's allocation).
* --overlap becomes --overlap=mutual. This would be here to opt into the 20.11/21.08 behavior but with the fixes to ensure that these steps only overlap with other steps that specify --overlap=mutual.


They agreed that this would make more sense to them. We also really don't want to break things, so we're going to make this change. This means that the Arm Forge tools should no longer be broken by 22.05; in fact, they should work better with the new --overlap behavior.


Do you have any questions about this?
Comment 4 Kevin Mooney 2022-03-14 05:19:47 MDT
Hi Marshall,

That's great to hear that the interface we use won't change.

We currently use --mem-per-cpu=0 with --overlap to launch our debug step. Will this be no longer necessary with --overlap's new behaviour?

Kevin
Comment 5 Marshall Garey 2022-03-14 10:03:06 MDT
(In reply to Kevin Mooney from comment #4)
> Hi Marshall,
> 
> That's great to hear that the interface we use won't change.
> 
> We currently use --mem-per-cpu=0 with --overlap to launch our debug step.
> Will this be no longer necessary with --overlap's new behaviour?

It will still be necessary. The new --overlap behavior just means that whatever resources are allocated to this step can also be allocated to any other step. However, the step is still allocated exactly what it asks for. So by requesting --mem-per-cpu=0 (or --mem=0, which is equivalent), the step requests all of the memory in the job allocation.

So, steps that request --overlap still need to request whatever resources (CPUs, memory, nodes, GRES) they need. The new --overlap behavior means that they won't block other steps from running on those resources (previously this applied only to CPUs, and didn't even work properly there).
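To illustrate that point, here is a minimal sketch of how a launch script might assemble the srun command for a debugging step under the 22.05 --overlap behavior. The helper name `debug_step_cmd` is hypothetical (it is not part of Slurm or Arm Forge), and it prints the command instead of running it so it can be inspected without a Slurm cluster. The step still requests its resources explicitly (all of the job's memory via --mem-per-cpu=0); --overlap only keeps those resources from blocking the job's other steps.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm or Arm Forge): print the srun
# command line for a debugging step attached to an existing job.
debug_step_cmd() {
    local jobid="$1"
    shift
    # --overlap: in 22.05, this step's resources may also be allocated to
    #   any other step, so it does not block the job's other steps.
    # --mem-per-cpu=0: the step still has to request resources explicitly;
    #   this requests all of the memory in the job allocation.
    # "$@": any further srun options plus the command to run.
    local cmd=(srun --jobid="$jobid" --overlap --mem-per-cpu=0 "$@")
    printf '%s\n' "${cmd[*]}"
}

# Example: an interactive shell on one node of job 72.
debug_step_cmd 72 -N1 --pty /bin/bash
```

To actually launch the step, a script would execute the array (`"${cmd[@]}"`) rather than print it; building an array keeps each option as a separate argument even if the command contains spaces.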
Comment 6 Marshall Garey 2022-03-30 09:59:14 MDT
Quick update:

Implementing --overlap=mutual properly turned out to be more difficult than expected and performed poorly, so we're dropping it. We are still keeping the new behavior for --overlap (where it overlaps all resources, not just CPUs).

Closing this as infogiven.
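For reference, the step-sharing rules discussed in this ticket can be condensed into a small decision function. This is a hypothetical model (`can_share` is not a Slurm command), built only from the behavior described in the comments above: 20.11/21.08, the interim change from commit 751b1b4288, and the final 22.05 behavior. It only answers whether a new step may start on resources a running step already holds, and ignores every other scheduling factor.

```shell
#!/usr/bin/env bash
# Hypothetical model of the step-sharing rules described in this ticket.
# Usage: can_share RULESET RUNNING_OVERLAP NEW_OVERLAP
#   RULESET:         21.08 | interim | 22.05
#   RUNNING_OVERLAP: yes|no, whether the running step used --overlap
#   NEW_OVERLAP:     yes|no, whether the new step uses --overlap
# Prints "yes" if the new step may start on the running step's resources.
can_share() {
    local ruleset="$1" running="$2" new="$3"
    case "$ruleset" in
        # 20.11/21.08: only the new step's flag matters (CPUs only;
        # memory and GRES are still counted).
        21.08)   [ "$new" = yes ] ;;
        # Interim (commit 751b1b4288): both steps must use --overlap.
        interim) [ "$running" = yes ] && [ "$new" = yes ] ;;
        # Final 22.05: an --overlap step's resources can also be allocated
        # to any other step, on all resource types.
        22.05)   [ "$running" = yes ] || [ "$new" = yes ] ;;
        *) echo "unknown ruleset: $ruleset" >&2; return 2 ;;
    esac
    # $? is the status of the last test run inside the case statement.
    if [ $? -eq 0 ]; then echo yes; else echo no; fi
}

# Running step used --overlap, new step did not (the second example above).
for rs in 21.08 interim 22.05; do
    echo "$rs: $(can_share "$rs" yes no)"   # 21.08: no, interim: no, 22.05: yes
done
```

The final 22.05 rule is the only one of the three under which a plain step can start alongside a running --overlap step, which is what makes --overlap usable for a "debugging" step.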