Ticket 12909 - --cpus-per-task does not imply --exact even though it is supposed to
Summary: --cpus-per-task does not imply --exact even though it is supposed to
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: User Commands
Version: 21.08.0
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Marshall Garey
Reported: 2021-11-24 10:27 MST by Marshall Garey
Modified: 2022-03-30 06:24 MDT

Site: SchedMD
Version Fixed: 21.08.5 22.05.0pre1

Description Marshall Garey 2021-11-24 10:27:27 MST
--cpus-per-task (-c) is supposed to imply --exact, but I just tested it and it does not. Example:

$ salloc -n8
salloc: Granted job allocation 279
salloc: Waiting for resource configuration
salloc: Nodes n1-1 are ready for job
$ srun -c1 -n1 whereami
0000 n1-1 - Cpus_allowed:       00000f0f        Cpus_allowed_list:      0-3,8-11
$ srun --exact -c1 -n1 whereami
0000 n1-1 - Cpus_allowed:       00000101        Cpus_allowed_list:      0,8

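It was broken by the refactor in commit 5154ed21e2c.
That commit moved all of the step settings into step_req, but did not update the srun -c path: -c still only sets srun_opt->exact, and it does so after the step flags have already been computed.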

The patch below fixes it, but duplicates the logic that sets the flags in step_req.

Instead, we could move blocks of code, in either direction:
* Move the flag-setting block *after* the overcommit and cpus_per_task handling.
* Move the overcommit and cpus_per_task handling *before* the flag-setting block.

$ git diff
diff --git a/src/srun/libsrun/launch.c b/src/srun/libsrun/launch.c
index c4051a669e..66ea4fca1e 100644
--- a/src/srun/libsrun/launch.c
+++ b/src/srun/libsrun/launch.c
@@ -225,6 +225,7 @@ static job_step_create_request_msg_t *_create_job_step_create_request(
               if (!srun_opt->exact)
                       verbose("Implicitly setting --exact, because -c/--cpus-per-task given.");
               srun_opt->exact = true;
+               step_req->flags &= ~SSF_WHOLE;
       } else if (opt_local->gpus_per_task && opt_local->cpus_per_gpu) {
               char *save_ptr = NULL, *tmp_str, *tok, *sep;
               int gpus_per_task = 0;
Comment 5 Marshall Garey 2021-12-15 10:07:24 MST
Fixed ahead of 21.08.5.

6e13352fc2 (HEAD -> slurm-21.08, origin/slurm-21.08, bug12909) Fix srun -c and --threads-per-core imply --exact

I also opened bug 13041 to track adding regression tests for --exact, including tests that --threads-per-core and --cpus-per-task imply --exact.

Closing.