Ticket 12909

Summary: --cpus-per-task does not imply --exact even though it is supposed to
Product: Slurm
Reporter: Marshall Garey <marshall>
Component: User Commands
Assignee: Marshall Garey <marshall>
Status: RESOLVED FIXED
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
CC: lyeager
Version: 21.08.0
Hardware: Linux
OS: Linux
See Also: https://bugs.schedmd.com/show_bug.cgi?id=13041
https://bugs.schedmd.com/show_bug.cgi?id=13351
https://bugs.schedmd.com/show_bug.cgi?id=11275
https://bugs.schedmd.com/show_bug.cgi?id=10197
Version Fixed: 21.08.5 22.05.0pre1

Description Marshall Garey 2021-11-24 10:27:27 MST
--cpus-per-task is supposed to imply --exact, but I just tested this and it is broken. Example:

$ salloc -n8
salloc: Granted job allocation 279
salloc: Waiting for resource configuration
salloc: Nodes n1-1 are ready for job
$ srun -c1 -n1 whereami
0000 n1-1 - Cpus_allowed:       00000f0f        Cpus_allowed_list:      0-3,8-11
$ srun --exact -c1 -n1 whereami
0000 n1-1 - Cpus_allowed:       00000101        Cpus_allowed_list:      0,8
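
To make the two masks above easier to compare, here is a minimal C sketch that expands a CPU affinity bitmask into the range list printed next to it. It assumes the mask follows the usual Linux convention (bit i set means CPU i is allowed) and fits in an unsigned long. The helper name cpu_mask_to_list is hypothetical and is not part of Slurm or whereami.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative helper (not Slurm code): convert an affinity bitmask,
 * where bit i set means CPU i is allowed, into a list such as "0-3,8-11".
 * The output is left truncated if the buffer is too small.
 */
static void cpu_mask_to_list(unsigned long mask, char *out, size_t len)
{
	const int nbits = (int)(sizeof(mask) * 8);
	size_t used = 0;

	assert(out != NULL && len > 0);
	out[0] = '\0';

	for (int cpu = 0; cpu < nbits; cpu++) {
		if (!((mask >> cpu) & 1UL))
			continue;

		/* Start of a run of set bits; advance to the end of the run. */
		int start = cpu;
		while (cpu + 1 < nbits && ((mask >> (cpu + 1)) & 1UL))
			cpu++;

		int n;
		if (cpu > start)
			n = snprintf(out + used, len - used, "%s%d-%d",
				     used ? "," : "", start, cpu);
		else
			n = snprintf(out + used, len - used, "%s%d",
				     used ? "," : "", start);
		if (n < 0 || (size_t)n >= len - used)
			return; /* buffer full */
		used += (size_t)n;
	}
}
```

Called with 0x00000f0f this yields 0-3,8-11, and with 0x00000101 it yields 0,8. So the step launched with only -c1 was bound to eight CPUs, while the step with an explicit --exact was bound to two.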

This was broken during the refactor in commit 5154ed21e2c.
That commit sets everything in step_req, but the srun -c code path was not updated to match.

The patch below fixes it, but duplicates the setting of the flags in step_req.

We could move blocks of code instead, in either of two equivalent ways:
* Move the flag setting so that it happens *after* looking at overcommit and cpus_per_task.
* Move the handling of the overcommit and cpus_per_task options so that it happens before setting the flags.

$ git diff
diff --git a/src/srun/libsrun/launch.c b/src/srun/libsrun/launch.c
index c4051a669e..66ea4fca1e 100644
--- a/src/srun/libsrun/launch.c
+++ b/src/srun/libsrun/launch.c
@@ -225,6 +225,7 @@ static job_step_create_request_msg_t *_create_job_step_create_request(
               if (!srun_opt->exact)
                       verbose("Implicitly setting --exact, because -c/--cpus-per-task given.");
               srun_opt->exact = true;
+               step_req->flags &= ~SSF_WHOLE;
       } else if (opt_local->gpus_per_task && opt_local->cpus_per_gpu) {
               char *save_ptr = NULL, *tmp_str, *tok, *sep;
               int gpus_per_task = 0;

Comment 5 Marshall Garey 2021-12-15 10:07:24 MST
Fixed ahead of 21.08.5.

6e13352fc2 (HEAD -> slurm-21.08, origin/slurm-21.08, bug12909) Fix srun -c and --threads-per-core imply --exact

I also opened bug 13041 to track adding regression tests for --exact, and for --threads-per-core and --cpus-per-task implying --exact.

Closing.