Ticket 11968

Summary: cannot run parallel job steps - srun ignores -c when submitting a job step
Product: Slurm Reporter: Enrico Tagliavini <enrico.tagliavini>
Component: Regression Assignee: Jacob Jenson <jacob>
Status: RESOLVED INVALID QA Contact:
Severity: 6 - No support contract    
Priority: ---    
Version: 20.11.8   
Hardware: Linux   
OS: Linux   
Site: -Other- Slinky Site: ---
Linux Distro: CentOS Machine Name:
CLE Version: Version Fixed:
Target Release: --- DevPrio: ---
Attachments: slurm.conf

Description Enrico Tagliavini 2021-07-05 06:34:57 MDT
After updating from SLURM 19.05.5 to 20.11.6, it seems every job step gets allocated the entire memory of the allocation, even when the --mem option is specified, making it impossible to run multiple steps in parallel.

For example, create an allocation with:

salloc --mem 22G --time 01:00:00 -N 1 -n 1 -c 20 -p cpu_short bash

The first step can be started with:

srun --mem 1G -n 1 -c 1 -- bash -c 'echo starting 1 ; sleep 180' &

but the second step hangs:

srun --mem 1G -n 1 -c 1 -- bash -c 'echo starting 2 ; sleep 180' &

According to sacct, step .0 was allocated all 22 GB of memory, even though --mem 1G was specified.

Thank you for your help.
Kind regards.

Enrico Tagliavini
Comment 1 Enrico Tagliavini 2021-07-05 06:41:25 MDT
Actually, the -c 1 option also seems to be ignored: a single step is billed for all 20 CPUs.
Comment 2 Enrico Tagliavini 2021-07-05 08:11:53 MDT
Updated to 20.11.8; the problem persists.
Comment 3 Enrico Tagliavini 2021-07-05 08:16:43 MDT
Adding --overlap to the srun command seems to help, but it should not be required.

For the steps, the ReqMem field from sacct still says 22Gn, but the billing confirms the memory is 1G, as specified to srun. However, the CPU billing is wrong: it is still 20, despite -n 1 -c 1.

Billing of salloc: billing=20,cpu=20,mem=22G,node=1
Billing of step .0: cpu=20,mem=1G,node=1
Billing of step .1: cpu=20,mem=1G,node=1
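
The mismatch in the billing strings above can be checked mechanically. The following is only a minimal sketch: `tres_get` is a hypothetical helper (not part of Slurm), and the TRES strings are copied by hand from the output above rather than queried from the accounting database.

```shell
#!/bin/sh
# tres_get KEY TRES_STRING  (hypothetical helper, not a Slurm command)
# Print the value of KEY from a comma-separated TRES string such as
# "cpu=20,mem=1G,node=1". Prints nothing if KEY is absent.
tres_get() {
    key=$1
    printf '%s\n' "$2" | tr ',' '\n' | awk -F= -v k="$key" '$1 == k { print $2 }'
}

# Values copied from the billing output quoted in this comment.
job_tres='billing=20,cpu=20,mem=22G,node=1'
step_tres='cpu=20,mem=1G,node=1'

requested_cpus=1    # from "srun -n 1 -c 1"
job_cpus=$(tres_get cpu "$job_tres")
step_cpus=$(tres_get cpu "$step_tres")

# Report a step that holds more CPUs than were requested for it.
if [ "$step_cpus" -ne "$requested_cpus" ]; then
    echo "step holds $step_cpus of $job_cpus job CPUs, requested $requested_cpus"
fi
```

On a live system the strings would presumably come from the AllocTRES field of sacct instead of being pasted in.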
Comment 4 Enrico Tagliavini 2021-07-06 02:19:23 MDT
Created attachment 20245 [details]
slurm.conf

Attached slurm.conf. We don't use a CLI filter plugin, so this cannot be the source of the issue.
Comment 5 Enrico Tagliavini 2021-07-06 04:40:40 MDT
I added DebugFlags=Steps to slurm.conf and found the following:

[2021-07-06T11:58:41.225] STEPS: _pick_step_nodes: JobId=31564 Currently running steps use 0 of allocated 20 CPUs on node compute01
[2021-07-06T11:58:41.225] _pick_step_nodes: step pick 1-1 nodes, avail:compute01 idle:compute01 picked:NONE
[2021-07-06T11:58:41.225] STEPS: _pick_step_nodes: step picked 0 of 1 nodes
[2021-07-06T11:58:41.225] STEPS: Picked nodes compute01 when accumulating from compute01
[2021-07-06T11:58:41.225] STEPS: step alloc on job node 0 (compute01) used 20 of 20 CPUs
[2021-07-06T11:58:41.231] STEPS: _slurm_rpc_job_step_create: JobId=31564 StepId=2 compute01 usec=6982

This confirms that all 20 CPUs are allocated to the step, despite -c and -n both being set to 1.


Adding --exact to the srun commands seems to make it work as intended:

[2021-07-06T12:35:26.004] STEPS: _pick_step_nodes: JobId=31564 Currently running steps use 0 of allocated 20 CPUs on node compute01
[2021-07-06T12:35:26.004] _pick_step_nodes: step pick 1-1 nodes, avail:compute01 idle:compute01 picked:NONE
[2021-07-06T12:35:26.004] STEPS: _pick_step_nodes: step picked 0 of 1 nodes
[2021-07-06T12:35:26.004] STEPS: Picked nodes compute01 when accumulating from compute01
[2021-07-06T12:35:26.004] STEPS: step alloc on job node 0 (compute01) used 1 of 20 CPUs
[2021-07-06T12:35:26.010] STEPS: _slurm_rpc_job_step_create: JobId=31564 StepId=6 compute01 usec=6119


However, may I ask why -c and -n do not imply --exact? This is very confusing, because whatever the user asks for is ignored and replaced with the whole allocation. A user who calls srun with -c and -n may well be asking for the --exact behavior.
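
Until that is clarified, a user-side workaround might be a small wrapper that adds --exact whenever -c or -n is given. The sketch below is an assumption about how such a wrapper could look, not anything shipped with Slurm: `srun_args_exact` is a hypothetical name, and it only prints the argument list it would pass on, so it can be tried without a cluster.

```shell
#!/bin/sh
# srun_args_exact (hypothetical wrapper): print the arguments srun would
# receive, one per line, with --exact prepended when the caller passed
# -c/-n (or the long forms) and did not already pass --exact.
# Caveat: the scan stops only at "--", so a command given without the
# "--" separator (e.g. "bash -c ...") would be misread as an srun option.
srun_args_exact() {
    wants_exact=no
    has_exact=no
    for arg in "$@"; do
        case $arg in
            --) break ;;    # everything after this is the user command
            -c*|-n*|--cpus-per-task|--cpus-per-task=*|--ntasks|--ntasks=*)
                wants_exact=yes ;;
            --exact) has_exact=yes ;;
        esac
    done
    if [ "$wants_exact" = yes ] && [ "$has_exact" = no ]; then
        set -- --exact "$@"
    fi
    # A real wrapper would end with: srun "$@"
    if [ $# -gt 0 ]; then
        printf '%s\n' "$@"
    fi
}

# The step command from this ticket, passed through the wrapper.
srun_args_exact --mem 1G -n 1 -c 1 -- bash -c 'echo starting 1 ; sleep 180'
```

Printing instead of executing keeps the sketch side-effect free; whether forcing --exact is appropriate for every step is a site decision.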