Ticket 10627 - Single step without options not allowed to run in batch step
Summary: Single step without options not allowed to run in batch step
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other
Version: 20.02.6
Hardware: Linux
Severity: 3 - Medium Impact
Assignee: Nate Rini
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2021-01-14 01:09 MST by Marcin Stolarek
Modified: 2021-04-19 14:19 MDT
3 users

See Also:
Site: IDRIS
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 20.11.6, 21.08pre1
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments
Slurmctld log debug5 (18.01 KB, text/plain)
2021-01-19 02:42 MST, IDRIS System Team
Job output (2.16 KB, text/plain)
2021-01-19 10:11 MST, IDRIS System Team
Slurmd log (62.83 KB, text/plain)
2021-01-20 01:51 MST, IDRIS System Team
Slurmctld log with debugflags SelectType/TraceJobs/Steps (45.37 KB, text/plain)
2021-01-22 03:39 MST, IDRIS System Team
Slurmd log with debugflags SelectType/TraceJobs/Steps (62.00 KB, text/plain)
2021-01-22 03:40 MST, IDRIS System Team
debug patch (1.15 KB, patch)
2021-01-25 17:49 MST, Nate Rini
Slurmd log after mem patch (62.46 KB, text/plain)
2021-02-11 10:12 MST, IDRIS System Team
Slurmctld log after mem patch (42.04 KB, text/plain)
2021-02-12 00:56 MST, IDRIS System Team
test patch (900 bytes, patch)
2021-02-12 14:13 MST, Nate Rini
Slurmctld log (37.00 KB, text/plain)
2021-02-17 01:37 MST, IDRIS System Team
Slurmd log (61.03 KB, text/plain)
2021-02-17 01:37 MST, IDRIS System Team
patch to add more logging (2.99 KB, patch)
2021-02-17 09:56 MST, Nate Rini
Slurmctld log with more logging (35.79 KB, text/plain)
2021-02-18 07:36 MST, IDRIS System Team
Slurmd log with more logging (60.47 KB, text/plain)
2021-02-18 07:37 MST, IDRIS System Team
Patches applied to Slurm v20.02.6 (4.83 KB, patch)
2021-02-18 07:49 MST, IDRIS System Team
Slurmctld and slurmd logs (120.00 KB, application/x-tar)
2021-02-23 09:59 MST, IDRIS System Team
patch for 20.02.6 (IDRIS only) (2.90 KB, patch)
2021-02-23 16:14 MST, Nate Rini

Description Marcin Stolarek 2021-01-14 01:09:45 MST
Created attachment 17469 [details]
slurm.conf

Splitting this from Bug 10474 comment 12

When running:

    sbatch --ntasks=8 --cpus-per-task=10 --hint=nomultithread --qos=qos_cpu-dir --account=xyz --exclusive <<EOF
    > #!/bin/bash
    > srun hostname
    > EOF

Job output:

    srun: error: Unable to create step for job 506: More processors requested than permitted

The same sbatch command without --exclusive produces the expected result. It also works with --exclusive when using srun directly:

    srun --ntasks=8 --cpus-per-task=10 --hint=nomultithread --qos=qos_cpu-dir --account=xyz --exclusive hostname
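The step's CPU demand follows directly from the submission options: 8 tasks at 10 CPUs per task is 80 CPUs. A minimal sketch of that arithmetic (the two-node, 40-physical-core-per-node layout is an assumption inferred from later log excerpts in this ticket, not stated here):

```shell
# Hedged sketch: compute the CPUs a step like the one above requests.
ntasks=8
cpus_per_task=10
requested=$((ntasks * cpus_per_task))
echo "step requests ${requested} CPUs"   # prints "step requests 80 CPUs"
# With --hint=nomultithread only physical cores are counted, so two
# assumed 40-core nodes (80 cores total) should satisfy the step; the
# error suggests the controller computed a smaller permitted CPU count
# for steps inside an --exclusive batch allocation.
```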

---
This may be related to bug 10389
Comment 1 Nate Rini 2021-01-14 11:29:26 MST
(In reply to IDRIS System Team from bug#10474 comment #12)
> Hi!
> 
> The patch seems to solve the current issue (job no longer blocked, no
> block_sync_core_bitmap error) but we experience now a problem when using
> exclusive. We don't know if it's related to the patch.
> 
> When running:
> 
>     sbatch --ntasks=8 --cpus-per-task=10 --hint=nomultithread
> --qos=qos_cpu-dir --account=xyz --exclusive <<EOF
>     > #!/bin/bash
>     > srun hostname
>     > EOF
> 
> Job output:
> 
>     srun: error: Unable to create step for job 506: More processors
> requested than permitted
> 
> The same sbatch command without exclusive produces the expected result. It
> also works with exclusive when using srun directly:
> 
>     srun --ntasks=8 --cpus-per-task=10 --hint=nomultithread
> --qos=qos_cpu-dir --account=xyz --exclusive hostname

Is it possible to get logs of this job (or a repeat of it) from slurmctld?
Comment 2 IDRIS System Team 2021-01-19 02:42:27 MST
Created attachment 17530 [details]
Slurmctld log debug5
Comment 3 Nate Rini 2021-01-19 08:53:27 MST
(In reply to Nate Rini from comment #1)
> (In reply to IDRIS System Team from bug#10474 comment #12)
> >     sbatch --ntasks=8 --cpus-per-task=10 --hint=nomultithread
> > --qos=qos_cpu-dir --account=xyz --exclusive <<EOF
> >     > #!/bin/bash
> >     > srun hostname
> >     > EOF

Please run this test job instead:
>     sbatch --ntasks=8 --cpus-per-task=10 --hint=nomultithread
> --qos=qos_cpu-dir --account=xyz --exclusive <<EOF
>     > #!/bin/bash
>     > env |grep SLURM
>     > srun -vvvvv --slurmd-debug=debug3 hostname
>     > EOF
Comment 4 IDRIS System Team 2021-01-19 10:11:51 MST
Created attachment 17535 [details]
Job output
Comment 5 Nate Rini 2021-01-19 10:44:32 MST
(In reply to Nate Rini from comment #3)
Please try this test job:
>     sbatch -N2 --ntasks=8 --cpus-per-task=10 --hint=nomultithread --ntasks-per-node=4
> --qos=qos_cpu-dir --account=xyz --exclusive <<EOF
>     > #!/bin/bash
>     > env |grep SLURM
>     > srun -vvvvv --slurmd-debug=debug3 hostname
>     > EOF

Please also attach the slurmd log from the head node of the job.
Comment 6 IDRIS System Team 2021-01-20 01:51:50 MST
Created attachment 17550 [details]
Slurmd log
Comment 8 Nate Rini 2021-01-20 08:41:07 MST
(In reply to IDRIS System Team from comment #6)
> Created attachment 17550 [details]
> Slurmd log

The issue has been reproduced locally. Working on analysis and possible patch.
Comment 9 Nate Rini 2021-01-20 09:59:08 MST
(In reply to Nate Rini from comment #8)
> (In reply to IDRIS System Team from comment #6)
> > Created attachment 17550 [details]
> > Slurmd log
> 
> The issue has been reproduced locally. Working on analysis and possible
> patch.

Please ignore that response; I found another bug that causes the same error (opened bug#10669), but it is probably not this issue.

When was the last time that slurmctld was restarted?

Is it possible to get a copy of the job submit script? Does it modify any of the job parameters?
> JobSubmitPlugins=lua
Comment 10 IDRIS System Team 2021-01-21 03:12:16 MST
slurmctld was restarted a few days ago (2021-01-18T14:54:32).

The job submit script can modify some parameters (partition and account) but we don't think they are relevant here (conditions are not met).

(In reply to Nate Rini from comment #9)
> (In reply to Nate Rini from comment #8)
> > (In reply to IDRIS System Team from comment #6)
> > > Created attachment 17550 [details]
> > > Slurmd log
> > 
> > The issue has been reproduced locally. Working on analysis and possible
> > patch.
> 
> Please ignore that response, found another bug that causes the same error
> (opened bug#10669) but is probably not this issue.
> 
> When was the last time that slurmctld was restarted?
> 
> Is it possible to get a copy of the job submit script? Does it modify any of
> the job parameters?
> > JobSubmitPlugins=lua
Comment 11 Nate Rini 2021-01-21 11:25:48 MST
(In reply to IDRIS System Team from comment #10)
> slurmctld was restarted few days ago (2021-01-18T14:54:32)

Please activate these debugflags in slurmctld:
> scontrol setdebugflags +SelectType
> scontrol setdebugflags +TraceJobs
> scontrol setdebugflags +Steps

Please resubmit the job again and then attach the slurmd and slurmctld logs. Please deactivate the flags after.
> scontrol setdebugflags -SelectType
> scontrol setdebugflags -TraceJobs
> scontrol setdebugflags -Steps
Comment 12 IDRIS System Team 2021-01-22 03:39:54 MST
Created attachment 17578 [details]
Slurmctld log with debugflags SelectType/TraceJobs/Steps
Comment 13 IDRIS System Team 2021-01-22 03:40:21 MST
Created attachment 17579 [details]
Slurmd log with debugflags SelectType/TraceJobs/Steps
Comment 21 Nate Rini 2021-01-25 17:49:44 MST
Created attachment 17616 [details]
debug patch
Comment 22 Nate Rini 2021-01-25 17:50:27 MST
(In reply to Nate Rini from comment #21)
> Created attachment 17616 [details]
> debug patch

Is it possible to apply this patch to slurmctld and then rerun the test from comment #11?
Comment 23 IDRIS System Team 2021-01-28 03:52:06 MST
Hi!

After applying the patch and running the test, the following line appeared in the slurmctld log:

   bug10627: node_tmp=NULL nodes_needed:2 step_spec->max_nodes:2 pick_node_cnt:2 mem_blocked_cpus:0

(In reply to Nate Rini from comment #22)
> (In reply to Nate Rini from comment #21)
> > Created attachment 17616 [details]
> > debug patch
> 
> Is it possible to apply this patch to slurmctld and then rerun the test from
> comment #11?
Comment 24 Nate Rini 2021-02-08 09:54:49 MST
(In reply to IDRIS System Team from comment #23)
>    bug10627: node_tmp=NULL nodes_needed:2 step_spec->max_nodes:2
> pick_node_cnt:2 mem_blocked_cpus:0

That helped determine where the failure was happening. I'm still working on replicating the issue locally.
Comment 26 Nate Rini 2021-02-08 12:52:10 MST
Please try this patch:
> https://github.com/SchedMD/slurm/commit/e8a0930931427a2209a8b27296a8c6ce82f77683

I believe it may explain why the reported jobs end up with half the memory of my test jobs.

If not, please provide the same logs as comment #13 with the patch.
Comment 27 IDRIS System Team 2021-02-11 10:12:34 MST
Created attachment 17895 [details]
Slurmd log after mem patch

The patch does not seem to change anything.
Comment 28 Nate Rini 2021-02-11 11:44:35 MST
(In reply to IDRIS System Team from comment #27)
> Created attachment 17895 [details]
> Slurmd log after mem patch
> 
> The patch does not seem to change anything.

Please also attach your slurmctld log with the SelectType and TraceJobs debugflags active.
Comment 29 IDRIS System Team 2021-02-12 00:56:28 MST
Created attachment 17912 [details]
Slurmctld log after mem patch
Comment 33 Nate Rini 2021-02-12 14:13:22 MST
Created attachment 17934 [details]
test patch

(In reply to IDRIS System Team from comment #27)
> Created attachment 17895 [details]
> Slurmd log after mem patch
> 
> The patch does not seem to change anything.

Please give this patch a try. Please provide the same logs as comment #29 if it does not work.
Comment 34 IDRIS System Team 2021-02-17 01:37:00 MST
Created attachment 17957 [details]
Slurmctld log
Comment 35 IDRIS System Team 2021-02-17 01:37:18 MST
Created attachment 17958 [details]
Slurmd log
Comment 36 Nate Rini 2021-02-17 09:25:01 MST
(In reply to IDRIS System Team from comment #34)
> Created attachment 17957 [details]
> Slurmctld log

I'm going to prepare another patch to add more debug logging; it looks like the job is getting allocated 0 memory.
Comment 37 Nate Rini 2021-02-17 09:56:27 MST
Created attachment 17964 [details]
patch to add more logging

(In reply to Nate Rini from comment #36)
> (In reply to IDRIS System Team from comment #34)
> > Created attachment 17957 [details]
> > Slurmctld log
> 
> I'm going to prepare another patch to add more debug logging; it looks like
> the job is getting allocated 0 memory.

Please apply this patch to slurmctld and run the test job, then revert it afterwards, as it is a very verbose patch.
Comment 38 IDRIS System Team 2021-02-18 07:36:44 MST
Created attachment 17981 [details]
Slurmctld log with more logging
Comment 39 IDRIS System Team 2021-02-18 07:37:06 MST
Created attachment 17982 [details]
Slurmd log with more logging
Comment 40 IDRIS System Team 2021-02-18 07:49:02 MST
Created attachment 17983 [details]
Patches applied to Slurm v20.02.6

When applying the logging patch, we noticed that we don't have the same source code. We currently use 3 patches to fix bugs #10474, #9670 and #9724. Unfortunately, this information was lost when the current issue was split from bug #10474. Just in case, here is our diff against slurm-20.02.6.tar.bz2.
Comment 42 Nate Rini 2021-02-19 15:23:31 MST
We are still working on analysis based on the logs provided.
Comment 47 IDRIS System Team 2021-02-23 09:59:08 MST
Created attachment 18065 [details]
Slurmctld and slurmd logs
Comment 55 Nate Rini 2021-02-23 16:14:08 MST
Created attachment 18077 [details]
patch for 20.02.6 (IDRIS only)

After a good bit of time staring at the logs and the current code base: 20.02.6 is significantly different, and attempting to backport the other changes would likely cause more bugs than it fixes.

This patchset includes all the site fixes given in comment #40 but none of the debug patches from this ticket.

Please apply this patchset to a clean version of 20.02.6 and test it. We are generally not patching 20.02 anymore.
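The requested workflow is a standard `patch -p1` application against a pristine tree. A minimal runnable sketch of that workflow on a toy tree (the file names here are placeholders, not from the ticket), showing the `a/`-prefix stripping a unified diff like the attachment expects:

```shell
# Hedged sketch: apply a unified diff with -p1, as one would apply the
# attached patchset to an unpacked slurm-20.02.6 source tree.
mkdir -p demo/src && printf 'old\n' > demo/src/file.c
cat > fix.patch <<'EOF'
--- a/src/file.c
+++ b/src/file.c
@@ -1 +1 @@
-old
+new
EOF
# -p1 strips the leading "a/"/"b/" path component from the diff headers.
(cd demo && patch -p1 < ../fix.patch)
grep -q new demo/src/file.c && echo "patch applied"   # prints "patch applied"
```

For the real tree the same shape applies: unpack slurm-20.02.6.tar.bz2, cd into it, and run `patch -p1` with the downloaded attachment before configuring and building.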
Comment 56 Nate Rini 2021-03-10 11:58:59 MST
(In reply to Nate Rini from comment #55)
> Please apply this patchset to a clean version of 20.02.6 and test it. We are
> generally not patching 20.02 anymore.

Any updates? Is it possible to test this patch?
Comment 60 IDRIS System Team 2021-03-19 07:13:38 MDT
Hi!

The patch seems to fix all the issues we reported so far.

Now we see a new strange behavior: when we request 4 tasks with 10 physical cores each and 4 GPUs per node (i.e. an entire node), Slurm allocates 2 nodes with 2 tasks and 4 GPUs each. The job is charged for 8 GPUs, which is twice as much as it should be. Of course this doesn't happen when we specify "--nodes=1" or "--ntasks-per-node=4". Also, Slurm allocates only 1 node if we don't request GPUs.

Could you tell us if this is an expected behavior or a bug?

$ srun -A abc -n 4 -c 10 --hint=nomultithread --gres=gpu:4 ~/binding_mpi.exe   
srun: job 797 queued and waiting for resources
srun: job 797 has been allocated resources
Hello from level 1: rank= 1, thread level 1= -1, on r7i3n6. (core affinity = 10-19)
Hello from level 1: rank= 0, thread level 1= -1, on r7i3n6. (core affinity = 0-9)
Hello from level 1: rank= 2, thread level 1= -1, on r7i3n7. (core affinity = 0-9)
Hello from level 1: rank= 3, thread level 1= -1, on r7i3n7. (core affinity = 10-19)

$ scontrol show job 797
JobId=797 JobName=binding_mpi.exe
  UserId=user01(1000) GroupId=group01(1000) MCS_label=N/A
  Priority=156250 Nice=0 Account=abc QOS=qos_gpu-t3
  JobState=COMPLETED Reason=None Dependency=(null)
  Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
  RunTime=00:00:01 TimeLimit=00:10:00 TimeMin=N/A
  SubmitTime=2021-03-04T10:55:59 EligibleTime=2021-03-04T10:55:59
  AccrueTime=2021-03-04T10:55:59
  StartTime=2021-03-04T10:55:59 EndTime=2021-03-04T10:56:00 Deadline=N/A
  SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-03-04T10:55:59
  Partition=gpu_p13 AllocNode:Sid=jean-zay2-ib0:68449
  ReqNodeList=(null) ExcNodeList=(null)
  NodeList=r7i3n[6-7]
  BatchHost=r7i3n6
  NumNodes=2 NumCPUs=80 NumTasks=4 CPUs/Task=10 ReqB:S:C:T=0:0:*:1
  TRES=cpu=80,mem=160G,energy=405,node=2,billing=80,gres/gpu=8
  Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
  MinCPUsNode=10 MinMemoryCPU=2G MinTmpDiskNode=0
  Features=(null) DelayBoot=00:00:00
  OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
  Command=/linkhome/user01/binding_mpi.exe
  WorkDir=/linkhome/group/user01
  Power=
  TresPerNode=gpu:4
  MailUser=user01 MailType=NONE
Comment 61 Nate Rini 2021-03-19 10:19:56 MDT
(In reply to IDRIS System Team from comment #60)
> The patch seems to fix all the issues we reported so far.
Great. We will QA it for upstream inclusion.
 
> Now we see a new strange behavior: when we request 4 tasks with 10 physical
> cores each and 4 GPUs per node (i.e. an entire node), Slurm allocates 2
> nodes with 2 tasks and 4 GPUs each. The job is charged for 8 GPUs, which is
> twice as much as it should be. Of course this doesn't happen when we specify
> "--nodes=1" or "--ntasks-per-node=4". Also, Slurm allocates only 1 node if
> we don't request GPUs.
Please open a new bug for this to avoid confusing the issues. It doesn't look related to this bug.
Comment 68 Nate Rini 2021-03-25 16:32:01 MDT
A modified patch is upstream for slurm-20.11.6:
> https://github.com/SchedMD/slurm/commit/8cd6af0c8ff73f1543c53ba4dbceec137ab8ca33

Closing ticket, please reply if any more issues are found.

Thanks,
--Nate