Description
Marcin Stolarek
2021-01-14 01:09:45 MST
(In reply to IDRIS System Team from bug#10474 comment #12)
> Hi!
>
> The patch seems to solve the current issue (job no longer blocked, no
> block_sync_core_bitmap error) but we now experience a problem when using
> exclusive. We don't know if it's related to the patch.
>
> When running:
>
> sbatch --ntasks=8 --cpus-per-task=10 --hint=nomultithread \
>   --qos=qos_cpu-dir --account=xyz --exclusive <<EOF
> #!/bin/bash
> srun hostname
> EOF
>
> Job output:
>
> srun: error: Unable to create step for job 506: More processors
> requested than permitted
>
> The same sbatch command without exclusive produces the expected result. It
> also works with exclusive when using srun directly:
>
> srun --ntasks=8 --cpus-per-task=10 --hint=nomultithread \
>   --qos=qos_cpu-dir --account=xyz --exclusive hostname

Is it possible to get logs of this job (or a repeat of it) from slurmctld?

Created attachment 17530 [details]
Slurmctld log debug5
(In reply to Nate Rini from comment #1)
> (In reply to IDRIS System Team from bug#10474 comment #12)
> > sbatch --ntasks=8 --cpus-per-task=10 --hint=nomultithread \
> >   --qos=qos_cpu-dir --account=xyz --exclusive <<EOF
> > #!/bin/bash
> > srun hostname
> > EOF

Please run this test job instead:

> sbatch --ntasks=8 --cpus-per-task=10 --hint=nomultithread \
>   --qos=qos_cpu-dir --account=xyz --exclusive <<EOF
> #!/bin/bash
> env | grep SLURM
> srun -vvvvv --slurmd-debug=debug3 hostname
> EOF

Created attachment 17535 [details]
Job output
(In reply to Nate Rini from comment #3)

Please try this test job:

> sbatch -N2 --ntasks=8 --cpus-per-task=10 --hint=nomultithread \
>   --ntasks-per-node=4 --qos=qos_cpu-dir --account=xyz --exclusive <<EOF
> #!/bin/bash
> env | grep SLURM
> srun -vvvvv --slurmd-debug=debug3 hostname
> EOF

Please also attach the slurmd log from the head node of the job.

Created attachment 17550 [details]
Slurmd log
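For context, the "More processors requested than permitted" error is, at heart, a CPU-count check made at step creation. The snippet below is only an illustration of that arithmetic, not Slurm's actual code; `alloc_cpus` is a made-up stand-in for whatever CPU count the step layer computed for the allocation:

```shell
# Illustration only -- not Slurm source code. A step request is rejected
# when ntasks * cpus_per_task exceeds the CPUs the step layer believes
# are available in the allocation.
ntasks=8
cpus_per_task=10
alloc_cpus=40   # hypothetical value; the real figure comes from the allocation

need=$((ntasks * cpus_per_task))
if [ "$need" -gt "$alloc_cpus" ]; then
    echo "srun: error: More processors requested than permitted ($need > $alloc_cpus)"
fi
```

In this ticket the step request was 8 tasks x 10 CPUs per task = 80 CPUs, so the error suggests the step layer saw fewer CPUs than the job actually requested.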
(In reply to IDRIS System Team from comment #6)
> Created attachment 17550 [details]
> Slurmd log

The issue has been reproduced locally. Working on analysis and a possible patch.

(In reply to Nate Rini from comment #8)
> (In reply to IDRIS System Team from comment #6)
> > Created attachment 17550 [details]
> > Slurmd log
>
> The issue has been reproduced locally. Working on analysis and a possible
> patch.

Please ignore that response; I found another bug that causes the same error (opened bug#10669), but it is probably not this issue.

When was the last time that slurmctld was restarted?

Is it possible to get a copy of the job submit script? Does it modify any of the job parameters?

> JobSubmitPlugins=lua

slurmctld was restarted a few days ago (2021-01-18T14:54:32).

The job submit script can modify some parameters (partition and account), but we don't think they are relevant here (the conditions are not met).

(In reply to Nate Rini from comment #9)
> Please ignore that response, found another bug that causes the same error
> (opened bug#10669) but is probably not this issue.

(In reply to IDRIS System Team from comment #10)
> slurmctld was restarted a few days ago (2021-01-18T14:54:32)

Please activate these debugflags in slurmctld:

> scontrol setdebugflags +SelectType
> scontrol setdebugflags +TraceJobs
> scontrol setdebugflags +Steps

Please resubmit the job and then attach the slurmd and slurmctld logs. Please deactivate the flags afterwards:

> scontrol setdebugflags -SelectType
> scontrol setdebugflags -TraceJobs
> scontrol setdebugflags -Steps

Created attachment 17578 [details]
Slurmctld log with debugflags SelectType/TraceJobs/Steps
Created attachment 17579 [details]
Slurmd log with debugflags SelectType/TraceJobs/Steps
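The debug-flag capture procedure above can be wrapped in a small script. A sketch only: `test_job.sh` is a placeholder for the reproducer job script, and the `run` helper defaults to printing commands rather than executing them (set DRY_RUN=0 on a real cluster):

```shell
# Sketch of the capture procedure from this ticket. DRY_RUN defaults to
# 1, so the commands below are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

run scontrol setdebugflags +SelectType
run scontrol setdebugflags +TraceJobs
run scontrol setdebugflags +Steps

run sbatch test_job.sh   # placeholder for the reproducer job script

# ...wait for the job to finish, copy the slurmctld/slurmd logs...

run scontrol setdebugflags -SelectType
run scontrol setdebugflags -TraceJobs
run scontrol setdebugflags -Steps
```

Lowering the flags afterwards matters because TraceJobs and Steps make slurmctld logging very verbose on a busy cluster.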
Created attachment 17616 [details]
debug patch
(In reply to Nate Rini from comment #21)
> Created attachment 17616 [details]
> debug patch

Is it possible to apply this patch to slurmctld and then rerun the test from comment #11?

Hi! After applying the patch and running the test, the following line appeared in the slurmctld log:

> bug10627: node_tmp=NULL nodes_needed:2 step_spec->max_nodes:2 pick_node_cnt:2 mem_blocked_cpus:0

(In reply to IDRIS System Team from comment #23)
> bug10627: node_tmp=NULL nodes_needed:2 step_spec->max_nodes:2
> pick_node_cnt:2 mem_blocked_cpus:0

That helped determine where the failure was happening. I'm still working on replicating the issue locally.

Please try this patch:

> https://github.com/SchedMD/slurm/commit/e8a0930931427a2209a8b27296a8c6ce82f77683

I believe it may address why the requested jobs have half the memory of my test jobs. If not, please provide the same logs as comment #13 with the patch.

Created attachment 17895 [details]
Slurmd log after mem patch
The patch does not seem to change anything.
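When scanning a large slurmctld log for the `bug10627:` line that the debug patch prints (quoted earlier in this ticket), pulling out individual fields can be handy. A convenience sketch; the line format is copied verbatim from the debug-patch output above:

```shell
# Extract fields from the bug10627 debug line. The sample line below is
# taken from the slurmctld output reported in this ticket.
line='bug10627: node_tmp=NULL nodes_needed:2 step_spec->max_nodes:2 pick_node_cnt:2 mem_blocked_cpus:0'

nodes_needed=$(printf '%s\n' "$line" | sed -n 's/.*nodes_needed:\([0-9]*\).*/\1/p')
max_nodes=$(printf '%s\n' "$line" | sed -n 's/.*max_nodes:\([0-9]*\).*/\1/p')

echo "nodes_needed=$nodes_needed max_nodes=$max_nodes"
```

On a real log, replace the hard-coded `line` with something like `grep 'bug10627:' slurmctld.log`.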
(In reply to IDRIS System Team from comment #27)
> Created attachment 17895 [details]
> Slurmd log after mem patch
>
> The patch does not seem to change anything.

Please also attach your slurmctld log with the SelectType and TraceJobs debugflags active.

Created attachment 17912 [details]
Slurmctld log after mem patch
Created attachment 17934 [details]
test patch

(In reply to IDRIS System Team from comment #27)
> Created attachment 17895 [details]
> Slurmd log after mem patch
>
> The patch does not seem to change anything.

Please give this patch a try. Please provide the same logs as comment #29 if it does not work.

Created attachment 17957 [details]
Slurmctld log
Created attachment 17958 [details]
Slurmd log
(In reply to IDRIS System Team from comment #34)
> Created attachment 17957 [details]
> Slurmctld log

I'm going to prepare another patch to add more debug logging; it looks like the job is getting allocated 0 memory.

Created attachment 17964 [details]
patch to add more logging

(In reply to Nate Rini from comment #36)
> I'm going to prepare another patch to add more debug logging; it looks like
> the job is getting allocated 0 memory.

Please apply this patch to slurmctld and run the test job. Please then revert it, as it is a verbose patch.

Created attachment 17981 [details]
Slurmctld log with more logging
Created attachment 17982 [details]
Slurmd log with more logging
Created attachment 17983 [details]
Patches applied to Slurm v20.02.6

When applying the logging patch, we noticed that we don't have the same source code. We currently use three patches to fix bugs #10474, #9670 and #9724. Unfortunately this information was lost when the current issue was split from #10474. Just in case, here is our diff from slurm-20.02.6.tar.bz2.

We are still working on analysis based on the logs provided. Please apply these patches from bug#9724:

> https://github.com/SchedMD/slurm/commit/49a7d7f9fb9d554c3f51a33bc5de3bb3e9249a35
> https://github.com/SchedMD/slurm/commit/2a09c94e8bb2cc42697c032ebbe5f9d1107fc4c2

Created attachment 18065 [details]
Slurmctld and slurmd logs
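For anyone following along, applying the two referenced commits to a clean tree looks roughly as follows. A sketch only: it prints the commands rather than executing them, since a real run needs network access and an unpacked slurm-20.02.6 source tree, and the `.patch` URL form is GitHub's patch export for a commit:

```shell
# Print the commands for applying the two bug#9724 commits referenced in
# this ticket to an unpacked slurm-20.02.6 tree (dry run: echo only).
base=https://github.com/SchedMD/slurm/commit
for c in 49a7d7f9fb9d554c3f51a33bc5de3bb3e9249a35 \
         2a09c94e8bb2cc42697c032ebbe5f9d1107fc4c2; do
    echo "curl -L $base/$c.patch | patch -p1 -d slurm-20.02.6"
done
```

Whether the commits apply cleanly to 20.02.6 is not guaranteed; as noted later in this ticket, the 20.02 code base had diverged enough that SchedMD ultimately provided a consolidated patchset instead.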
Created attachment 18077 [details]
patch for 20.02.6 (IDRIS only)

After a good bit of time staring at the logs and the current code base: 20.02.6 is significantly different, and attempting to backport the other changes would likely cause more bugs than it would fix. This patch includes all the fixes given in comment #40 but does not include any of the patches from this bug. Please apply this patchset to a clean version of 20.02.6 and test it. We are generally not patching 20.02 anymore.

(In reply to Nate Rini from comment #55)
> Please apply this patchset to a clean version of 20.02.6 and test it. We are
> generally not patching 20.02 anymore.

Any updates? Is it possible to test this patch?

Hi! The patch seems to fix all the issues we reported so far.

Now we have a new strange behavior: when we request 4 tasks with 10 physical cores each and 4 GPUs per node (i.e. an entire node), Slurm allocates 2 nodes with 2 tasks and 4 GPUs each. The job is charged for 8 GPUs, which is twice as much as it should be. Of course this doesn't happen when we specify "--nodes=1" or "--ntasks-per-node=4". Also, Slurm allocates only 1 node if we don't request GPUs.

Could you tell us if this is expected behavior or a bug?

$ srun -A abc -n 4 -c 10 --hint=nomultithread --gres=gpu:4 ~/binding_mpi.exe
srun: job 797 queued and waiting for resources
srun: job 797 has been allocated resources
Hello from level 1: rank= 1, thread level 1= -1, on r7i3n6. (core affinity = 10-19)
Hello from level 1: rank= 0, thread level 1= -1, on r7i3n6. (core affinity = 0-9)
Hello from level 1: rank= 2, thread level 1= -1, on r7i3n7. (core affinity = 0-9)
Hello from level 1: rank= 3, thread level 1= -1, on r7i3n7.
(core affinity = 10-19)

$ scontrol show job 797
JobId=797 JobName=binding_mpi.exe
   UserId=user01(1000) GroupId=group01(1000) MCS_label=N/A
   Priority=156250 Nice=0 Account=abc QOS=qos_gpu-t3
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:01 TimeLimit=00:10:00 TimeMin=N/A
   SubmitTime=2021-03-04T10:55:59 EligibleTime=2021-03-04T10:55:59
   AccrueTime=2021-03-04T10:55:59
   StartTime=2021-03-04T10:55:59 EndTime=2021-03-04T10:56:00 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-03-04T10:55:59
   Partition=gpu_p13 AllocNode:Sid=jean-zay2-ib0:68449
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=r7i3n[6-7] BatchHost=r7i3n6
   NumNodes=2 NumCPUs=80 NumTasks=4 CPUs/Task=10 ReqB:S:C:T=0:0:*:1
   TRES=cpu=80,mem=160G,energy=405,node=2,billing=80,gres/gpu=8
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=10 MinMemoryCPU=2G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/linkhome/user01/binding_mpi.exe
   WorkDir=/linkhome/group/user01
   Power=
   TresPerNode=gpu:4
   MailUser=user01 MailType=NONE

(In reply to IDRIS System Team from comment #60)
> The patch seems to fix all the issues we reported so far.

Great. We will QA it for upstream inclusion.

> Now we have a new strange behavior: when we request 4 tasks with 10 physical
> cores each and 4 GPUs per node (i.e. an entire node), Slurm allocates 2 nodes
> with 2 tasks and 4 GPUs each. The job is charged for 8 GPUs, which is twice
> as much as it should be. Of course this doesn't happen when we specify
> "--nodes=1" or "--ntasks-per-node=4". Also, Slurm allocates only 1 node if
> we don't request GPUs.

Please open a new bug for this to avoid confusing the issues. It doesn't look related to this bug.

A modified patch is upstream for slurm-20.11.6:
> https://github.com/SchedMD/slurm/commit/8cd6af0c8ff73f1543c53ba4dbceec137ab8ca33
Closing ticket, please reply if any more issues are found.
Thanks,
--Nate