Hi Sven,
Dominik will reach out to you about this ticket, but I wanted to get some initial information from you for him. The slurm.conf that we have on file is for maxwell, but the configuration you describe below looks different. Can you attach a recent copy of your slurm.conf for us to review, and would you also attach your gres.conf?

Created attachment 11520 [details]
maxwell slurm.conf
Please find our current slurm.conf attached. We don't have a gres.conf.
At the moment we mostly see the problem as pending jobs in the upex and exfl* partitions preempting jobs in the all* partitions.
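As a quick check while reproducing this, the effective preemption settings can be confirmed on the controller; a minimal sketch, assuming the partition names from the configuration quoted at the end of this ticket:

scontrol show config | grep -i preempt           # cluster-wide PreemptType / PreemptMode
scontrol show partition all | grep -i preempt    # per-partition PreemptMode override (likewise for allgpu)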
Hi
If these jobs still exist in the system, could you send me the output from:
scontrol show job 3127852
scontrol show job 3128450
The full slurmctld.log would be useful too.
Dominik

Hi
I can recreate this. I will inform you when I find a solution.
Dominik

Created attachment 11523 [details]
slurmctld log
We updated Slurm to 19.05.2 on "Mon 02 Sep 2019 10:47".
The problem was reported to me over the weekend of 6-8 September.
Created attachment 11555 [details]
Log with massive duplicate-jobid errors due to the preemption bug

Hi!
Last evening we again had 18000 preemption events. We also found that 50 nodes were set to DRAIN because of "Duplicate jobid" errors:

Sep 12 00:13:00 max-adm01 slurmctld[10333]: backfill: Started JobId=3130441 in all on max-exfl022
Sep 12 01:40:41 max-adm01 slurmctld[10333]: email msg to foo@desy.de: Slurm Job_id=3130441 Name=SSdef2! Ended, Run time 00:00:30, PREEMPTED, ExitCode 0
Sep 12 01:40:41 max-adm01 slurmctld[10333]: preempted JobId=3130441 has been requeued to reclaim resources for JobId=3140569
Sep 12 01:40:43 max-adm01 slurmctld[10333]: Requeuing JobId=3130441
..repeat 50 times ..
Sep 12 01:43:43 max-adm01 slurmctld[10333]: drain_nodes: node max-exfl022 state set to DRAIN
Sep 12 01:43:43 max-adm01 slurmctld[10333]: error: Duplicate jobid on nodes max-exfl022, set to state DRAIN

Hi
We have a patch which should solve this issue. It is currently under internal quality assurance. I will let you know when it is in the repo.
Dominik

Hi
This commit should fix this issue:
https://github.com/SchedMD/slurm/commit/0d432caed
It will be included in 19.05.3.
Please let me know if you find any issues after applying it.
Dominik

Hello!
Great news. Could you give me an estimate of when 19.05.3 will arrive? Or could I simply patch it into 19.05.2 and just replace the patched slurmctld?
Best regards!

We plan to release 19.05.3 before the end of the month, but we have no strict date yet. You can apply this patch locally on top of 19.05.2.
Dominik

Created attachment 11587 [details]
slurmctld log file after patch
Hello!
I replaced src/slurmctld/node_scheduler.c in the 19.05.2 sources with the
"new" one and recompiled slurmctld. I couldn't just add the one line, as
there are more changes in that file compared to 19.05.2.
I hope this is OK, or should I clone the whole repository and rebuild from that instead?
For us it looks fine now, no unnecessary preemptions anymore. Attached you
will find the current log.
Best regards and thanks for your help
Sven
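
For reference, a rough sketch of the patch-and-rebuild route discussed above, assuming an already configured 19.05.2 source tree; the patch file name and the GitHub ".patch" download are illustrative, not taken from the ticket:

cd slurm-19.05.2
curl -LO https://github.com/SchedMD/slurm/commit/0d432caed.patch   # GitHub serves a commit as a patch when .patch is appended
patch -p1 < 0d432caed.patch
make -C src/slurmctld && make -C src/slurmctld install             # rebuild and install only slurmctld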
Hi
This patch contains only one line.
I'm glad to hear things are working. Can we drop the severity to 3 now, as the patch is already in the git repo?
Dominik

Hello!
So would you recommend using only the one line? And yes, since everything looks OK now, we can drop it to 3.
Best regards!

Hi
Those commits contain additional fixes related to preemption when jobs request multiple features. Both will be included in 19.05.3:
https://github.com/SchedMD/slurm/commit/f2fcf3af981
https://github.com/SchedMD/slurm/commit/c2a57967cef
I'm closing the case now as fixed. In case of any questions related to the issue, please feel free to reopen.
Dominik

*** Ticket 8283 has been marked as a duplicate of this ticket. ***
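To illustrate the "multiple features" case those follow-up commits address, a constraint expression can combine several node features; a hypothetical submission using feature names from the node definitions in the report below (job.sh is a placeholder):

sbatch --partition=allgpu --constraint="GPU&V100" job.sh    # AND: the node must carry both features
sbatch --partition=allgpu --constraint="V100|P100" job.sh   # OR: either feature is acceptable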
Hello! We have several partitions. There are two special ones for groups without sufficient resources: one includes all nodes with CPUs and one includes all nodes with GPUs. Both are configured for preemption:

PartitionName=all PreemptMode=REQUEUE
PartitionName=allgpu PreemptMode=REQUEUE

Users submit jobs to one of the privileged queues (not all*). The job has a constraint like "V100" which can't be fulfilled when it is submitted, because all nodes which match the constraint are running jobs from the privileged queue. Slurm now surprisingly preempts a node which is in the privileged queue and in the all* queue, but does not have the constraint! The job stays pending. After 3 minutes another node is preempted, and this continues until a node in the privileged queue which fulfills the constraint becomes free (illustrative commands for checking a constraint against node features follow at the end of this report).

slurm.conf:
NodeName=max-wng[004-007] Weight=25 RealMemory=256000 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=Unknown Feature=INTEL,V4,E5-2640,GPU,P100,GPUx1,256G
NodeName=max-wng[010-019] Weight=40 RealMemory=384000 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=Unknown Feature=INTEL,V4,Silver-4114,GPU,V100,GPUx1,384G

/var/log/slurm/job_completions:
JobId=3127852 UserId=foo(42) GroupId=upex(616) Name=spawner-jupyterhub JobState=PREEMPTED Partition=allgpu TimeLimit=480 StartTime=2019-09-09T13:39:04 EndTime=2019-09-09T13:39:43 NodeList=max-wng007 NodeCnt=1 ProcCnt=40 WorkDir=/home/foo ReservationName= Gres= Account=upex QOS=normal WcKey= Cluster=maxwell SubmitTime=2019-09-09T13:36:45 EligibleTime=2019-09-09T13:38:46 DerivedExitCode=0:0 ExitCode=0:0
JobId=3128450 UserId=faa(43) GroupId=cfel(626) Name=0.8_1.0 JobState=COMPLETED Partition=maxgpu TimeLimit=240 StartTime=2019-09-09T13:50:17 EndTime=2019-09-09T15:13:30 NodeList=max-wng[012-013] NodeCnt=2 ProcCnt=80 WorkDir=/beegfs/desy/group/lux-users/localized_injection_2019/simulation/scan_first_profile_laser_scale_0.800_plasma_scale_1.000 ReservationName= Gres= Account=cfel QOS=cfel WcKey= Cluster=maxwell SubmitTime=2019-09-09T12:27:37 EligibleTime=2019-09-09T12:27:37 DerivedExitCode=0:0 ExitCode=0:0

message.log:
Sep 9 10:06:28 max-adm01 slurmctld[24911]: backfill: Started JobId=3127852 in allgpu on max-wng005
Sep 9 13:29:44 max-adm01 slurmctld[24911]: preempted JobId=3127852 has been requeued to reclaim resources for JobId=3128450
Sep 9 13:29:46 max-adm01 slurmctld[24911]: Requeuing JobId=3127852
Sep 9 13:32:03 max-adm01 slurmctld[24911]: backfill: Started JobId=3127852 in allgpu on max-wng007
Sep 9 13:32:43 max-adm01 slurmctld[24911]: preempted JobId=3127852 has been requeued to reclaim resources for JobId=3128450
Sep 9 13:32:46 max-adm01 slurmctld[24911]: Requeuing JobId=3127852
Sep 9 13:35:04 max-adm01 slurmctld[24911]: backfill: Started JobId=3127852 in allgpu on max-wng005
Sep 9 13:36:43 max-adm01 slurmctld[24911]: preempted JobId=3127852 has been requeued to reclaim resources for JobId=3128450
Sep 9 13:36:45 max-adm01 slurmctld[24911]: Requeuing JobId=3127852
Sep 9 13:39:04 max-adm01 slurmctld[24911]: backfill: Started JobId=3127852 in allgpu on max-wng007
Sep 9 13:39:43 max-adm01 slurmctld[24911]: preempted JobId=3127852 has been requeued to reclaim resources for JobId=3128450
Sep 9 13:39:46 max-adm01 slurmctld[24911]: Requeuing JobId=3127852
...
Sep 9 13:50:17 max-adm01 slurmctld[24911]: sched: Allocate JobId=3128450 NodeList=max-wng[012-013] #CPUs=80 Partition=maxgpu
Sep 9 15:13:30 max-adm01 slurmctld[24911]: _job_complete: JobId=3128450 WEXITSTATUS 0
Sep 9 15:13:30 max-adm01 slurmctld[24911]: _job_complete: JobId=3128450 done

1. While the job is still pending and preempting innocent all* jobs:
# sacct -j 3128450 --format=jobid,state,Node,start,end,AllocCPUS,Constraints
JobID             State        NodeList               Start                 End  AllocCPUS         Constraints
------------ ---------- --------------- ------------------- ------------------- ---------- -------------------
3128450         PENDING   None assigned             Unknown             Unknown          8                V100

2. After finishing:
# sacct -j 3128450 --format=jobid,state,Node,start,end,AllocCPUS,Constraints
JobID             State        NodeList               Start                 End  AllocCPUS         Constraints
------------ ---------- --------------- ------------------- ------------------- ---------- -------------------
3128450       COMPLETED max-wng[012-01+ 2019-09-09T13:50:17 2019-09-09T15:13:30         80                V100
3128450.bat+  COMPLETED      max-wng012 2019-09-09T13:50:17 2019-09-09T15:13:30         40
3128450.0     COMPLETED max-wng[012-01+ 2019-09-09T13:50:17 2019-09-09T15:13:30          2
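
A small sketch of how a constraint relates to the node feature tags above (referenced earlier in this report); the submission line is hypothetical, while the partition and feature names come from this ticket:

sinfo -N -o "%N %f"                                  # list every node with its Feature tags
sbatch --partition=maxgpu --constraint=V100 job.sh   # a job that should only ever run on V100 nodes
squeue -j <jobid> -o "%i %T %r"                      # while pending, shows the reason it cannot start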