Hi,

in the output of the "sdiag" command, the mean depth is always less than the last queue length:

03.07.2020: Depth Mean (try depth): 375, Last queue length: 999
09.07.2020: Depth Mean: 797, Depth Mean (try depth): 626, Last queue length: 1099

We started with:

[2020-06-18T14:45:12.141] backfill: completed testing 8(8) jobs

After we set SchedulerParameters = bf_continue,bf_max_time=60, the number of jobs tested by backfill increased:

[2020-07-09T11:35:14.797] backfill: completed testing 375(1) jobs, usec=844880
[2020-07-09T11:36:44.162] backfill: completed testing 403(3) jobs, usec=1534323

Why isn't it possible to test all jobs with the backfill algorithm?

Thanks!
-- Brigitte May
Hi

Can you send me the full sdiag output and your slurm.conf?

Dominik
Created attachment 14961 [details] sdiag_20200709
Created attachment 14962 [details] slurm.conf
Hi

Could you try these SchedulerParameters?

SchedulerParameters = bf_max_job_test=400,bf_max_time=120,bf_continue,bf_resolution=120,bf_window=8640

After applying this change, if you send me the next sdiag output (covering ~5 backfill cycles), we can make the next tuning iteration.

Dominik
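For reference, a minimal sketch of collecting such a sample (the file name and the one-minute spacing are assumptions; adjust the sleep to your actual backfill cycle time):

# sample sdiag once per minute over roughly five backfill cycles
for i in 1 2 3 4 5; do
    sdiag >> sdiag_samples.txt
    echo '----' >> sdiag_samples.txt
    sleep 60
done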
Hi,

Fr Jul 10-10:06:34 (34/10233) - ACTIVE /etc/slurm# scontrol reconfigure
Fr Jul 10-10:07:08 (36/10235) - ACTIVE /etc/slurm# scontrol show config | grep SchedulerParameters
SchedulerParameters     = bf_max_job_test=400,bf_max_time=120,bf_continue,bf_resolution=120,bf_window=8640

Fr Jul 10-10:08:21 (15/665) /var/log/slurm# tail -5000 slurmctld.log | grep "backfill: completed testing"
[2020-07-10T10:07:01.696] backfill: completed testing 182(182) jobs, usec=12101456
[2020-07-10T10:08:06.035] backfill: completed testing 436(251) jobs, usec=188765
[2020-07-10T10:08:55.183] backfill: completed testing 437(251) jobs, usec=207134
[2020-07-10T10:09:44.397] backfill: completed testing 437(251) jobs, usec=205988
[2020-07-10T10:10:37.730] backfill: completed testing 437(251) jobs, usec=205493

Fr Jul 10-10:11:24 (16/666) /var/log/slurm# tail -5000 slurmctld.log | grep "backfill: completed testing"
[2020-07-10T10:08:55.183] backfill: completed testing 437(251) jobs, usec=207134
[2020-07-10T10:09:44.397] backfill: completed testing 437(251) jobs, usec=205988
[2020-07-10T10:10:37.730] backfill: completed testing 437(251) jobs, usec=205493
[2020-07-10T10:11:35.253] backfill: completed testing 439(246) jobs, usec=185437
[2020-07-10T10:12:47.049] backfill: completed testing 438(246) jobs, usec=185180

Brigitte
Hi

Could you reset the backfill statistics with "sdiag -r" and send me the sdiag output grabbed ~15 minutes later?

Dominik
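Something along these lines (the output file name is only an example):

sdiag -r                        # reset the statistics counters
sleep 900                       # wait ~15 minutes
sdiag > sdiag_after_reset.txt   # grab the fresh statistics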
Hi

I forgot: could you also send me the slurmctld.log?

Dominik
Created attachment 14977 [details] sdiag after reset with sdiag -r
Created attachment 14978 [details] slurmctld.log since [2020-07-10T09:52:22.021]
A log line like this means that all eligible jobs from the queue were processed:

backfill: reached end of job queue

Could you send me the output of "squeue --start"?

In the log, I noticed that uc1n679 seems to be misconfigured.

-----

I would also like to point out that we take our severity levels very seriously and ask that you set the severity accordingly, since Severity 1 and Severity 2 tickets disrupt work we are currently engaged in and are also tied to service level agreements. The severity should reflect the impact on the system only. In this case, it seems you are asking for configuration assistance, which is best suited to a Severity 3 or 4. Below is a link to the support site, which describes ticket severity.

https://www.schedmd.com/support.php

SEVERITY LEVELS

Severity 1 — Major Impact
A Severity 1 issue occurs when there is a continued system outage that affects a large set of end users. The system is down and non-functional due to Slurm problem(s) and no procedural workaround exists.

Severity 2 — High Impact
A Severity 2 issue is a high-impact problem that is causing sporadic outages or is consistently encountered by end users with adverse impact to end user interaction with the system.

Severity 3 — Medium Impact
A Severity 3 issue is a medium-to-low impact problem that includes partial non-critical loss of system access or which impairs some operations on the system but allows the end user to continue to function on the system with workarounds.

Severity 4 — Minor Issues
A Severity 4 issue is a minor issue with limited or no loss in functionality within the customer environment. Severity 4 issues may also be used for recommendations for future product enhancements or modifications.

Dominik
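To check how often the backfill scheduler actually reaches the end of the queue, a quick count against the controller log can help (the log path is an assumption based on your earlier output):

grep -c "backfill: reached end of job queue" /var/log/slurm/slurmctld.log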
Created attachment 14980 [details] squeue --start
Hi,

I changed the severity to 3 (medium impact).

Thanks,
Brigitte
Hi

It seems that backfill is processing all eligible jobs. The difference between the number of processed jobs and the number of jobs in the queue is caused by the many jobs held back due to AssocGrpJobsLimit, PartitionConfig, and Dependency.

As additional tuning, you can try adding "bf_running_job_reserve" to SchedulerParameters and slightly increasing "bf_max_time".

Dominik
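As a quick way to see how many pending jobs are held back by each of these reasons, a one-liner like this works (a sketch using standard squeue format options):

# count pending jobs grouped by their pending reason
squeue -h -t PENDING -o "%r" | sort | uniq -c | sort -rn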
Hi

Any news on this issue?

Dominik
Hi,

last week I changed the SchedulerParameters to

SchedulerParameters=bf_max_job_test=400,bf_max_time=150,bf_continue,bf_resolution=120,bf_window=8640,bf_running_job_reserve

for one day, but the backfilling was not good. Then I set

SchedulerParameters=bf_max_job_test=400,bf_max_time=90,bf_continue,bf_resolution=120,bf_window=8640

At the moment, nodes are often idle while many jobs are waiting.

Do Jul 30-10:43:12 (11/10892) - ACTIVE root@uc2n999:/etc/slurm# sinfo -t idle
PARTITION      AVAIL TIMELIMIT  NODES STATE NODELIST
dev_single     up    30:00      0     n/a
single         up    3-00:00:00 0     n/a
dev_multiple   up    30:00      8     idle  uc2n[001-008]
multiple       up    3-00:00:00 0     n/a
fat            up    3-00:00:00 0     n/a
dev_multiple_e up    30:00      7     idle  uc1n[601-607]
multiple_e     up    3-00:00:00 3     idle  uc1n[744,866,900]
jupyter_uc1e   up    3-00:00:00 2     idle  uc1n[611-612]
dev_special    up    30:00      2     idle  uc1n[931-932]
special        up    3-00:00:00 0     n/a
gpu_4          up    2-00:00:00 0     n/a
gpu_8          up    2-00:00:00 5     idle  uc2n[508,510,515-516,518]
slurm          up    infinite   0     n/a
tsmserver      up    infinite   0     n/a
login          up    infinite   0     n/a
headnode       up    infinite   0     n/a

Now I will go back to the previous parameters:

SchedulerParameters=bf_max_job_test=400,bf_max_time=120,bf_continue,bf_resolution=120,bf_window=8640

because the job scheduling behaviour has been bad since last week.

Brigitte
(In reply to Brigitte May from comment #15)
> Hi,
>
> last week I changed the SchedulerParameters
>
> SchedulerParameters=bf_max_job_test=400,bf_max_time=150,bf_continue,
> bf_resolution=120,bf_window=8640,bf_running_job_reserve
>
> for one day, but the backfilling was not good.

Hi

Can you describe what 'not good' means?

> Then I set
> SchedulerParameters=bf_max_job_test=400,bf_max_time=90,bf_continue,
> bf_resolution=120,bf_window=8640
>
> At the moment often nodes are idle and many jobs are waiting.
> (sinfo -t idle output quoted above)
>
> Now I will go back to the Parameters before:
> SchedulerParameters=bf_max_job_test=400,bf_max_time=120,bf_continue,
> bf_resolution=120,bf_window=8640
>
> because the behaviour of the job scheduling since last week is bad.

Can you send me the outputs of 'squeue --start' and sdiag? As I mentioned in comment 13, I think increasing bf_max_time is the direction you should follow.

Dominik
Hi,

compared to 22.07.2020:

[2020-07-22T03:41:45.228] backfill: completed testing 619(2) jobs

the number of tested jobs was smaller (see attachment 2020_07_23). At the moment we have user tickets about long wait times in the multiple class and in interactive mode. See the outputs for more information.

Thanks!
Brigitte
Created attachment 15238 [details] backfill testing in comparison with and without bf_running_job_reserve
Created attachment 15239 [details] sdiag
Created attachment 15240 [details] squeue --start
Hi

When exactly did you enable bf_running_job_reserve? I don't see that enabling this parameter made backfill work less efficiently.

From sdiag I noticed that someone (I suspect some script) is sending over 200 RPCs/sec. This number of RPCs can kill scheduling performance. Do you know who is generating all these requests, and why?

sdiag_20200730:
...
REQUEST_JOB_INFO_SINGLE ( 2021) count:5958111 ave_time:476721 total_time:2840358163735
REQUEST_JOB_INFO        ( 2003) count:1801077 ave_time:502850 total_time:905671648182
...
om0394      ( 223703) count:16392346 ave_time:175409 total_time:2875379660294
root        (      0) count:3300217  ave_time:309919 total_time:1022802299381
bf4607      ( 218591) count:1931293  ave_time:178975 total_time:345654240236
hu_mathlout ( 927457) count:732438   ave_time:332149 total_time:243279257036
...

Dominik
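To rank the heaviest RPC types and callers from a saved sdiag output, something like this works (a sketch; "sdiag_20200730" is the file name assumed above):

# prepend each line's count field, sort numerically, show the top entries
grep 'count:' sdiag_20200730 | sed 's/.*count:\([0-9]*\).*/\1 &/' | sort -rn | head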
Hi,

I enabled bf_running_job_reserve at 2020-07-23 15:21:19 (scontrol reconfigure) and disabled it at 2020-07-24 11:16:31 (scontrol reconfigure).

Brigitte
Hi,

I don't know who is generating all these requests, or why. Do you mean the accounts om0394 and so on in particular?

Brigitte
Hi

Yes, I think the amount of requests coming from those users is one of the root causes of this issue. Maybe you can ask them whether they use any script or tool that generates this many RPCs.

Dominik
Hi

Can you grab some perf data over ~10 minutes (both perf.data.tar.bz2 and perf.data)? Maybe this will show us a bottleneck. E.g.:

perf record -s --call-graph dwarf -p `pidof slurmctld`
perf archive perf.data

then send both perf.data.tar.bz2 and perf.data.

Dominik
Hi,

my colleagues will send you the data. I'm on holiday until 24 August.

Kind regards
Brigitte
Created attachment 15320 [details]
perf record -s --call-graph dwarf -p 31126 sleep 600;perf archive perf.data

Hi,

I am sending you the file perf.data.tar.bz2 that you requested.

Best regards
Karl-Heinz
Dear Sir or Madam,

I will be reachable again from 25.08.2020. E-mails will not be forwarded automatically during my absence.

Kind regards
Brigitte May
Hi

Thanks. I know this is confusing, but perf.data.tar.bz2 and perf.data contain different data, and I need them both.

Dominik
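For context, this is the standard perf workflow: "perf archive" bundles the binaries and debug symbols referenced by perf.data so the profile can be analyzed on another machine. On the receiving side, roughly:

tar xvf perf.data.tar.bz2 -C ~/.debug   # unpack symbols where perf looks for them
perf report -i perf.data                # browse the recorded call graph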
Created attachment 15321 [details]
perf record -s --call-graph dwarf -p 31126 sleep 600;bzip2 perf.data

Hi

I have compressed the perf.data file (bzip2 perf.data); the original was too large. This is my second attempt.

Best regards
Karl-Heinz
Hi

I think you accidentally sent me perf.data.tar.bz2 one more time.

Dominik
Hi

I sent you 2 files. The first file was created by the command "perf record -s --call-graph dwarf -p 31126 sleep 600". The output was perf.data, and then I created perf.data.tar.bz2 with the command "perf archive perf.data". The original perf.data file took too long to send, which is why I compressed it with the command "bzip2 perf.data". Was this correct, or do you need the uncompressed perf.data file?

Best regards
Hi

Attachment 15321 [details] and attachment 15320 [details] are the same. I don't see perf.data.bz2.

Dominik
Created attachment 15326 [details]
perf.data.bz2

Hi

Sorry, but something went wrong yesterday.

Best regards
Karl-Heinz
Hi,

unfortunately I'm not authorized to access bug #9592, which you sent me in the mail from 17.08.2020. Could you change this?

Thanks!
Brigitte
Hi

Sorry for the spam; it was only an automatic mail from Bugzilla. Bug 9592 is internal (readable only by SchedMD staff). It was opened to track a potential performance issue in part_data_build_row_bitmaps(). That bug turned out to be a false positive and is closed now.

Dominik
Hi

These commits address one of the hot spots shown in perf:

https://github.com/SchedMD/slurm/compare/cd1f0094dee...3f196e097641

They will be included in 20.11. After these changes, the configuration "PreemptType=preempt/qos + SelectType=select/cons_tres" should be significantly faster on systems with a big number of running jobs.

Can we close this ticket now? As I wrote in comment 13, backfill on your system works correctly and processes all eligible jobs, even under the huge load generated by user RPCs.

Dominik
Hi, thank you very much!!! You can close the ticket. Kind regards, Brigitte May
*** Ticket 10271 has been marked as a duplicate of this ticket. ***