Some nodes are showing down:

any    up 14-00:00:0    12  down*  hpc-22-[19,22],hpc-23-[07,09,12,14,22-23],hpc-25-10,hpc-92-05,hpc-93-[06,15]

Upon investigation, each of these still has a slurmstepd process running on it:

[naveed@hpc-22-19 ~]$ ps aux | grep slurm
root     21822  0.0  0.0 17120736 14680 ?  Sl  Apr08  0:01 /central/slurm/install/d/sbin/slurmd
root     99409  0.0  0.0 371588   4308  ?  Sl  Apr10  0:04 slurmstepd: [1443.extern]
root     99422  0.0  0.0 755900   6632  ?  Sl  Apr10  0:00 slurmstepd: [1443.0]

dmesg is showing many SLUB errors:

[196430.357796] SLUB: Unable to allocate memory on node -1 (gfp=0x8020)
[196430.357797] cache: blkdev_ioc(18:step_0), object size: 104, buffer size: 104, default order: 1, min order: 0
[196430.357798] node 0: slabs: 11, objs: 858, free: 0
[196430.357798] node 1: slabs: 10, objs: 741, free: 0
[196430.358048] __get_request: dev 8:0: request aux data allocation failed, iosched may be disturbed

Memory:

[naveed@hpc-22-19 ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:         192080       15381      175257          89        1441      175197
Swap:         65535          18       65517

Have you seen this, and do you have any suggestions for preventing it?
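For reference, a minimal sketch of sweeping the down nodes for leftover stepds in one pass (assumes pdsh is installed; the node list is copied from the sinfo output above):

# Show why Slurm marked each node down, then check every affected node
# for lingering slurmstepd processes. The bracketed-s grep pattern
# excludes the grep process itself from the match.
sinfo -R
pdsh -w 'hpc-22-[19,22],hpc-23-[07,09,12,14,22-23],hpc-25-10,hpc-92-05,hpc-93-[06,15]' \
    'ps -eo pid,stat,etime,cmd | grep [s]lurmstepd'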
Hi Naveed. Process state "Sl" means interruptible sleep in a multi-threaded process. We have 3 bugs open reporting a similar problem (4733, 4810, 4690) that we may eventually collapse into a single bug to address the same issue for everyone. To confirm you're experiencing the same error:

- Can you gdb attach to a couple of these Sl stepds, execute 'bt', and attach the output here? Let's see if the backtrace looks like the deadlocks we've already identified in the other logs. (See the sketch after this list for one way to capture it.)
- Exact GLIBC version, including all vendor sub-numbering.
- Output from /proc/cpuinfo (just the first processor is fine).
- Do you have multithreading turned off on your nodes?
- Can you also attach slurm.conf and cgroup.conf?

Thanks.
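A minimal sketch of gathering those backtraces non-interactively (assumes gdb is installed on the compute node and this is run as root; the PIDs are illustrative, taken from the ps output above):

# Capture a plain 'bt' from each hung slurmstepd without an interactive
# gdb session. In -batch mode gdb detaches on exit, so the processes
# resume (or stay stuck) exactly as before.
for pid in 99409 99422; do
    echo "=== slurmstepd PID $pid ==="
    gdb -p "$pid" -batch -ex 'bt' 2>/dev/null
done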
Created attachment 6627 [details] cgroup.conf

On slurmstepd: [1444.0]:

(gdb) bt
#0  0x00002aaaabfe732a in wait4 () from /usr/lib64/libc.so.6
#1  0x0000000000410086 in _spawn_job_container (job=0x649a50) at mgr.c:1107
#2  job_manager (job=job@entry=0x649a50) at mgr.c:1216
#3  0x000000000040c9f7 in main (argc=1, argv=0x7fffffffed88) at slurmstepd.c:172

On slurmstepd: [1444.extern]:

(gdb) bt
#0  0x00002aaaabd15ef7 in pthread_join () from /usr/lib64/libpthread.so.0
#1  0x000000000041078f in _wait_for_io (job=0x647760) at mgr.c:2219
#2  job_manager (job=job@entry=0x647760) at mgr.c:1397
#3  0x000000000040c9f7 in main (argc=1, argv=0x7fffffffed88) at slurmstepd.c:172

glibc:

glibc-2.17-157.el7_3.1.i686
glibc-2.17-157.el7_3.1.x86_64

Hyperthreading is off:

[naveed@hpc-25-10 ~]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
stepping        : 4
microcode       : 0x200002c
cpu MHz         : 2100.000
cache size      : 22528 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 16
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
bogomips        : 4200.00
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
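For anyone collecting the same details on their own nodes, a minimal sketch (assumes a RHEL/CentOS 7 node with rpm and util-linux installed):

# Gather the details requested in comment 1 from a compute node.
rpm -qa 'glibc*' | sort                  # exact glibc version, vendor sub-numbering included
awk '/^$/{exit} {print}' /proc/cpuinfo   # just the first processor block
lscpu | grep -i 'thread'                 # "Thread(s) per core: 1" => hyperthreading off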
Created attachment 6628 [details] slurm.conf
Sorry, will you gdb attach again and report back the output of 'thread apply all bt' and 'thread apply all bt full'? Thanks.
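A minimal sketch of capturing those fuller dumps to a file for attachment (same assumptions as the earlier sketch; the PID and output path are illustrative):

# Collect full per-thread backtraces from a hung stepd into a file
# that can be attached to the bug. Run as root on the affected node.
pid=99409
gdb -p "$pid" -batch \
    -ex 'thread apply all bt' \
    -ex 'thread apply all bt full' > "/tmp/stepd_${pid}_threads.txt" 2>&1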
Created attachment 6629 [details] slurmstep_hung
Created attachment 6630 [details] cgroup.conf
Created attachment 6631 [details] slurm.conf
Let me know if you need anything else on this. If not, I'll return the nodes to service. Thanks, Naveed
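For reference, a minimal sketch of returning the nodes to service with scontrol, assuming the hung stepd processes have already been killed and slurmd restarted on each node (the node list is taken from the original report):

# Clear the down* state so the scheduler can use the nodes again.
scontrol update NodeName='hpc-22-[19,22],hpc-23-[07,09,12,14,22-23],hpc-25-10,hpc-92-05,hpc-93-[06,15]' \
    State=RESUME
# Spot-check that the nodes came back:
sinfo -N -n 'hpc-22-[19,22]' -o '%N %T'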
Yes, sorry, I'd need you to gdb attach again to those stepds and execute these:

(gdb) thread apply all bt

and

(gdb) thread apply all bt full

At first sight it looks like the same issue as in the other 3 bugs, but these gdb commands will confirm it. Thanks.
I included those in the attachment slurmstep_hung in the last message. I was getting weird template errors when emailing in the body of the message, so I thought that way would be safer.
The backtrace is the same as the one reported in all 3 of these bugs. There is a strange interaction between slurmd/stepd forking and calls to glibc's malloc(). The other sites also reported that version of glibc, and we are still not sure whether the problem comes from glibc itself managing the arenas [1] or whether it is a Slurm problem. I'm marking this as a duplicate of bug 4733, so we don't have 4 bugs with the same problem. Thanks for the reported information.

[1] Arena: a structure that is shared among one or more threads and contains references to one or more heaps, as well as linked lists of chunks within those heaps which are "free". Threads assigned to each arena will allocate memory from that arena's free lists.

*** This ticket has been marked as a duplicate of ticket 4733 ***
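As background on the arena angle: glibc exposes the MALLOC_ARENA_MAX environment variable, which caps how many malloc arenas a process will create. The sketch below shows how a site might experiment with capping it for slurmd and the stepds it forks. This is purely an illustration of the tunable, not a fix confirmed in this ticket, and it assumes a slurmd.service unit that reads /etc/sysconfig/slurmd via an EnvironmentFile line (the packaged unit typically does).

# Experimental only: cap glibc's malloc arena pool for slurmd and the
# slurmstepd processes it forks, then watch for recurrence.
echo 'MALLOC_ARENA_MAX=1' >> /etc/sysconfig/slurmd
systemctl restart slurmd
# Verify the setting is visible in the running daemon's environment:
tr '\0' '\n' < /proc/$(pidof slurmd)/environ | grep MALLOC_ARENA_MAX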