| Summary: | sshare -l dumps core | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Randy Smith <rsmith> |
| Component: | User Commands | Assignee: | Gavin D. Howard <gavin> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | cinek, eckert2 |
| Version: | 19.05.2 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=8305 | ||
| Site: | TGen | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | 19.05.6 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | sshare.process.txt, slurm.conf, slurmdbd.conf |
||
Randy,

Could you please start the sshare under gdb loading the generated core file and check the backtrace:

#gdb $(which sshare) /path/to/core
(gdb)t a a bt full

Please attach the result of the above commands to the bug report.

cheers,
Marcin

Created attachment 12089 [details]
sshare.process.txt

Attached is the output you requested.

There are a few things I need to solve this bug:

1. Your slurm.conf
2. Your slurmdbd.conf
3. The Linux distro and version on the node where `sshare` was run.
4. The Linux distro and version on the node where `slurmctld` is running.
5. The Linux distro and version on the node where `slurmdbd` is running.
6. If at all possible, your database. If you do send it, please compress it first.

Created attachment 12124 [details]
slurm.conf

Here you go.

sshare: linux hpc-utility 3.10.0-957.5.1.el7.x86_64 #1 SMP Fri Feb 1 14:54:57 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
slurmctld: Linux dback-slurm 3.10.0-327.3.1.el7.x86_64 #1 SMP Wed Dec 9 14:09:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
slurmdbd: Linux dback-slurmdb 3.10.0-327.3.1.el7.x86_64 #1 SMP Wed Dec 9 14:09:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

I'll upload the database dump via your web interface.
Created attachment 12125 [details]
slurmdbd.conf
The file is too large to upload here, so here is a link to a Google Drive location: slurm_acct_db_dump_10_28_19.sql.gz <https://drive.google.com/a/tgen.org/file/d/1hrzQUgcim0EYoOm2mRWwV5nuQeTcXKQw/view?usp=drive_web>

Thank you. I have been looking into it, but I don't see a certain fix yet.

Thanks for the update.
*** Ticket 8305 has been marked as a duplicate of this ticket. ***

Randy,

We have a fix, and it has been committed to 19.05 (so it will be in the next dot release of 19.05) and into 20.02.

Thanks again. Closing. |
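For reference, the backtrace requested at the start of this thread can also be collected without an interactive gdb session. The following is a minimal sketch, assuming gdb is installed and a core file from the crash is available. The function name `collect_backtrace`, the core path, and the output file name are illustrative and do not come from the report. `t a a bt full` is gdb shorthand for `thread apply all bt full`.

```shell
#!/bin/sh
# Sketch only: collect_backtrace and its arguments are illustrative names,
# not part of Slurm or of the original report.
# Usage: collect_backtrace <binary> <core-file> <output-file>
collect_backtrace() {
    binary=$1
    core=$2
    out=$3
    # -batch makes gdb exit after running the -ex commands;
    # "thread apply all bt full" is the long form of "t a a bt full".
    gdb -batch -ex 'thread apply all bt full' "$binary" "$core" > "$out" 2>&1
}

# Example call (paths are placeholders):
# collect_backtrace "$(command -v sshare)" /path/to/core sshare.backtrace.txt
```

Writing the output to a file makes it easy to attach the result to a ticket, as was done here with sshare.process.txt.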
We believe we have encountered a bug when running sshare -l on our recently upgraded 19.05.2 Slurm environment. Below is the output from the command. Please advise.

sshare -l
Account User RawShares NormShares RawUsage NormUsage EffectvUsage FairShare LevelFS GrpTRESMins TRESRunMins
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ---------- ------------------------------ ------------------------------
root 0.000000 26327672 1.000000 cpu=2588254,mem=26165807462,e+
*** Error in `sshare': free(): invalid next size (fast): 0x000000000061f650 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7d023)[0x2aaaac0fd023]
/opt/slurm/19.05.02/lib/slurm/libslurmfull.so(slurm_xfree+0x1d)[0x2aaaaae44972]
/opt/slurm/19.05.02/lib/slurm/libslurmfull.so(print_fields_double+0x27e)[0x2aaaaad9325f]
sshare(process+0x50d)[0x40245d]
sshare[0x402866]
sshare(main+0x9f3)[0x40326b]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaaac0a1b15]
sshare[0x401d79]
======= Memory map: ========
00400000-00405000 r-xp 00000000 00:29 270855015 /opt/slurm/19.05.02/bin/sshare
00604000-00605000 r--p 00004000 00:29 270855015 /opt/slurm/19.05.02/bin/sshare
00605000-00606000 rw-p 00005000 00:29 270855015 /opt/slurm/19.05.02/bin/sshare
00606000-00627000 rw-p 00000000 00:00 0 [heap]
2aaaaaaab000-2aaaaaacc000 r-xp 00000000 08:03 269755771 /usr/lib64/ld-2.17.so
2aaaaaacc000-2aaaaaace000 r-xp 00000000 00:00 0 [vdso]
2aaaaaace000-2aaaaaad1000 rw-p 00000000 00:00 0
2aaaaaaee000-2aaaaaaf4000 rw-p 00000000 00:00 0
2aaaaaccc000-2aaaaaccd000 r--p 00021000 08:03 269755771 /usr/lib64/ld-2.17.so
2aaaaaccd000-2aaaaacce000 rw-p 00022000 08:03 269755771 /usr/lib64/ld-2.17.so
2aaaaacce000-2aaaaaccf000 rw-p 00000000 00:00 0
2aaaaaccf000-2aaaaaeab000 r-xp 00000000 00:29 3680300 /opt/slurm/19.05.02/lib/slurm/libslurmfull.so
2aaaaaeab000-2aaaab0ab000 ---p 001dc000 00:29 3680300 /opt/slurm/19.05.02/lib/slurm/libslurmfull.so
2aaaab0ab000-2aaaab0ad000 r--p 001dc000 00:29 3680300 /opt/slurm/19.05.02/lib/slurm/libslurmfull.so
2aaaab0ad000-2aaaab0b8000 rw-p 001de000 00:29 3680300 /opt/slurm/19.05.02/lib/slurm/libslurmfull.so
2aaaab0b8000-2aaaab0be000 rw-p 00000000 00:00 0
2aaaab0be000-2aaaab0c1000 r-xp 00000000 08:03 269970243 /usr/lib64/libdl-2.17.so
2aaaab0c1000-2aaaab2c0000 ---p 00003000 08:03 269970243 /usr/lib64/libdl-2.17.so
2aaaab2c0000-2aaaab2c1000 r--p 00002000 08:03 269970243 /usr/lib64/libdl-2.17.so
2aaaab2c1000-2aaaab2c2000 rw-p 00003000 08:03 269970243 /usr/lib64/libdl-2.17.so
2aaaab2c2000-2aaaab3c3000 r-xp 00000000 08:03 270136694 /usr/lib64/libm-2.17.so
2aaaab3c3000-2aaaab5c2000 ---p 00101000 08:03 270136694 /usr/lib64/libm-2.17.so
2aaaab5c2000-2aaaab5c3000 r--p 00100000 08:03 270136694 /usr/lib64/libm-2.17.so
2aaaab5c3000-2aaaab5c4000 rw-p 00101000 08:03 270136694 /usr/lib64/libm-2.17.so
2aaaab5c4000-2aaaab600000 r-xp 00000000 08:03 270267744 /usr/lib64/libreadline.so.6.2
2aaaab600000-2aaaab800000 ---p 0003c000 08:03 270267744 /usr/lib64/libreadline.so.6.2
2aaaab800000-2aaaab802000 r--p 0003c000 08:03 270267744 /usr/lib64/libreadline.so.6.2
2aaaab802000-2aaaab808000 rw-p 0003e000 08:03 270267744 /usr/lib64/libreadline.so.6.2
2aaaab808000-2aaaab80a000 rw-p 00000000 00:00 0
2aaaab80a000-2aaaab812000 r-xp 00000000 08:03 270117785 /usr/lib64/libhistory.so.6.2
2aaaab812000-2aaaaba11000 ---p 00008000 08:03 270117785 /usr/lib64/libhistory.so.6.2
2aaaaba11000-2aaaaba12000 r--p 00007000 08:03 270117785 /usr/lib64/libhistory.so.6.2
2aaaaba12000-2aaaaba13000 rw-p 00008000 08:03 270117785 /usr/lib64/libhistory.so.6.2
2aaaaba13000-2aaaaba39000 r-xp 00000000 08:03 270160703 /usr/lib64/libncurses.so.5.9
2aaaaba39000-2aaaabc38000 ---p 00026000 08:03 270160703 /usr/lib64/libncurses.so.5.9
2aaaabc38000-2aaaabc39000 r--p 00025000 08:03 270160703 /usr/lib64/libncurses.so.5.9
2aaaabc39000-2aaaabc3a000 rw-p 00026000 08:03 270160703 /usr/lib64/libncurses.so.5.9
2aaaabc3a000-2aaaabc5f000 r-xp 00000000 08:03 270334697 /usr/lib64/libtinfo.so.5.9
2aaaabc5f000-2aaaabe5f000 ---p 00025000 08:03 270334697 /usr/lib64/libtinfo.so.5.9
2aaaabe5f000-2aaaabe63000 r--p 00025000 08:03 270334697 /usr/lib64/libtinfo.so.5.9
2aaaabe63000-2aaaabe64000 rw-p 00029000 08:03 270334697 /usr/lib64/libtinfo.so.5.9
2aaaabe64000-2aaaabe7a000 r-xp 00000000 08:03 270255875 /usr/lib64/libpthread-2.17.so
2aaaabe7a000-2aaaac07a000 ---p 00016000 08:03 270255875 /usr/lib64/libpthread-2.17.so
2aaaac07a000-2aaaac07b000 r--p 00016000 08:03 270255875 /usr/lib64/libpthread-2.17.so
2aaaac07b000-2aaaac07c000 rw-p 00017000 08:03 270255875 /usr/lib64/libpthread-2.17.so
2aaaac07c000-2aaaac080000 rw-p 00000000 00:00 0
2aaaac080000-2aaaac236000 r-xp 00000000 08:03 269840384 /usr/lib64/libc-2.17.so
2aaaac236000-2aaaac436000 ---p 001b6000 08:03 269840384 /usr/lib64/libc-2.17.so
2aaaac436000-2aaaac43a000 r--p 001b6000 08:03 269840384 /usr/lib64/libc-2.17.so
2aaaac43a000-2aaaac43c000 rw-p 001ba000 08:03 269840384 /usr/lib64/libc-2.17.so
2aaaac43c000-2aaaac441000 rw-p 00000000 00:00 0
2aaaac441000-2aaaac44d000 r-xp 00000000 08:03 270224039 /usr/lib64/libnss_files-2.17.so
2aaaac44d000-2aaaac64c000 ---p 0000c000 08:03 270224039 /usr/lib64/libnss_files-2.17.so
2aaaac64c000-2aaaac64d000 r--p 0000b000 08:03 270224039 /usr/lib64/libnss_files-2.17.so
2aaaac64d000-2aaaac64e000 rw-p 0000c000 08:03 270224039 /usr/lib64/libnss_files-2.17.so
2aaaac64e000-2aaaac654000 rw-p 00000000 00:00 0
2aaaac654000-2aaaac65c000 r-xp 00000000 08:03 270224046 /usr/lib64/libnss_sss.so.2
2aaaac65c000-2aaaac85b000 ---p 00008000 08:03 270224046 /usr/lib64/libnss_sss.so.2
2aaaac85b000-2aaaac85c000 r--p 00007000 08:03 270224046 /usr/lib64/libnss_sss.so.2
2aaaac85c000-2aaaac85d000 rw-p 00008000 08:03 270224046 /usr/lib64/libnss_sss.so.2
2aaaac85d000-2aaaac869000 r-xp 00000000 08:03 270224041 /usr/lib64/libnss_ldap.so.2
2aaaac869000-2aaaaca68000 ---p 0000c000 08:03 270224041 /usr/lib64/libnss_ldap.so.2
2aaaaca68000-2aaaaca69000 r--p 0000b000 08:03 270224041 /usr/lib64/libnss_ldap.so.2
2aaaaca69000-2aaaaca6a000 rw-p 0000c000 08:03 270224041 /usr/lib64/libnss_ldap.so.2
2aaaaca6a000-2aaaaca6c000 r-xp 00000000 00:29 2860787 /opt/slurm/19.05.02/lib/slurm/auth_munge.so
2aaaaca6c000-2aaaacc6c000 ---p 00002000 00:29 2860787 /opt/slurm/19.05.02/lib/slurm/auth_munge.so
2aaaacc6c000-2aaaacc6d000 r--p 00002000 00:29 2860787 /opt/slurm/19.05.02/lib/slurm/auth_munge.so
2aaaacc6d000-2aaaacc6e000 rw-p 00003000 00:29 2860787 /opt/slurm/19.05.02/lib/slurm/auth_munge.so
2aaaacc6e000-2aaaacc77000 r-xp 00000000 08:03 270160696 /usr/lib64/libmunge.so.2.0.0
2aaaacc77000-2aaaace76000 ---p 00009000 08:03 270160696 /usr/lib64/libmunge.so.2.0.0
2aaaace76000-2aaaace77000 r--p 00008000 08:03 270160696 /usr/lib64/libmunge.so.2.0.0
2aaaace77000-2aaaace78000 rw-p 00009000 08:03 270160696 /usr/lib64/libmunge.so.2.0.0
2aaaace78000-2aaaace8d000 r-xp 00000000 08:03 270095360 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2aaaace8d000-2aaaad08c000 ---p 00015000 08:03 270095360 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2aaaad08c000-2aaaad08d000 r--p 00014000 08:03 270095360 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2aaaad08d000-2aaaad08e000 rw-p 00015000 08:03 270095360 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2aaab0000000-2aaab0021000 rw-p 00000000 00:00 0
2aaab0021000-2aaab4000000 ---p 00000000 00:00 0
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0 [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
coh 1 0.020833 0 0.000000 0.000000 3.9818e+22
Aborted (core dumped)