Ticket 14595

Summary: Build error for 22.05.2 on Ubuntu 20.04
Product: Slurm
Component: Build System and Packaging
Status: RESOLVED FIXED
Severity: 4 - Minor Issue
Version: 22.05.2
Hardware: Linux
OS: Linux
Reporter: Ali Nikkhah <alin4>
Assignee: Nate Rini <nate>
CC: alex, alin4, cinek, ihmesa, nate
See Also: https://bugs.schedmd.com/show_bug.cgi?id=13242
Site: U WA Health Metrics
Version Fixed: slurm-22.05.4
Ticket Blocks: 14602
Attachments:
    22.05.2 build error
    slurm 22.05 build error --disable-x11
    dpkg -l | grep -i x11
    patch for 2205 (v2)
    config.log 22.05.2 patched

Description Ali Nikkhah 2022-07-22 11:19:48 MDT
Created attachment 25984
22.05.2 build error

We are attempting to prepare an upgrade of our cluster to Slurm 22.05.2, but the build is throwing an error. We build and run Slurm on Ubuntu 20.04. Build steps:

./configure \
    --prefix=/opt/slurm \
    --sysconfdir=/opt/slurm/etc/slurm \
    --without-shared-libslurm \
    --enable-pam \
    --with-pam_dir=/lib/x86_64-linux-gnu/security \
    --with-pmix=/usr/lib/x86_64-linux-gnu/pmix
make


This leads to the attached build output (error at the end). Any ideas as to the cause and how we can get past this?
Comment 1 Ben Roberts 2022-07-22 16:49:25 MDT
Hi Ali,

At first glance this looks like it has to do with x11. Can you list your installed packages and grep for x11-related ones? I'm also curious whether you're able to build if you add the '--disable-x11' flag to your configure line. You don't have to leave it disabled, but that would let us confirm whether the problem is related to the x11 packages.

Thanks,
Ben
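
On Ubuntu, the package check requested above can be done with dpkg. A minimal sketch: the helper name x11_pkgs is hypothetical, and it assumes the usual `dpkg -l` column layout (status, package name, version, architecture, description).

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print the names
# of packages whose listing line mentions x11 (case-insensitive).
x11_pkgs() {
    grep -i x11 | awk '{ print $2 }'
}

# Usage on the build host (requires dpkg, i.e. Debian/Ubuntu):
#   dpkg -l | x11_pkgs
```

Printing only the second column keeps the list short enough to paste into a ticket; dropping the awk stage gives the full listing lines instead.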
Comment 2 Ali Nikkhah 2022-07-22 17:50:31 MDT
Hi Ben,

I ran:

./configure \
    --prefix=/opt/slurm \
    --sysconfdir=/opt/slurm/etc/slurm \
    --without-shared-libslurm \
    --enable-pam \
    --with-pam_dir=/lib/x86_64-linux-gnu/security \
    --with-pmix=/usr/lib/x86_64-linux-gnu/pmix \
    --disable-x11
make

and it appears the result is the same. I will attach the build output and the list of x11-related packages.
Comment 3 Ali Nikkhah 2022-07-22 17:51:26 MDT
Created attachment 25998
slurm 22.05 build error --disable-x11

Make output with --disable-x11.
Comment 4 Ali Nikkhah 2022-07-22 17:52:08 MDT
Created attachment 25999
dpkg -l | grep -i x11

List of X11-related packages on build image.
Comment 9 Ali Nikkhah 2022-07-27 11:31:37 MDT
Is there any more information I can provide to help with this? I am happy to run more builds or debugging as needed.
Comment 10 Nate Rini 2022-07-27 19:21:23 MDT
(In reply to Ali Nikkhah from comment #9)
> Is there any more information I can provide to help with this? I am happy to
> run more builds or debugging as needed.

We have recreated the issue locally and are working on a patch set to correct the issue.
Comment 11 Ali Nikkhah 2022-07-28 10:55:40 MDT
(In reply to Nate Rini from comment #10)
> (In reply to Ali Nikkhah from comment #9)
> > Is there any more information I can provide to help with this? I am happy to
> > run more builds or debugging as needed.
> 
> We have recreated the issue locally and are working on a patch set to
> correct the issue.

Thanks!
Comment 13 Nate Rini 2022-07-28 12:54:21 MDT
Created attachment 26068
patch for 2205 (v2)

(In reply to Ali Nikkhah from comment #11)
> (In reply to Nate Rini from comment #10)
> > (In reply to Ali Nikkhah from comment #9)
> > > Is there any more information I can provide to help with this? I am happy to
> > > run more builds or debugging as needed.
> > 
> > We have recreated the issue locally and are working on a patch set to
> > correct the issue.
> 
> Thanks!

Please try this patch to see if it resolves your issue.
Comment 14 Ali Nikkhah 2022-07-28 14:22:35 MDT
Hi Nate,

The patch does not resolve the issue for 22.05.2. I modified the patch slightly so it would apply cleanly (just removed the NEWS modification) to 22.05.2.

patch --verbose -p1 < ~/git/slurm-package/src/patches/bug14595.2205.v2.patch
./configure \
    --prefix=/opt/slurm \
    --sysconfdir=/opt/slurm/etc/slurm \
    --without-shared-libslurm \
    --enable-pam \
    --with-pam_dir=/lib/x86_64-linux-gnu/security \
    --with-pmix=/usr/lib/x86_64-linux-gnu/pmix
make

Error:

/bin/sh ../../../libtool  --tag=CC   --mode=link gcc  -DNUMA_VERSION1_COMPATIBILITY -g -O2 -fno-omit-frame-pointer -pthread -ggdb3 -Wall -g -O1 -fno-strict-aliasing -export-dynamic  -Wl,-rpath -Wl,/usr/lib64 -L/usr/lib64  -o slurmstepd container.o slurmstepd.o mgr.o task.o slurmstepd_job.o io.o ulimits.o pdebug.o pam_ses.o read_oci_conf.o req.o multi_prog.o step_terminate_monitor.o x11_forwarding.o ../common/libslurmd_common.o ../../../src/api/libslurm.o -ldl -lhwloc -lpam -lpam_misc -lutil  -lpthread -lm -lresolv 
libtool: link: gcc -DNUMA_VERSION1_COMPATIBILITY -g -O2 -fno-omit-frame-pointer -pthread -ggdb3 -Wall -g -O1 -fno-strict-aliasing -Wl,-rpath -Wl,/usr/lib64 -o slurmstepd container.o slurmstepd.o mgr.o task.o slurmstepd_job.o io.o ulimits.o pdebug.o pam_ses.o read_oci_conf.o req.o multi_prog.o step_terminate_monitor.o x11_forwarding.o ../common/libslurmd_common.o ../../../src/api/libslurm.o -Wl,--export-dynamic  -L/usr/lib64 -ldl -lhwloc -lpam -lpam_misc -lutil -lpthread -lm -lresolv -pthread
/usr/bin/ld: ../../../src/api/libslurm.o: in function `x11_str2flags':
/home/ali/slurm/slurm-git/src/common/x11_util.c:66: multiple definition of `x11_str2flags'; mgr.o:/home/ali/slurm/slurm-git/src/slurmd/slurmstepd/../../../src/common/x11_util.c:66: first defined here
/usr/bin/ld: ../../../src/api/libslurm.o: in function `x11_delete_xauth':
/home/ali/slurm/slurm-git/src/common/x11_util.c:275: multiple definition of `x11_delete_xauth'; mgr.o:/home/ali/slurm/slurm-git/src/slurmd/slurmstepd/../../../src/common/x11_util.c:275: first defined here
/usr/bin/ld: ../../../src/api/libslurm.o: in function `x11_flags2str':
/home/ali/slurm/slurm-git/src/common/x11_util.c:82: multiple definition of `x11_flags2str'; mgr.o:/home/ali/slurm/slurm-git/src/slurmd/slurmstepd/../../../src/common/x11_util.c:82: first defined here
/usr/bin/ld: ../../../src/api/libslurm.o: in function `x11_get_display':
/home/ali/slurm/slurm-git/src/common/x11_util.c:103: multiple definition of `x11_get_display'; mgr.o:/home/ali/slurm/slurm-git/src/slurmd/slurmstepd/../../../src/common/x11_util.c:103: first defined here
/usr/bin/ld: ../../../src/api/libslurm.o: in function `x11_get_xauth':
/home/ali/slurm/slurm-git/src/common/x11_util.c:156: multiple definition of `x11_get_xauth'; mgr.o:/home/ali/slurm/slurm-git/src/slurmd/slurmstepd/../../../src/common/x11_util.c:156: first defined here
/usr/bin/ld: ../../../src/api/libslurm.o: in function `x11_set_xauth':
/home/ali/slurm/slurm-git/src/common/x11_util.c:219: multiple definition of `x11_set_xauth'; mgr.o:/home/ali/slurm/slurm-git/src/slurmd/slurmstepd/../../../src/common/x11_util.c:219: first defined here
collect2: error: ld returned 1 exit status
make[4]: *** [Makefile:622: slurmstepd] Error 1
make[4]: Leaving directory '/home/ali/slurm/slurm-git/src/slurmd/slurmstepd'
make[3]: *** [Makefile:515: all-recursive] Error 1
make[3]: Leaving directory '/home/ali/slurm/slurm-git/src/slurmd'
make[2]: *** [Makefile:545: all-recursive] Error 1
make[2]: Leaving directory '/home/ali/slurm/slurm-git/src'
make[1]: *** [Makefile:624: all-recursive] Error 1
make[1]: Leaving directory '/home/ali/slurm/slurm-git'
make: *** [Makefile:523: all] Error 2


I built from the slurm-22.05 git branch just to double-check myself and it fails without the patch, but succeeds with it. However, the build failure is different in the git branch than 22.05.2:

/bin/sh ../../libtool  --tag=CC   --mode=link gcc  -DNUMA_VERSION1_COMPATIBILITY -g -O2 -fno-omit-frame-pointer -pthread -ggdb3 -Wall -g -O1 -fno-strict-aliasing -export-dynamic   -o slurmrestd http.o operations.o slurmrestd.o rest_auth.o ../../src/api/libslurm.o -ldl -L/usr/lib64 -lhttp_parser libslurmrest_ref.la -lpthread -lm -lresolv 
libtool: link: gcc -DNUMA_VERSION1_COMPATIBILITY -g -O2 -fno-omit-frame-pointer -pthread -ggdb3 -Wall -g -O1 -fno-strict-aliasing -o slurmrestd http.o operations.o slurmrestd.o rest_auth.o ../../src/api/libslurm.o -Wl,--export-dynamic  -ldl -L/usr/lib64 -lhttp_parser ./.libs/libslurmrest_ref.a -lpthread -lm -lresolv -pthread
/usr/bin/ld: ../../src/api/libslurm.o: in function `parse_host_port':
/home/ali/slurm/slurm-git/src/common/http.c:308: multiple definition of `parse_host_port'; http.o:/home/ali/slurm/slurm-git/src/slurmrestd/http.c:803: first defined here
/usr/bin/ld: ../../src/api/libslurm.o: in function `free_parse_host_port':
/home/ali/slurm/slurm-git/src/common/http.c:314: multiple definition of `free_parse_host_port'; http.o:/home/ali/slurm/slurm-git/src/slurmrestd/http.c:856: first defined here
collect2: error: ld returned 1 exit status
make[4]: *** [Makefile:642: slurmrestd] Error 1
make[4]: Leaving directory '/home/ali/slurm/slurm-git/src/slurmrestd'
make[3]: *** [Makefile:695: all-recursive] Error 1
make[3]: Leaving directory '/home/ali/slurm/slurm-git/src/slurmrestd'
make[2]: *** [Makefile:545: all-recursive] Error 1
make[2]: Leaving directory '/home/ali/slurm/slurm-git/src'
make[1]: *** [Makefile:624: all-recursive] Error 1
make[1]: Leaving directory '/home/ali/slurm/slurm-git'
make: *** [Makefile:523: all] Error 2


So it appears the x11 errors in 22.05.2 are fixed in the slurm-22.05 branch, but the http errors were introduced (which your patch fixes).
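
When comparing failures like the two above, it can help to reduce the linker output to just the duplicated symbol names. A minimal sketch, assuming the make output was saved to a file: the function name dup_symbols and the file name build.log are hypothetical, and the pattern relies only on the "multiple definition of `name'" wording that GNU ld prints in the logs above.

```shell
# Hypothetical helper: print each symbol that ld reported as a
# "multiple definition" in the given build log, one per line, de-duplicated.
dup_symbols() {
    # ld quotes the symbol as `name'; keep only the text between the quotes
    grep -o "multiple definition of \`[^']*'" "$1" \
        | sed "s/^multiple definition of \`//; s/'\$//" \
        | sort -u
}

# Usage (log file name is an assumption):
#   make 2>&1 | tee build.log
#   dup_symbols build.log
```

Run against the two logs in this comment, this would separate the x11_* symbols seen in 22.05.2 from the parse_host_port symbols seen on the slurm-22.05 branch.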
Comment 15 Nate Rini 2022-07-28 15:14:28 MDT
(In reply to Ali Nikkhah from comment #14)
> The patch does not resolve the issue for 22.05.2. I modified the patch
> slightly so it would apply cleanly (just removed the NEWS modification) to
> 22.05.2.

Please attach the config.log from this run.

> I built from the slurm-22.05 git branch just to double-check myself and it
> fails without the patch, but succeeds with it. However, the build failure is
> different in the git branch than 22.05.2:

Patch is not upstream yet. These errors look like what I saw. Does the slurm-22.05 branch with the patch in comment#13 work?
Comment 16 Ali Nikkhah 2022-07-28 15:43:33 MDT
Created attachment 26073
config.log 22.05.2 patched

Here is config.log for the failed 22.05.2 patched build.
Comment 17 Ali Nikkhah 2022-07-28 15:47:39 MDT
(In reply to Nate Rini from comment #15)
> Patch is not upstream yet. These errors look like what I saw. Does the
> slurm-22.05 branch with the patch in comment#13 work?

Yes, the slurm-22.05 branch builds with the patch.
Comment 18 Nate Rini 2022-07-28 15:49:29 MDT
(In reply to Ali Nikkhah from comment #17)
> (In reply to Nate Rini from comment #15)
> > Patch is not upstream yet. These errors look like what I saw. Does the
> > slurm-22.05 branch with the patch in comment#13 work?
> 
> Yes, the slurm-22.05 branch builds with the patch.

Great, there is no reason to debug why 22.05.2 doesn't build as the issue has already been resolved. We will follow the normal QA process. With any luck, the patch will get included with 22.05.3.
Comment 20 Ali Nikkhah 2022-07-28 18:41:01 MDT
Sounds good, thanks for your help! I found the commit that fixes the 22.05.2 build error in the git history.
Comment 25 Nate Rini 2022-08-29 15:32:17 MDT
(In reply to Ali Nikkhah from comment #20)
> Sounds good, thanks for your help! I found the commit that fixes the 22.05.2
> build error in the git history.

This is now fixed upstream for the pending slurm-22.05.4 release:
> https://github.com/SchedMD/slurm/commit/e6f28f51b8da60a1a5ce86fcc6bf8275bbf732d9