Ticket 4332 - 17.11.x slurm.spec issues
Summary: 17.11.x slurm.spec issues
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Other (show other tickets)
Version: 17.11.x
Hardware: Cray XC Linux
: 4 - Minor Issue
Assignee: Tim Wickberg
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2017-11-04 07:44 MDT by Doug Jacobsen
Modified: 2017-11-09 08:53 MST (History)
0 users

See Also:
Site: NERSC
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 17.11.0-rc3
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---


Attachments

Note You need to log in before you can comment on or make changes to this ticket.
Description Doug Jacobsen 2017-11-04 07:44:59 MDT
Hello,

I'm getting started on adapting my slurm build system to the new slurm.spec.

First issue (jinja templating doesn't like this, and it looks incorrect):

diff -rupN a/slurm.spec b/slurm.spec
--- a/slurm.spec	2017-11-04 06:40:53.351746302 -0700
+++ b/slurm.spec	2017-11-04 06:41:21.688743010 -0700
@@ -289,7 +289,7 @@ chmod +x find-requires.sh
 %global _use_internal_dependency_generator 0
 %global __find_requires %{_builddir}/%{buildsubdir}/find-requires.sh

-rm -rf {%buildroot}
+rm -rf %{buildroot}
 make install DESTDIR=%{buildroot}
 make install-contrib DESTDIR=%{buildroot}




Unless you object, I'll keep this ticket open until I get a buildable source rpm, and identify and further issues along the way.
Comment 1 Tim Wickberg 2017-11-04 10:32:29 MDT
Commit 75ef08f29d22b3f. Thanks.

And it's certainly no problem to hold this open for a bit.
Comment 2 Doug Jacobsen 2017-11-04 22:04:30 MDT
src rpm generation was very noisy and concerned me that other aspects of my slurm spec preparation were having problems.  Removing %dump clarifies this.

diff -rupN a/slurm.spec b/slurm.spec
--- a/slurm.spec	2017-11-04 06:40:53.351746302 -0700
+++ b/slurm.spec	2017-11-04 21:01:26.505148395 -0700
@@ -8,7 +8,6 @@ Group:		System Environment/Base
 License:	GPLv2+
 URL:		https://slurm.schedmd.com/

-%dump
 # when the rel number is one, the tarball filename does not include it
 %if "%{rel}" == "1"
 Source:		%{name}-%{version}.tar.bz2


I'd appreciate it if that didn't survive into production releases (I'm patching it out for now).

Thanks,
Doug
Comment 3 Tim Wickberg 2017-11-04 22:14:38 MDT
(In reply to Doug Jacobsen from comment #2)
> src rpm generation was very noisy and concerned me that other aspects of my
> slurm spec preparation were having problems.  Removing %dump clarifies this.
> 
> I'd appreciate it if that didn't survive into production releases (I'm
> patching it out for now).

Whoops, my mistake there.

Fixed with 4ac4986eab78ee43c.
Comment 4 Doug Jacobsen 2017-11-04 22:37:59 MDT
watching the build I'm seeing build statements like this on SLES:

libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../.. -I../../slurm -I../.. -DNUMA_VERSION1_COMPATIBILITY -O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -pthread -c coord_functions.c  -fPIC -DPIC -o .libs/coord_functions.o
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../.. -I../../slurm -I../.. -DNUMA_VERSION1_COMPATIBILITY -O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -pthread -c coord_functions.c -o coord_functions.o >/dev/null 2>&1



My memory from bug 4014 is that you planned to have -O1 and -ggdb3, which doesn't seem to be happening.  I haven't started to dig into why yet.  Waiting to see if everything apparently builds.
Comment 5 Doug Jacobsen 2017-11-04 23:32:23 MDT
looks like the configure line is getting --disable-debug"

+ export FFLAGS
+ ./configure --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/opt/esslurm --exec-prefix=/opt/esslurm --bindir=/opt/esslurm/bin --sbindir=/opt/esslurm/sbin --sysconfdir=/etc/esslurm --datadir=/opt/esslurm/share --includedir=/opt/esslurm/include --libdir=/opt/esslurm/lib64 --libexecdir=/opt/esslurm/lib --localstatedir=/var --sharedstatedir=/opt/esslurm/com --mandir=/opt/esslurm/share/man --infodir=/opt/esslurm/share/info --disable-dependency-tracking --disable-debug --with-lua


I think this does the trick:

diff -rupN a/slurm.spec b/slurm.spec
--- a/slurm.spec	2017-11-01 08:12:12.000000000 -0700
+++ b/slurm.spec	2017-11-04 22:28:10.586694477 -0700
@@ -261,7 +261,7 @@ according to the Slurm

 %build
 %configure \
-	%{!?_with_debug:--disable-debug} \
+	%{?_without_debug:--disable-debug} \
 	%{?_with_pam_dir} \
 	%{?_with_cpusetdir} \
 	%{?_with_mysql_config} \
Comment 6 Doug Jacobsen 2017-11-04 23:54:55 MDT
Is there any chance that creating _sysconfdir could be avoided in the basic package install?

diff -rupN a/slurm.spec b/slurm.spec
--- a/slurm.spec	2017-11-01 08:12:12.000000000 -0700
+++ b/slurm.spec	2017-11-04 22:50:49.401970158 -0700
@@ -443,13 +443,13 @@ rm -rf %{buildroot}
 %exclude %{_mandir}/man1/sjobexit*
 %exclude %{_mandir}/man1/sjstat*
 %dir %{_libdir}/slurm/src
-%dir %{_sysconfdir}
 %dir /etc/ld.so.conf.d
 /etc/ld.so.conf.d/slurm.conf
 #############################################################################

 %files example-configs
 %defattr(-,root,root,0755)
+%dir %{_sysconfdir}
 %if %{with cray}
 %config %{_sysconfdir}/plugstack.conf.template
 %config %{_sysconfdir}/slurm.conf.template
dmj@alvasmw:~>


I do this because /etc/slurm is a symlink to a read-only path for us and so if the RPM  attempts to create the directory or gets antsy about it being a symlink, then it is problematic to update RPMs on a running system.

I can understand if you want to leave it (some sites may rely on it), I can just keep maintaining the patch to not create it.
Comment 7 Tim Wickberg 2017-11-05 00:21:31 MDT
(In reply to Doug Jacobsen from comment #6)
> Is there any chance that creating _sysconfdir could be avoided in the basic
> package install?
>
> I do this because /etc/slurm is a symlink to a read-only path for us and so
> if the RPM  attempts to create the directory or gets antsy about it being a
> symlink, then it is problematic to update RPMs on a running system.
> 
> I can understand if you want to leave it (some sites may rely on it), I can
> just keep maintaining the patch to not create it.

The only downside I'm seeing is that you don't get an empty directory to start on a new install. But, I think we agree in that the Slurm packages shouldn't be involved in config management/deployment to being with - that's best left to puppet/chef/ansible/salt/symlink-magic/whatever, or a separate site-developed RPM package if they choose to go that route.

I'll mull that one over a bit more, but I think that your patch may be a reasonable approach.

Thanks again for testing these out; as much as I can beat on them, there's obviously a lot of value from testing with different approaches to install that I haven't anticipated.
Comment 8 Doug Jacobsen 2017-11-05 16:07:31 MST
_with_lua should probably not be tested by configure any more since the option is no longer understood by configure:



diff -rupN a/slurm.spec b/slurm.spec
--- a/slurm.spec	2017-11-01 08:12:12.000000000 -0700
+++ b/slurm.spec	2017-11-05 13:33:56.541920497 -0800
@@ -272,7 +272,6 @@ according to the Slurm
 	%{?_with_pmix} \
 	%{?_with_freeipmi} \
 	%{?_with_hdf5} \
-	%{?_with_lua} \
 	%{?_with_shared_libslurm} \
 	%{?_with_cflags}


it looks like you use the variable to test that the correct dev packages are installed.

FYI, SLES 12sp2 (and 12sp3) has either:
dmj@alvasmw:~> find /var/opt/cray/repos/ | grep lua | grep dev
/var/opt/cray/repos/sle-sdk_12sp2_x86-64/x86_64/lua-devel-5.2.2-4.2.x86_64.rpm
/var/opt/cray/repos/sle-sdk_12sp2_x86-64/x86_64/lua51-devel-5.1.5-7.127.x86_64.rpm
dmj@alvasmw:~>

It kind of feels like --with lua /some/path should set lua_CFLAGS and lua_LIBS in the configure step now that the autoconf no longer has any specific enable/disable/with/without options for lua, but that could get complicated.
Comment 9 Doug Jacobsen 2017-11-05 23:58:33 MST
NERSC finds it useful to keep the slurmpmi for non-Cray MPI stacks (e.g., Intel MPI and OpenMPI), but we move it out of the way so that LD_LIBRARY_PATH doesn't create an issue:

diff -rupN a/slurm.spec b/slurm.spec
--- a/slurm.spec	2017-11-01 08:12:12.000000000 -0700
+++ b/slurm.spec	2017-11-05 22:55:03.597697278 -0800
@@ -300,7 +300,11 @@ install -D -m644 etc/slurmdbd.service  %
 # Do not package Slurm's version of libpmi on Cray systems.
 # Cray's version of libpmi should be used.
 %if %{with cray}
+   mkdir %{buildroot}/%{_libdir}/slurmpmi
+   cp %{buildroot}/%{_libdir}/libpmi*so* %{buildroot}/%{_libdir}/slurmpmi/
    rm -f %{buildroot}/%{_libdir}/libpmi*
+%endif
+%if %{with cray}
    %if %{with cray}
       install -D -m644 contribs/cray/plugstack.conf.template %{buildroot}/%{_sysconfdir}/plugstack.conf.template
       install -D -m644 contribs/cray/slurm.conf.template %{buildroot}/%{_sysconfdir}/slurm.conf.template
@@ -436,6 +440,9 @@ rm -rf %{buildroot}
 %{_libdir}/*.so*
 %{_libdir}/slurm/src/*
 %{_libdir}/slurm/*.so
+%if %{with cray}
+%{_libdir}/slurmpmi/*.so*
+%endif
 %exclude %{_libdir}/slurm/accounting_storage_mysql.so
 %exclude %{_libdir}/slurm/job_submit_pbs.so
 %exclude %{_libdir}/slurm/spank_pbs.so


I split the %if %{with cray} block because I actually have another condition relating to some custom patches for our externalized slurm.  The internal %if {with cray} block appears to be an old relic from when you had slurm_with cray || slurm_with cray_alps (before this 17.11)
Comment 10 Doug Jacobsen 2017-11-06 01:50:40 MST
Very minor issue.  The summary for the slurmctld package has the wrong description:

%package slurmctld
Summary: Slurm compute node daemon
Comment 11 Tim Wickberg 2017-11-08 23:23:06 MST
Thanks for all the suggestions, these should all be integrated now, and will be in 17.11.0rc3 and up. Note there's one small change from bug 4344 as well.

We'll likely tag rc3 tomorrow with all this in it; please re-open if there's anything else you run into.

- Tim

commit 45e5c41a1dc80107b88ad0cb932484a2fd7ece13
Author: Doug Jacobsen <dmjacobsen@lbl.gov>
Date:   Wed Nov 8 23:14:49 2017 -0700

    slurm.spec - install Slurm's libpmi in an alternate location.
    
    Also collapse a nested %{with cray} block leftover from earlier work.
    
    Bug 4332.

commit 64ad68879f17938d50f95ba8e58e6ca6f354a99d
Author: Doug Jacobsen <dmjacobsen@lbl.gov>
Date:   Wed Nov 8 23:09:02 2017 -0700

    slurm.spec - fix logic for --disable-debug.
    
    Logic was inverted from the correct behavior.
    
    Bug 4332.

commit d3348db1ec01424baa04f0dffea1e4c5b160c9ab
Author: Doug Jacobsen <dmjacobsen@lbl.gov>
Date:   Wed Nov 8 23:06:20 2017 -0700

    slurm.spec - move %{_sysconfdir} to example-configs package.
    
    Slurm package should not try to manage configs; leave this to the
    admin to setup as they wish. This avoids an issue on RPM install if
    /etc/slurm is a symlink to somewhere else.
    
    Bug 4332.

commit 5351680043a1d0bcdd5599fdc3e550bd3ef13940
Author: Doug Jacobsen <dmjacobsen@lbl.gov>
Date:   Wed Nov 8 23:03:20 2017 -0700

    slurm.spec - remove unused lua flag from configure.
    
    Bug 4332.

commit 6e417e5d555de92f54c2ca0b5115a5a56c368dec
Author: Tim Wickberg <tim@schedmd.com>
Date:   Wed Nov 8 23:00:30 2017 -0700

    slurm.spec - fix description for slurm-slurmctld package.
    
    Bug 4332.
Comment 12 Doug Jacobsen 2017-11-09 02:59:04 MST
Hello,

Regarding the change to nuke the ld.so.conf.d/slurm.  If that is getting
removed, then the post install/uninstall scriptlets that run ldconfig can
also be removed.

Thanks,
Doug

On Nov 8, 2017 22:23, <bugs@schedmd.com> wrote:

> Tim Wickberg <tim@schedmd.com> changed bug 4332
> <https://bugs.schedmd.com/show_bug.cgi?id=4332>
> What Removed Added
> Resolution --- FIXED
> Status UNCONFIRMED RESOLVED
> Version Fixed   17.11.0-rc3
>
> *Comment # 11 <https://bugs.schedmd.com/show_bug.cgi?id=4332#c11> on bug
> 4332 <https://bugs.schedmd.com/show_bug.cgi?id=4332> from Tim Wickberg
> <tim@schedmd.com> *
>
> Thanks for all the suggestions, these should all be integrated now, and will be
> in 17.11.0rc3 and up. Note there's one small change from bug 4344 <https://bugs.schedmd.com/show_bug.cgi?id=4344> as well.
>
> We'll likely tag rc3 tomorrow with all this in it; please re-open if there's
> anything else you run into.
>
> - Tim
>
> commit 45e5c41a1dc80107b88ad0cb932484a2fd7ece13
> Author: Doug Jacobsen <dmjacobsen@lbl.gov>
> Date:   Wed Nov 8 23:14:49 2017 -0700
>
>     slurm.spec - install Slurm's libpmi in an alternate location.
>
>     Also collapse a nested %{with cray} block leftover from earlier work.
>
>     Bug 4332 <https://bugs.schedmd.com/show_bug.cgi?id=4332>.
>
> commit 64ad68879f17938d50f95ba8e58e6ca6f354a99d
> Author: Doug Jacobsen <dmjacobsen@lbl.gov>
> Date:   Wed Nov 8 23:09:02 2017 -0700
>
>     slurm.spec - fix logic for --disable-debug.
>
>     Logic was inverted from the correct behavior.
>
>     Bug 4332 <https://bugs.schedmd.com/show_bug.cgi?id=4332>.
>
> commit d3348db1ec01424baa04f0dffea1e4c5b160c9ab
> Author: Doug Jacobsen <dmjacobsen@lbl.gov>
> Date:   Wed Nov 8 23:06:20 2017 -0700
>
>     slurm.spec - move %{_sysconfdir} to example-configs package.
>
>     Slurm package should not try to manage configs; leave this to the
>     admin to setup as they wish. This avoids an issue on RPM install if
>     /etc/slurm is a symlink to somewhere else.
>
>     Bug 4332 <https://bugs.schedmd.com/show_bug.cgi?id=4332>.
>
> commit 5351680043a1d0bcdd5599fdc3e550bd3ef13940
> Author: Doug Jacobsen <dmjacobsen@lbl.gov>
> Date:   Wed Nov 8 23:03:20 2017 -0700
>
>     slurm.spec - remove unused lua flag from configure.
>
>     Bug 4332 <https://bugs.schedmd.com/show_bug.cgi?id=4332>.
>
> commit 6e417e5d555de92f54c2ca0b5115a5a56c368dec
> Author: Tim Wickberg <tim@schedmd.com>
> Date:   Wed Nov 8 23:00:30 2017 -0700
>
>     slurm.spec - fix description for slurm-slurmctld package.
>
>     Bug 4332 <https://bugs.schedmd.com/show_bug.cgi?id=4332>.
>
> ------------------------------
> You are receiving this mail because:
>
>    - You reported the bug.
>
>
Comment 13 Tim Wickberg 2017-11-09 08:53:51 MST
(In reply to Doug Jacobsen from comment #12)
> Hello,
> 
> Regarding the change to nuke the ld.so.conf.d/slurm.  If that is getting
> removed, then the post install/uninstall scriptlets that run ldconfig can
> also be removed.

While /usr/lib64 is always in the search path, and thus the ld.so.conf.d file was redundant, ldconfig still needs to update /etc/ld.so.cache to include the newly install libslurm* and libpmi* libraries.

So those need to stay put.

- Tim