| Summary: | 17.11.x slurm.spec issues | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Doug Jacobsen <dmjacobsen> |
| Component: | Other | Assignee: | Tim Wickberg <tim> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 17.11.x | ||
| Hardware: | Cray XC | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=4344 | ||
| Site: | NERSC | Slinky Site: | --- |
| Alineos Sites: | --- | Atos/Eviden Sites: | --- |
| Confidential Site: | --- | Coreweave sites: | --- |
| Cray Sites: | --- | DS9 clusters: | --- |
| Google sites: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Tzag Elita Sites: | --- |
| Linux Distro: | --- | Machine Name: | |
| CLE Version: | Version Fixed: | 17.11.0-rc3 | |
| Target Release: | --- | DevPrio: | --- |
| Emory-Cloud Sites: | --- | ||
Commit 75ef08f29d22b3f. Thanks. And it's certainly no problem to hold this open for a bit. src rpm generation was very noisy and concerned me that other aspects of my slurm spec preparation were having problems. Removing %dump clarifies this. diff -rupN a/slurm.spec b/slurm.spec --- a/slurm.spec 2017-11-04 06:40:53.351746302 -0700 +++ b/slurm.spec 2017-11-04 21:01:26.505148395 -0700 @@ -8,7 +8,6 @@ Group: System Environment/Base License: GPLv2+ URL: https://slurm.schedmd.com/ -%dump # when the rel number is one, the tarball filename does not include it %if "%{rel}" == "1" Source: %{name}-%{version}.tar.bz2 I'd appreciate it if that didn't survive into production releases (I'm patching it out for now). Thanks, Doug (In reply to Doug Jacobsen from comment #2) > src rpm generation was very noisy and concerned me that other aspects of my > slurm spec preparation were having problems. Removing %dump clarifies this. > > I'd appreciate it if that didn't survive into production releases (I'm > patching it out for now). Whoops, my mistake there. Fixed with 4ac4986eab78ee43c. watching the build I'm seeing build statements like this on SLES: libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../.. -I../../slurm -I../.. -DNUMA_VERSION1_COMPATIBILITY -O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -pthread -c coord_functions.c -fPIC -DPIC -o .libs/coord_functions.o libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../.. -I../../slurm -I../.. -DNUMA_VERSION1_COMPATIBILITY -O2 -g -m64 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -pthread -c coord_functions.c -o coord_functions.o >/dev/null 2>&1 My memory from bug 4014 is that you planned to have -O1 and -ggdb3, which doesn't seem to be happening. I haven't started to dig into why yet. Waiting to see if everything apparently builds. looks like the configure line is getting --disable-debug"
+ export FFLAGS
+ ./configure --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/opt/esslurm --exec-prefix=/opt/esslurm --bindir=/opt/esslurm/bin --sbindir=/opt/esslurm/sbin --sysconfdir=/etc/esslurm --datadir=/opt/esslurm/share --includedir=/opt/esslurm/include --libdir=/opt/esslurm/lib64 --libexecdir=/opt/esslurm/lib --localstatedir=/var --sharedstatedir=/opt/esslurm/com --mandir=/opt/esslurm/share/man --infodir=/opt/esslurm/share/info --disable-dependency-tracking --disable-debug --with-lua
I think this does the trick:
diff -rupN a/slurm.spec b/slurm.spec
--- a/slurm.spec 2017-11-01 08:12:12.000000000 -0700
+++ b/slurm.spec 2017-11-04 22:28:10.586694477 -0700
@@ -261,7 +261,7 @@ according to the Slurm
%build
%configure \
- %{!?_with_debug:--disable-debug} \
+ %{?_without_debug:--disable-debug} \
%{?_with_pam_dir} \
%{?_with_cpusetdir} \
%{?_with_mysql_config} \
Is there any chance that creating _sysconfdir could be avoided in the basic package install?
diff -rupN a/slurm.spec b/slurm.spec
--- a/slurm.spec 2017-11-01 08:12:12.000000000 -0700
+++ b/slurm.spec 2017-11-04 22:50:49.401970158 -0700
@@ -443,13 +443,13 @@ rm -rf %{buildroot}
%exclude %{_mandir}/man1/sjobexit*
%exclude %{_mandir}/man1/sjstat*
%dir %{_libdir}/slurm/src
-%dir %{_sysconfdir}
%dir /etc/ld.so.conf.d
/etc/ld.so.conf.d/slurm.conf
#############################################################################
%files example-configs
%defattr(-,root,root,0755)
+%dir %{_sysconfdir}
%if %{with cray}
%config %{_sysconfdir}/plugstack.conf.template
%config %{_sysconfdir}/slurm.conf.template
dmj@alvasmw:~>
I do this because /etc/slurm is a symlink to a read-only path for us and so if the RPM attempts to create the directory or gets antsy about it being a symlink, then it is problematic to update RPMs on a running system.
I can understand if you want to leave it (some sites may rely on it), I can just keep maintaining the patch to not create it.
(In reply to Doug Jacobsen from comment #6) > Is there any chance that creating _sysconfdir could be avoided in the basic > package install? > > I do this because /etc/slurm is a symlink to a read-only path for us and so > if the RPM attempts to create the directory or gets antsy about it being a > symlink, then it is problematic to update RPMs on a running system. > > I can understand if you want to leave it (some sites may rely on it), I can > just keep maintaining the patch to not create it. The only downside I'm seeing is that you don't get an empty directory to start on a new install. But, I think we agree in that the Slurm packages shouldn't be involved in config management/deployment to being with - that's best left to puppet/chef/ansible/salt/symlink-magic/whatever, or a separate site-developed RPM package if they choose to go that route. I'll mull that one over a bit more, but I think that your patch may be a reasonable approach. Thanks again for testing these out; as much as I can beat on them, there's obviously a lot of value from testing with different approaches to install that I haven't anticipated. _with_lua should probably not be tested by configure any more since the option is no longer understood by configure:
diff -rupN a/slurm.spec b/slurm.spec
--- a/slurm.spec 2017-11-01 08:12:12.000000000 -0700
+++ b/slurm.spec 2017-11-05 13:33:56.541920497 -0800
@@ -272,7 +272,6 @@ according to the Slurm
%{?_with_pmix} \
%{?_with_freeipmi} \
%{?_with_hdf5} \
- %{?_with_lua} \
%{?_with_shared_libslurm} \
%{?_with_cflags}
it looks like you use the variable to test that the correct dev packages are installed.
FYI, SLES 12sp2 (and 12sp3) has either:
dmj@alvasmw:~> find /var/opt/cray/repos/ | grep lua | grep dev
/var/opt/cray/repos/sle-sdk_12sp2_x86-64/x86_64/lua-devel-5.2.2-4.2.x86_64.rpm
/var/opt/cray/repos/sle-sdk_12sp2_x86-64/x86_64/lua51-devel-5.1.5-7.127.x86_64.rpm
dmj@alvasmw:~>
It kind of feels like --with lua /some/path should set lua_CFLAGS and lua_LIBS in the configure step now that the autoconf no longer has any specific enable/disable/with/without options for lua, but that could get complicated.
NERSC finds it useful to keep the slurmpmi for non-Cray MPI stacks (e.g., Intel MPI and OpenMPI), but we move it out of the way so that LD_LIBRARY_PATH doesn't create an issue:
diff -rupN a/slurm.spec b/slurm.spec
--- a/slurm.spec 2017-11-01 08:12:12.000000000 -0700
+++ b/slurm.spec 2017-11-05 22:55:03.597697278 -0800
@@ -300,7 +300,11 @@ install -D -m644 etc/slurmdbd.service %
# Do not package Slurm's version of libpmi on Cray systems.
# Cray's version of libpmi should be used.
%if %{with cray}
+ mkdir %{buildroot}/%{_libdir}/slurmpmi
+ cp %{buildroot}/%{_libdir}/libpmi*so* %{buildroot}/%{_libdir}/slurmpmi/
rm -f %{buildroot}/%{_libdir}/libpmi*
+%endif
+%if %{with cray}
%if %{with cray}
install -D -m644 contribs/cray/plugstack.conf.template %{buildroot}/%{_sysconfdir}/plugstack.conf.template
install -D -m644 contribs/cray/slurm.conf.template %{buildroot}/%{_sysconfdir}/slurm.conf.template
@@ -436,6 +440,9 @@ rm -rf %{buildroot}
%{_libdir}/*.so*
%{_libdir}/slurm/src/*
%{_libdir}/slurm/*.so
+%if %{with cray}
+%{_libdir}/slurmpmi/*.so*
+%endif
%exclude %{_libdir}/slurm/accounting_storage_mysql.so
%exclude %{_libdir}/slurm/job_submit_pbs.so
%exclude %{_libdir}/slurm/spank_pbs.so
I split the %if %{with cray} block because I actually have another condition relating to some custom patches for our externalized slurm. The internal %if {with cray} block appears to be an old relic from when you had slurm_with cray || slurm_with cray_alps (before this 17.11)
Very minor issue. The summary for the slurmctld package has the wrong description: %package slurmctld Summary: Slurm compute node daemon Thanks for all the suggestions, these should all be integrated now, and will be in 17.11.0rc3 and up. Note there's one small change from bug 4344 as well. We'll likely tag rc3 tomorrow with all this in it; please re-open if there's anything else you run into. - Tim commit 45e5c41a1dc80107b88ad0cb932484a2fd7ece13 Author: Doug Jacobsen <dmjacobsen@lbl.gov> Date: Wed Nov 8 23:14:49 2017 -0700 slurm.spec - install Slurm's libpmi in an alternate location. Also collapse a nested %{with cray} block leftover from earlier work. Bug 4332. commit 64ad68879f17938d50f95ba8e58e6ca6f354a99d Author: Doug Jacobsen <dmjacobsen@lbl.gov> Date: Wed Nov 8 23:09:02 2017 -0700 slurm.spec - fix logic for --disable-debug. Logic was inverted from the correct behavior. Bug 4332. commit d3348db1ec01424baa04f0dffea1e4c5b160c9ab Author: Doug Jacobsen <dmjacobsen@lbl.gov> Date: Wed Nov 8 23:06:20 2017 -0700 slurm.spec - move %{_sysconfdir} to example-configs package. Slurm package should not try to manage configs; leave this to the admin to setup as they wish. This avoids an issue on RPM install if /etc/slurm is a symlink to somewhere else. Bug 4332. commit 5351680043a1d0bcdd5599fdc3e550bd3ef13940 Author: Doug Jacobsen <dmjacobsen@lbl.gov> Date: Wed Nov 8 23:03:20 2017 -0700 slurm.spec - remove unused lua flag from configure. Bug 4332. commit 6e417e5d555de92f54c2ca0b5115a5a56c368dec Author: Tim Wickberg <tim@schedmd.com> Date: Wed Nov 8 23:00:30 2017 -0700 slurm.spec - fix description for slurm-slurmctld package. Bug 4332. Hello, Regarding the change to nuke the ld.so.conf.d/slurm. If that is getting removed, then the post install/uninstall scriptlets that run ldconfig can also be removed. Thanks, Doug On Nov 8, 2017 22:23, <bugs@schedmd.com> wrote: > Tim Wickberg <tim@schedmd.com> changed bug 4332 > <https://bugs.schedmd.com/show_bug.cgi?id=4332> > What Removed Added > Resolution --- FIXED > Status UNCONFIRMED RESOLVED > Version Fixed 17.11.0-rc3 > > *Comment # 11 <https://bugs.schedmd.com/show_bug.cgi?id=4332#c11> on bug > 4332 <https://bugs.schedmd.com/show_bug.cgi?id=4332> from Tim Wickberg > <tim@schedmd.com> * > > Thanks for all the suggestions, these should all be integrated now, and will be > in 17.11.0rc3 and up. Note there's one small change from bug 4344 <https://bugs.schedmd.com/show_bug.cgi?id=4344> as well. > > We'll likely tag rc3 tomorrow with all this in it; please re-open if there's > anything else you run into. > > - Tim > > commit 45e5c41a1dc80107b88ad0cb932484a2fd7ece13 > Author: Doug Jacobsen <dmjacobsen@lbl.gov> > Date: Wed Nov 8 23:14:49 2017 -0700 > > slurm.spec - install Slurm's libpmi in an alternate location. > > Also collapse a nested %{with cray} block leftover from earlier work. > > Bug 4332 <https://bugs.schedmd.com/show_bug.cgi?id=4332>. > > commit 64ad68879f17938d50f95ba8e58e6ca6f354a99d > Author: Doug Jacobsen <dmjacobsen@lbl.gov> > Date: Wed Nov 8 23:09:02 2017 -0700 > > slurm.spec - fix logic for --disable-debug. > > Logic was inverted from the correct behavior. > > Bug 4332 <https://bugs.schedmd.com/show_bug.cgi?id=4332>. > > commit d3348db1ec01424baa04f0dffea1e4c5b160c9ab > Author: Doug Jacobsen <dmjacobsen@lbl.gov> > Date: Wed Nov 8 23:06:20 2017 -0700 > > slurm.spec - move %{_sysconfdir} to example-configs package. > > Slurm package should not try to manage configs; leave this to the > admin to setup as they wish. This avoids an issue on RPM install if > /etc/slurm is a symlink to somewhere else. > > Bug 4332 <https://bugs.schedmd.com/show_bug.cgi?id=4332>. > > commit 5351680043a1d0bcdd5599fdc3e550bd3ef13940 > Author: Doug Jacobsen <dmjacobsen@lbl.gov> > Date: Wed Nov 8 23:03:20 2017 -0700 > > slurm.spec - remove unused lua flag from configure. > > Bug 4332 <https://bugs.schedmd.com/show_bug.cgi?id=4332>. > > commit 6e417e5d555de92f54c2ca0b5115a5a56c368dec > Author: Tim Wickberg <tim@schedmd.com> > Date: Wed Nov 8 23:00:30 2017 -0700 > > slurm.spec - fix description for slurm-slurmctld package. > > Bug 4332 <https://bugs.schedmd.com/show_bug.cgi?id=4332>. > > ------------------------------ > You are receiving this mail because: > > - You reported the bug. > > (In reply to Doug Jacobsen from comment #12) > Hello, > > Regarding the change to nuke the ld.so.conf.d/slurm. If that is getting > removed, then the post install/uninstall scriptlets that run ldconfig can > also be removed. While /usr/lib64 is always in the search path, and thus the ld.so.conf.d file was redundant, ldconfig still needs to update /etc/ld.so.cache to include the newly install libslurm* and libpmi* libraries. So those need to stay put. - Tim |
Hello, I'm getting started on adapting my slurm build system to the new slurm.spec. First issue (jinja templating doesn't like this, and it looks incorrect): diff -rupN a/slurm.spec b/slurm.spec --- a/slurm.spec 2017-11-04 06:40:53.351746302 -0700 +++ b/slurm.spec 2017-11-04 06:41:21.688743010 -0700 @@ -289,7 +289,7 @@ chmod +x find-requires.sh %global _use_internal_dependency_generator 0 %global __find_requires %{_builddir}/%{buildsubdir}/find-requires.sh -rm -rf {%buildroot} +rm -rf %{buildroot} make install DESTDIR=%{buildroot} make install-contrib DESTDIR=%{buildroot} Unless you object, I'll keep this ticket open until I get a buildable source rpm, and identify and further issues along the way.