Ticket 7431

Summary: Slurm 19: sinfo warning seems "wrong" and not aries-qualified
Product: Slurm
Reporter: Kevin Buckley <kevin.buckley>
Component: User Commands
Assignee: Albert Gil <albert.gil>
Status: RESOLVED DUPLICATE
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
CC: albert.gil
Version: 19.05.1
Hardware: Cray XC
OS: Linux
Site: Pawsey
Attachments: The full slurm.conf

Description Kevin Buckley 2019-07-18 02:21:17 MDT
Created attachment 10945 [details]
The full slurm.conf

I have rated this as "4 - Minor Issue" on the basis of the 
message being a "WARNING".

We have just run our first sinfo under Slurm 19 on our TDS (having
initially forgotten to deploy the most important piece, the
aries-qualified slurm.conf file - deep sorrow!) and have since
seen this, ahead of the partition listing:

sinfo: error: WARNING: If using PrologFlags=Contain for pam_slurm_adopt, either proctrack/cgroup or proctrack/cray is required.


Given that our slurm.conf has these two settings


PrologFlags=x11

ProctrackType=proctrack/cray_aries


this would appear to flag up a couple of issues with Slurm's checking
and reporting of that warning:


1) Why are we getting the warning about PrologFlags=Contain when we 
    aren't using it?

2) Why aren't we getting told that, if we were, we would need an
    aries-qualified ProctrackType value?

I have attached the slurm.conf that was, finally, deployed.

Kevin M. Buckley
-- 
Supercomputing Systems Administrator
Pawsey Supercomputing Centre
Comment 1 Kevin Buckley 2019-07-18 02:37:34 MDT
I guess they share the same abstracted library, but I just want to
point out that we see the same "not-quite-right" warning from
scontrol as well:

sdb:~ # scontrol update partitionname=workq      state=up
scontrol: error: WARNING: If using PrologFlags=Contain for pam_slurm_adopt, either proctrack/cgroup or proctrack/cray is required.
sdb:~ #
Comment 2 Albert Gil 2019-07-18 03:26:52 MDT
Hi Kevin,

> Given that our slurm.conf has these two settings
> PrologFlags=x11
> ProctrackType=proctrack/cray_aries

Thanks for the detailed info and for the slurm.conf,

> 1) Why are we getting the warning about PrologFlags=Contain when we 
>     aren't using it?

As you can see in the slurm.conf manpage, the X11 flag implicitly enables both the Contain and Alloc flags as well.
https://slurm.schedmd.com/slurm.conf.html#OPT_X11
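
As a rough illustration of that implicit behaviour, here is a minimal sketch. It is only a model of the rule described in the manpage, not Slurm's actual parsing code, and the names IMPLIED_FLAGS and expand_prolog_flags are made up for this example:

```python
# Illustrative model only; this is NOT Slurm's actual parsing code.
# The table holds just the implication discussed in this ticket.
IMPLIED_FLAGS = {
    "x11": {"contain", "alloc"},  # per the slurm.conf manpage entry for X11
}


def expand_prolog_flags(value):
    """Return the effective (lower-cased) flag set for a PrologFlags= value."""
    flags = {f.strip().lower() for f in value.split(",") if f.strip()}
    # Add every flag that is implied by a flag the admin wrote explicitly.
    for flag in list(flags):
        flags |= IMPLIED_FLAGS.get(flag, set())
    return flags


print(sorted(expand_prolog_flags("x11")))  # ['alloc', 'contain', 'x11']
```

Under this model, a slurm.conf that only says PrologFlags=x11 still ends up with Contain set, which is why a check keyed on Contain can fire.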


> 2) Why aren't we getting told that, if we were, we would need an
>     aries-qualified ProctrackType value?

This seems to be a problem caused by the rename of cray to cray_aries.
The message probably shouldn't be shown in your setup.

For more details this message was added in bug 6824, in this commit:
https://github.com/SchedMD/slurm/commit/c6e6089f0c85b4b1f0f8a84bcea2f1ee79d6ec5c

The main rename from cray to cray_aries was done afterwards, in bug 6440, in this commit, but it missed that check/log message:
https://github.com/SchedMD/slurm/commit/1c388e2b20ac461336a5136858f5c1055df42d41

We will review it internally and will most probably provide you with a patch that also renames cray to cray_aries in the offending check/log message, to avoid the warning message that you see.
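
To make the suspected cause concrete, here is a hedged sketch of such a check. It is an assumption about the shape of the logic, not a copy of the Slurm source; the function name and both "accepted" sets are hypothetical and only mirror the plugin names mentioned in this ticket:

```python
# Sketch of the kind of check behind the warning; NOT Slurm's actual source.
# Both accepted sets below are assumptions made for illustration.
ACCEPTED_STALE = {"proctrack/cgroup", "proctrack/cray"}          # names in the warning text
ACCEPTED_RENAMED = {"proctrack/cgroup", "proctrack/cray_aries"}  # after the proposed rename


def contain_warning(effective_flags, proctrack_type, accepted):
    """Return a warning string, or None if the configuration passes the check."""
    if "contain" in effective_flags and proctrack_type not in accepted:
        names = " or ".join(sorted(accepted))
        return ("WARNING: If using PrologFlags=Contain for pam_slurm_adopt, "
                f"either {names} is required.")
    return None


# PrologFlags=x11 implies Contain, so the effective flags include it.
flags = {"x11", "contain", "alloc"}
print(contain_warning(flags, "proctrack/cray_aries", ACCEPTED_STALE))    # warning text
print(contain_warning(flags, "proctrack/cray_aries", ACCEPTED_RENAMED))  # None
```

With the stale name in the accepted list, ProctrackType=proctrack/cray_aries fails the comparison and the warning is emitted; with the renamed entry it passes silently.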


I'll let you know,
Albert
Comment 3 Albert Gil 2019-07-18 03:32:43 MDT
Kevin,

> I have rated this as "4 - Minor Issue" on the basis of the 
> message being a "WARNING".

Although this seems quite annoying, the check and the log message are actually harmless, so you can ignore the warning for the moment.
Especially if you are not using pam_slurm_adopt. Are you?

> I guess they share the same abstracted library but, just to
> point out that we see the same "not-quite-right" warning from
> scontrol as well

Exactly: it is generated while reading slurm.conf, which most of the commands do through shared functions.

Sorry for the inconvenience,
Albert
Comment 4 Kevin Buckley 2019-07-18 03:45:48 MDT
(In reply to Albert Gil from comment #2)
>
> As you can see in the manpage of slurm.conf the X11 flag implicitly enables
> both Contain and Alloc flags as well.
> https://slurm.schedmd.com/slurm.conf.html#OPT_X11

OK: understood.

So, can we assume that making that explicit, by having say:

  PrologFlags=x11,Contain,Alloc

wouldn't be a problem?

It could be seen as overkill, but it does have the potential for
removing any "implicits" for those who don't yet have a photographic
recall of the slurm.conf manpage (Guilty as charged m'lud).

Thanks for the quick response,
Kevin
Comment 7 Albert Gil 2019-07-18 04:24:35 MDT
Hi Kevin,

> So, can we assume that making that explicit, by having say:
>   PrologFlags=x11,Contain,Alloc
> wouldn't be a problem?

Not a problem at all, they are equivalent.

> It could be seen as overkill, but it does have the potential for
> removing any "implicits" for those who don't yet have a photographic
> recall of the slurm.conf manpage (Guilty as charged m'lud).

I can see the potential confusion caused by implicit config parameters, and maybe we can add some logging to make them more explicit.
But I don't think that we want to change this behavior because, as far as I can see, the alternative would mean making them "config dependencies", meaning:
- Not starting the daemons if some of the "config dependencies" of a config parameter are not explicitly set.
- If the user removes that single config parameter, they also need to remove its config dependencies unless they want them for other reasons.
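
A small sketch of what that stricter "config dependencies" alternative might look like, purely for discussion. Nothing like this exists in Slurm as far as this ticket shows; DEPENDENCIES and resolve_strict are hypothetical names:

```python
# Hypothetical "config dependencies" mode, for discussion only.
# Slurm does not behave this way; it enables the implied flags instead.
DEPENDENCIES = {
    "x11": {"contain", "alloc"},  # the implication discussed in this ticket
}


def resolve_strict(value):
    """Accept a PrologFlags= value only if every dependency is spelled out."""
    flags = {f.strip().lower() for f in value.split(",") if f.strip()}
    for flag in sorted(flags):
        missing = DEPENDENCIES.get(flag, set()) - flags
        if missing:
            # In this model the daemon would refuse to start at this point.
            raise ValueError(
                f"PrologFlags={flag} requires {sorted(missing)} to be set explicitly"
            )
    return flags


print(sorted(resolve_strict("x11,Contain,Alloc")))  # ['alloc', 'contain', 'x11']
```

In this model PrologFlags=x11 on its own would be rejected, and dropping x11 later would leave Contain and Alloc behind until someone removed them by hand, which is the maintenance cost described in the second point above.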

The current approach of enabling implicit config parameters sounds better to me (probably with more debug logs to make them more explicit, at least in the logs), but it's really a matter of taste.

Probably the best option would be to add a command-line option to force no-implicit behavior on demand, but as you said, that sounds like overkill right now.
If you want that, please open a new bug/enhancement and we will discuss it further.

Regards,
Albert
Comment 8 Albert Gil 2019-07-18 04:28:28 MDT
I'm sure it's clear, but just in case:

> > So, can we assume that making that explicit, by having say:
> >   PrologFlags=x11,Contain,Alloc
> > wouldn't be a problem?
> 
> Not a problem at all, they are equivalent.

I mean that these two config lines are equivalent:

PrologFlags=x11
PrologFlags=x11,Contain,Alloc

But the X11, Contain and Alloc flags are, of course, not equivalent to each other.
Comment 11 Albert Gil 2019-07-18 08:48:26 MDT
Actually, Marshall pointed out to me that we are already aware of this in bug 6824 and are handling it there.
So I'm closing this one as a duplicate of it.

*** This ticket has been marked as a duplicate of ticket 6824 ***