Ticket 14954 - cvmfs issues with job containers
Summary: cvmfs issues with job containers
Status: RESOLVED DUPLICATE of ticket 12567
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmd
Version: 22.05.3
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Tim McMullan
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2022-09-13 07:30 MDT by Paul Edmon
Modified: 2022-09-13 09:23 MDT (History)

See Also:
Site: Harvard University
Description Paul Edmon 2022-09-13 07:30:02 MDT
We recently upgraded to 22.05.3 as well as enabling the Job Container plugin.  After the upgrade and enabling the plugin we noticed jobs were not completing cleanly and required the reboot of the node.  We also noticed that cvmfs was not operating properly.  After some digging we found this:

https://cernvm-forum.cern.ch/t/intermittent-client-failures-too-many-levels-of-symbolic-links/156/5

It looks like there is a conflict between autofs and cvmfs (which we use) and the job container plugin.  We are going to turn off the job container plugin, but we'd like to have it on if possible.  Thus I'm raising this as a bug so that you are aware of the issue.  It would be good to not impact user space or dynamic mounting when using the job containers.
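The "too many levels of symbolic links" message in the linked thread is the text for the ELOOP errno, so one way to check whether a node shows the same symptom is to stat a path that autofs should mount on access and look at the error code. The following is only a minimal sketch under that assumption; the helper name `probe_mount` is made up for illustration and is not part of Slurm, autofs, or cvmfs.

```python
import errno
import os


def probe_mount(path):
    """Return a short status string for a path that autofs should mount on access.

    Hypothetical helper for illustration only.
    """
    try:
        # Resolving the path is what would trigger an automount.
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ELOOP:
            # "Too many levels of symbolic links"
            return "ELOOP"
        if exc.errno == errno.ENOENT:
            return "missing"
        # Any other failure: report the symbolic errno name if known.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"
```

Calling `probe_mount` on a repository path under /cvmfs, both from inside a job and from a plain shell on the same node, would show whether the failure only appears inside the job container's mount namespace.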
Comment 1 Tim McMullan 2022-09-13 09:19:31 MDT
(In reply to Paul Edmon from comment #0)
> We recently upgraded to 22.05.3 as well as enabling the Job Container
> plugin.  After the upgrade and enabling the plugin we noticed jobs were not
> completing cleanly and required the reboot of the node.  We also noticed
> that cvmfs was not operating properly.  After some digging we found this:
> 
> https://cernvm-forum.cern.ch/t/intermittent-client-failures-too-many-levels-of-symbolic-links/156/5
> 
> It looks like there is a conflict between autofs and cvmfs (which we use)
> and the job container plugin.  We are going to turn off the job container
> plugin, but we'd like to have it on if possible.  Thus I'm raising this as a
> bug so that you are aware of the issue.  It would be good to not impact user
> space or dynamic mounting when using the job containers.

Hi Paul!

I'm not certain what role cvmfs would play here, but we are working on the autofs issue in bug 12567.  Based on the issue you linked, I think cvmfs isn't playing a role here and it's all the issue with autofs + job_container/tmpfs.  Would you agree with that assessment after looking at the other bug?

Thanks!
--Tim
Comment 2 Paul Edmon 2022-09-13 09:21:05 MDT
Yeah, I agree.  The cvmfs issue at root is really an autofs issue as 
cvmfs uses autofs under the hood.  I would merge this into that ticket.

-Paul Edmon-

Comment 3 Tim McMullan 2022-09-13 09:23:13 MDT
(In reply to Paul Edmon from comment #2)
> Yeah, I agree.  The cvmfs issue at root is really an autofs issue as 
> cvmfs uses autofs under the hood.  I would merge this into that ticket.
> 
> -Paul Edmon-

Sounds good, thanks Paul!  I'll merge the tickets now.

*** This ticket has been marked as a duplicate of ticket 12567 ***