| Summary: | cvmfs issues with job containers | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | Paul Edmon <pedmon> |
| Component: | slurmd | Assignee: | Tim McMullan <mcmullan> |
| Status: | RESOLVED DUPLICATE | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | | |
| Version: | 22.05.3 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | Harvard University | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | NoveTech Sites: | --- |
| Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Description
Paul Edmon 2022-09-13 07:30:02 MDT

We recently upgraded to 22.05.3 as well as enabling the Job Container plugin. After the upgrade and enabling the plugin, we noticed jobs were not completing cleanly and required a reboot of the node. We also noticed that cvmfs was not operating properly. After some digging we found this:

https://cernvm-forum.cern.ch/t/intermittent-client-failures-too-many-levels-of-symbolic-links/156/5

It looks like there is a conflict between autofs and cvmfs (which we use) and the job container plugin. We are going to turn off the job container plugin, but we'd like to have it on if possible. Thus I'm raising this as a bug so that you are aware of the issue. It would be good to not impact user space or dynamic mounting when using the job containers.
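For context, the plugin involved is job_container/tmpfs. A minimal sketch of how it is typically enabled is below; the BasePath shown is an illustrative placeholder, not the site's actual configuration:

```
# slurm.conf -- enable the tmpfs job container plugin; slurmd then builds a
# private mount namespace (with per-job /tmp and /dev/shm) for each job.
JobContainerType=job_container/tmpfs

# job_container.conf -- BasePath is a placeholder path on node-local storage;
# AutoBasePath lets slurmd create it if it does not already exist.
AutoBasePath=true
BasePath=/var/spool/slurmd/job_containers
```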
Comment 1
Tim McMullan

(In reply to Paul Edmon from comment #0)

Hi Paul! I'm not certain what role cvmfs would play here, but we are working on the autofs issue in bug 12567. Based on the issue you linked, I think cvmfs isn't playing a role here and it's all an issue with autofs + job_container/tmpfs. Would you agree with that assessment after looking at the other bug?

Thanks!
--Tim

Comment 2
Paul Edmon

Yeah, I agree. The cvmfs issue at root is really an autofs issue, as cvmfs uses autofs under the hood. I would merge this into that ticket.

-Paul Edmon-

Comment 3
Tim McMullan

(In reply to Paul Edmon from comment #2)

Sounds good, thanks Paul! I'll merge the tickets now.

*** This ticket has been marked as a duplicate of ticket 12567 ***
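As background for the "cvmfs uses autofs under the hood" point above: on a typical cvmfs client installation, the /cvmfs tree is an autofs-managed mount point, so anything that breaks autofs inside the job's mount namespace surfaces as cvmfs failures. A sketch of the usual autofs wiring (file locations can vary by distribution):

```
# /etc/auto.master.d/cvmfs.autofs (installed by the cvmfs client packages;
# the exact path is distribution-dependent) -- /cvmfs is an autofs mount point
# backed by a program map, so repositories are mounted on first access.
/cvmfs /etc/auto.cvmfs
```

Broadly, the failure mode appears to be that autofs mounts triggered after the job's private namespace is created are not serviced inside that namespace, which is the autofs + job_container/tmpfs interaction tracked in bug 12567.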