| Summary: | cgroup devices directories deleted from jobs older than one day. | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | NYU HPC Team <hpc-staff> |
| Component: | slurmd | Assignee: | Alejandro Sanchez <alex> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | ||
| Version: | 17.02.1 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | NYU | Alineos Sites: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
|
Description
NYU HPC Team
2017-07-21 11:33:57 MDT
Starting in 17.02.0, you no longer need CgroupReleaseAgentDir in your cgroup.conf; that option should be removed. Slurm should clean up the cgroups automatically as needed. That said, I know there were cleanup issues that were fixed in 17.02.3. Would it be possible for you to upgrade to at least that version (if not the latest, 17.02.6) and see if that fixes your issues? If this doesn't help, Alex will help you further.

Hi. Did you manage to upgrade to the latest 17.02 and test the automatic removal of the cgroup hierarchy? Is there anything else you need from us? Thanks.

We have been running 17.02.1. However, I do see that part of the source code differs in 17.02.6. We need to schedule a downtime to upgrade Slurm, which will be after the summer. Thank you!

All right. The commit that Danny mentions in comment 1 is this one: https://github.com/SchedMD/slurm/commit/24e2cb07e8e363f24dda036637be97f90507fcd6

I'm closing this as resolved/infogiven. Please reopen if you encounter further issues. Thanks.
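The advice in this ticket boils down to two checks before retesting: is CgroupReleaseAgentDir still set in cgroup.conf, and is the running Slurm at least 17.02.3? The sketch below automates both. It is a hypothetical helper, not a Slurm tool: the function names are made up for illustration, it assumes plain dotted numeric versions such as `17.02.1` (no `-rc` suffixes), and it matches the key case-insensitively to be lenient. The sample cgroup.conf contents are illustrative only.

```python
import re

# Per this ticket, this option is no longer needed from Slurm 17.02.0 on.
OBSOLETE_KEY = "CgroupReleaseAgentDir"

# Per this ticket, the cgroup cleanup fixes landed in 17.02.3.
MIN_FIXED_VERSION = (17, 2, 3)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '17.02.1' into (17, 2, 1).

    Assumption: no pre-release suffixes like '-rc1'.
    """
    return tuple(int(part) for part in text.strip().split("."))


def has_cleanup_fix(version: str) -> bool:
    """True if the given version is at or above MIN_FIXED_VERSION."""
    return parse_version(version) >= MIN_FIXED_VERSION


def find_obsolete_lines(conf_text: str) -> list[int]:
    """Return 1-based line numbers that still set CgroupReleaseAgentDir.

    Anything after '#' is treated as a comment and ignored; the key is
    matched case-insensitively (a lenient choice made for this sketch).
    """
    hits = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if re.match(rf"{OBSOLETE_KEY}\s*=", line, flags=re.IGNORECASE):
            hits.append(lineno)
    return hits


if __name__ == "__main__":
    # Illustrative cgroup.conf contents, not taken from the ticket.
    sample = (
        "CgroupAutomount=yes\n"
        'CgroupReleaseAgentDir="/etc/slurm/cgroup"\n'
        "ConstrainCores=yes\n"
    )
    print(find_obsolete_lines(sample))  # [2]
    print(has_cleanup_fix("17.02.1"))   # False
    print(has_cleanup_fix("17.02.6"))   # True
```

Comparing versions as integer tuples avoids the string-comparison trap where `"17.02.10" < "17.02.3"`.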