I have noticed that processes in the step_extern cgroups are not cleaned up automatically when a job exits. I do not know whether this is intentional. I could easily add something to the epilog to kill all tasks listed in step_extern/tasks if needed, but it would be nice if this were cleaned up automatically; otherwise stray tasks can be left behind. As far as I can tell, anything that is actually launched under Slurm gets cleaned up, but not anything in the step_extern cgroup.
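(For illustration, the epilog workaround mentioned above would amount to something like the following sweep over the extern step's tasks file. This is only a rough sketch, not Slurm code; the cgroup path is a placeholder that depends on the local cgroup mount point, uid, and job id.)

#include <signal.h>
#include <stdio.h>

int main(int argc, char **argv)
{
	/* Placeholder path; adjust to the local cgroup mount point and job id. */
	const char *tasks = argc > 1 ? argv[1] :
	    "/cgroup/freezer/slurm/uid_1000/job_1234/step_extern/tasks";
	FILE *fp = fopen(tasks, "r");
	int pid;

	if (!fp)
		return 1;
	while (fscanf(fp, "%d", &pid) == 1)
		kill((pid_t) pid, SIGKILL);	/* best effort; ignore errors */
	fclose(fp);
	return 0;
}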
Hi Ryan, we are looking at your report and will get back to you later. David
Hi Ryan, I cannot reproduce your issue. In my tests the step_extern directory is always cleaned up, regardless of whether I use srun, salloc, or sbatch. The only thing I noticed is that step_batch is not cleaned up when a job is terminated by scancel, but that seems to be a different issue. I am running on CentOS 6.7 (Final). Which distro are you running, and how was your job terminated? David
I tried CentOS 7 and observed the same behaviour. Bug #2105 has been logged to track step_batch not being cleaned up. David
It appears that I left out a crucial detail (oops). The cleanup doesn't happen if something has been adopted into step_extern by pam_slurm_adopt. The easiest way to reproduce it is to run salloc in one window. If you don't have pam_slurm_adopt set up, simply ssh into the node and then write your pid to the step_extern/tasks file. If pam_slurm_adopt is set up, simply ssh in and check the tasks file for your pid. Then, from the salloc window, exit the job. The ssh connection won't terminate automatically and step_extern will still be there. This is on RHEL6.
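(For reference, the manual reproduction above boils down to attaching the ssh shell's pid to the extern step's cgroup by writing it into the tasks file. A minimal sketch of that attach step follows; the path is only an example and depends on the cgroup mount point and job id.)

#include <stdio.h>
#include <unistd.h>

/* Write a pid into a cgroup's tasks file; the kernel moves the task
 * into that cgroup on write. */
static int attach_pid(const char *tasks_path, pid_t pid)
{
	FILE *fp = fopen(tasks_path, "w");

	if (!fp)
		return -1;
	fprintf(fp, "%d\n", (int) pid);
	return fclose(fp);
}

int main(void)
{
	/* Hypothetical path for job 1234; the real path depends on the setup. */
	const char *tasks =
	    "/cgroup/freezer/slurm/uid_1000/job_1234/step_extern/tasks";

	return attach_pid(tasks, getpid()) ? 1 : 0;
}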
Thanks for the info, will try it. David
I can confirm this: the slurmstepd that creates the step_extern does not clean up after the job finishes. The exception is when JobAcctGatherType is set to jobacct_gather/cgroup and there are no pids in the step_extern/tasks file; in that case the plugin removes the step_extern directory. However, if there are active pids in the tasks file, the rmdir() will fail. There is also no code to terminate these external pids, which is why the ssh connection stays active. I don't have the pam_slurm_adopt code installed, but I can easily simulate it by starting an xterm and putting its pid in the tasks file. When the job finishes, the xterm happily stays around. I am not sure yet why the release agent does not clean up the directory; perhaps it needs to be updated as well. We will develop a fix for the above problems. David
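(An illustration of the kernel behaviour involved, not Slurm code: a cgroup directory can only be removed once its tasks file is empty; while adopted processes are still attached, rmdir() fails with EBUSY, which is why the plugin's removal attempt fails. The path below is a placeholder.)

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	/* Placeholder path; pass the real step_extern directory as argv[1]. */
	const char *dir = argc > 1 ? argv[1] :
	    "/cgroup/freezer/slurm/uid_1000/job_1234/step_extern";

	if (rmdir(dir) == 0)
		printf("%s removed (no tasks left)\n", dir);
	else if (errno == EBUSY)
		printf("%s still has attached tasks, rmdir failed: %s\n",
		       dir, strerror(errno));
	else
		printf("rmdir(%s) failed: %s\n", dir, strerror(errno));
	return 0;
}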
Ryan - Do you know if this is still a problem after all the other related step_extern changes? David - if you have time next week can you attempt to reproduce on 15.08.4? Enough of the step_extern stuff has changed that it may be resolved now. - Tim
(In reply to Tim Wickberg from comment #7)
> Ryan -
>
> Do you know if this is still a problem after all the other related
> step_extern changes?

I tried it earlier today at e7a94813bc996a6b22ffeb13cd17d8b0442aaeff and it hasn't been fixed. It could be related to the processes not being adopted into the cpuset cgroup. See bug 2097 comment 28.
I am looking at this now and will update you guys later. David
I think there are two issues here.

1) Too many cgroups are created. Since the extern processes are handled by the jobacct cgroup plugin, there is no need to create step_extern directories under the cpuset, devices, and freezer cgroups. Only memory and cpuacct are necessary to capture the remote pids and track them for accounting purposes.

2) There is no code that cleans up the pids in step_extern. I have a patch for that which reads the step_extern/task_0/tasks file and sends SIGTERM followed by SIGKILL to the processes listed in the file.

I will have a complete patch in a day or so. David
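(A rough sketch of the cleanup logic described in 2) above, not the actual patch; the paths and the grace period are placeholders. The idea: read the pids out of the tasks file, send SIGTERM, give them a moment, then SIGKILL whatever is left, after which the rmdir() can succeed.)

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Send the given signal to every pid listed in a cgroup tasks file. */
static void signal_tasks(const char *tasks_path, int sig)
{
	FILE *fp = fopen(tasks_path, "r");
	int pid;

	if (!fp)
		return;
	while (fscanf(fp, "%d", &pid) == 1)
		kill((pid_t) pid, sig);	/* best effort; ignore errors */
	fclose(fp);
}

int main(void)
{
	/* Placeholder paths; the real ones depend on the cgroup setup. */
	const char *dir =
	    "/cgroup/freezer/slurm/uid_1000/job_1234/step_extern/task_0";
	const char *tasks =
	    "/cgroup/freezer/slurm/uid_1000/job_1234/step_extern/task_0/tasks";

	signal_tasks(tasks, SIGTERM);	/* ask nicely first */
	sleep(2);			/* illustrative grace period */
	signal_tasks(tasks, SIGKILL);	/* then force */
	rmdir(dir);			/* succeeds once the tasks file is empty */
	return 0;
}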
> 1) Too many cgroups are created. Since the extern processes are handled by
> the jobacct cgroup plugin, there is no need to create step_extern
> directories under the cpuset, devices, and freezer cgroups. Only memory and
> cpuacct are necessary to capture the remote pids and track them for
> accounting purposes.

I'm not quite sure I understand this. Maybe for the purposes of cleaning up processes and for accounting you don't need the cpuset or devices cgroups, but for controlling which CPUs and GPUs the job has access to, you do.

> 2) There is no code that cleans up the pids in step_extern. I have a patch
> for that which reads the step_extern/task_0/tasks file and sends SIGTERM
> followed by SIGKILL to the processes listed in the file.
>
> I will have a complete patch in a day or so.

I'm in the airport and can't test it at the moment, but I think a new task is created each time stepd_add_extern_pid is called. You may already be planning to handle that, but I figured I should point it out.
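(To illustrate the point about multiple task directories, under the assumption that each stepd_add_extern_pid call creates another task_N cgroup: the cleanup would have to walk every task_* directory rather than just task_0, along these lines. The base path is a placeholder.)

#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	/* Placeholder base path for the extern step's cgroup. */
	const char *base =
	    "/cgroup/freezer/slurm/uid_1000/job_1234/step_extern";
	DIR *dp = opendir(base);
	struct dirent *de;
	char tasks[4096];

	if (!dp)
		return 1;
	while ((de = readdir(dp))) {
		if (strncmp(de->d_name, "task_", 5) != 0)
			continue;	/* skip ".", ".." and anything else */
		snprintf(tasks, sizeof(tasks), "%s/%s/tasks",
			 base, de->d_name);
		printf("would clean up pids listed in %s\n", tasks);
	}
	closedir(dp);
	return 0;
}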
Hi, in the current implementation only the jobacct gather cgroup plugin handles the external processes, i.e. the adopted ones. What would be the purpose of having the external pids in the other cgroups, like freezer or cpuset, unless you want to treat these pids just like all the pids of the job itself? If this is the case, a lot more work needs to be done. David
(In reply to David Bigagli from comment #12)
> Hi, in the current implementation only the jobacct gather cgroup plugin
> handles the external processes, i.e. the adopted ones. What would be the
> purpose of having the external pids in the other cgroups, like freezer or
> cpuset, unless you want to treat these pids just like all the pids of the
> job itself? If this is the case, a lot more work needs to be done.

Basically, the combination of the extern step, stepd_add_extern_pid, and pam_slurm_adopt is meant to provide the following for ssh-launched processes:

1) Proper limit enforcement (cpuset, memory, and devices cgroups)
2) Accounting in the Slurm database
3) Cleanup of stray processes when the job exits

#1 worked when I used cgroups directly, but the cpuset functionality broke when stepd_add_extern_pid replaced the direct usage of cgroups. #2 was identified as needing a new RPC in bug 2097 comment 15; that new RPC (stepd_add_extern_pid) is what broke #1's cpuset cgroup usage (I'm not sure about devices since we don't currently use it). #2 now works. #3 is this bug report.
Here is some additional information in case you want more details: http://tech.ryancox.net/2015/11/pamslurmadopt.html http://tech.ryancox.net/2015/04/caller-id-handling-ssh-launched-processes-in-slurm.html
In commit 4f8741873e285d I have added code to clean up the external processes listed in the step_extern/task_0/tasks file. Once there are no running tasks under the given cgroup, the cgroup itself can be removed using rmdir(). David
Ryan, David's patch appears to handle the cleanup. In my testing the adopted processes were killed and the directory was removed afterwards. Please reopen this if you find otherwise.
Ryan, I just altered this slightly in commits 7ff083533e2e and 8a49f409ad21. Let me know if you see anything else on this; I think this will clean things up correctly now.
Thanks. Hopefully I'll have time to test it this week along with the newer fixes for bug 2097.