On the TU-Dresden cluster we have observed problems with sview: it hangs when many jobs have been submitted to, or are managed by, SLURM. The cores on which sview is running go to 100% CPU load, and sview stops updating the information in the GUI. Have you seen this before? Are there configuration parameters that we can change to improve the scalability of sview? Thanks, Yiannis
How many jobs are you talking about? I am guessing thousands. There is probably work that could be done in terms of scalability. There is a lot going on in sview that does not need to happen, such as updating buttons that are not visible. I would propose we look at some way to change the status only of what is currently displayed. I am not sure how difficult that will be.
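To illustrate the proposal of refreshing only what is displayed, here is a minimal sketch in plain Python. It is not sview's actual C/GTK code. The function name `refresh_visible`, the row layout, and the `latest` mapping are all hypothetical, and the visible row range is assumed to be supplied by the GUI toolkit.

```python
def refresh_visible(displayed, latest, first_visible, last_visible):
    """Update only the rows inside the visible window.

    displayed: list of dicts, one per row, each with "job_id" and "state"
               (hypothetical layout, for illustration only).
    latest:    dict mapping job_id to its current state.
    Returns the number of rows whose state actually changed.
    """
    changed = 0
    # Clamp the window to the valid row range.
    start = max(first_visible, 0)
    stop = min(last_visible, len(displayed) - 1)
    for row in displayed[start:stop + 1]:
        new_state = latest.get(row["job_id"], row["state"])
        if new_state != row["state"]:
            row["state"] = new_state
            changed += 1
    return changed
```

With this approach the cost of each refresh depends on the number of visible rows rather than the total number of jobs; rows outside the window stay stale until they are scrolled into view and refreshed.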
I did some scalability testing with sview a few years ago. My recollection is that sview was unable to manage more than a couple of thousand elements (nodes, jobs, or whatever). Almost all of the time was consumed by the underlying GTK library. Some enhancements were made to improve performance, but I'm not sure how much more is possible except by reducing the number of elements displayed, for example by showing only the highest-priority jobs instead of all jobs.
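As a rough sketch of the "show only the highest-priority jobs" idea, the selection step could look like the following Python. This is an illustration only, not how sview is implemented. The name `top_priority_jobs`, the `limit` parameter, and the job dictionaries with a `priority` key are assumptions made for the example.

```python
import heapq


def top_priority_jobs(jobs, limit):
    """Return at most `limit` jobs, highest priority first.

    jobs: iterable of dicts with a numeric "priority" key
          (hypothetical layout, for illustration only).
    """
    # heapq.nlargest avoids sorting the full job list when limit is small.
    return heapq.nlargest(limit, jobs, key=lambda job: job["priority"])
```

For example, calling `top_priority_jobs(all_jobs, 500)` would hand the GUI at most 500 rows to render regardless of how many jobs are queued, which keeps the element count below the range where GTK was observed to struggle.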
Another bug with a possible patch was filed for this issue, so I am marking this one as a duplicate. *** This ticket has been marked as a duplicate of ticket 345 ***