Hi, We're upgrading Sherlock to Lustre 2.12 and we hit the following issue: slurmstepd: error: _read_lustre_counters: Cannot open /proc/fs/lustre/llite No such file or directory slurmstepd: error: acct_gather_filesystem_p_get_data: Cannot read lustre counters slurmstepd: error: _read_lustre_counters: Cannot open /proc/fs/lustre/llite No such file or directory slurmstepd: error: acct_gather_filesystem_p_get_data: Cannot read lustre counters It looks like in 2.12+ /proc/fs/lustre/llite was moved to /sys/fs/lustre/llite 2.12 is the latest LTS release, it would be nice if Slurm could support it. Thanks! Stephane
Created attachment 8988 [details] Proposed patch to add support for Lustre 2.12 Here's a tentative patch to add support for Lustre 2.12 (Lustre client stats moved to /sys/kernel/debug/lustre/llite). Tested on Lustre 2.10 and Lustre 2.12.
Thanks for the patch Killian, We are working/reviewing in a similar patch and I will attach it here soon. Our patch will probably also create a new parameter on acct_gather.conf, just in case. Albert
Created attachment 9035 [details] Revised patch for Lustre 2.12. Kilian - I split one part off in to commit 7bd7847ff9cd and pushed that already. Are you alright with the attached as the final patch post review? I restructured some of the code, and avoided making some of the variable name changes you did (we try to limit the changes that land on the stable branches as much as possible), but your original intent is still there, and I believe this should work correctly across both Lustre client versions. This also fixes the one performance concern I had on reviewing your patch by storing the directory lookup result - what you'd proposed would have run opendir() + closedir() an extra time on all stats lookups which I'd rather avoid. - Tim
Hi Tim. Thanks so much for reviewing and improving that patch. (In reply to Tim Wickberg from comment #16) > I split one part off in to commit 7bd7847ff9cd and pushed that already. Great! > Are you alright with the attached as the final patch post review? I > restructured some of the code, and avoided making some of the variable name > changes you did (we try to limit the changes that land on the stable > branches as much as possible), but your original intent is still there, and > I believe this should work correctly across both Lustre client versions. > > This also fixes the one performance concern I had on reviewing your patch by > storing the directory lookup result - what you'd proposed would have run > opendir() + closedir() an extra time on all stats lookups which I'd rather > avoid. Absolutely, that looks much better, actually. So, yes, I'm good with your proposed change, thanks again for the review! Cheers, -- Kilian
Stephane, I'm closing this bug as fixed because Killian's patch has been committed by Tim in branch slurm-18.08 and will be released in 18.08.6: commit f2feb6e2fd78a05f94cddea73519f05782944d69 Author: Kilian Cavalotti <kilian@stanford.edu> Date: Mon Jan 28 20:02:10 2019 -0700 Support Lustre 2.12 change to stats location. Lustre client stats moved to /sys/kernel/debug/lustre/llite with 2.12. Bug 6385. Co-authored-by: Tim Wickberg <tim@schedmd.com> Regards, Albert
(In reply to Albert Gil from comment #20) > Stephane, > I'm closing this bug as fixed because Killian's patch has been committed by > Tim in branch slurm-18.08 and will be released in 18.08.6: > > commit f2feb6e2fd78a05f94cddea73519f05782944d69 > Author: Kilian Cavalotti <kilian@stanford.edu> > Date: Mon Jan 28 20:02:10 2019 -0700 > > Support Lustre 2.12 change to stats location. > > Lustre client stats moved to /sys/kernel/debug/lustre/llite with 2.12. > > Bug 6385. > > Co-authored-by: Tim Wickberg <tim@schedmd.com> > > Regards, > Albert Thank you! Cheers, -- Kilian