Created attachment 1830 [details]
Tarball for seff and smail utilities

This is a contribution: the Slurm job efficiency report (seff).

Summary:
seff takes a jobid and reports on the efficiency of that job's cpu and memory utilization. The rpm/tarball comes with an 'smail' utility that allows for Slurm end-of-job emails to include a seff report. This allows users to become aware if they are wasting resources.

> seff
Usage: seff [Options] <Jobid>
       Options:
       -h    Help menu
       -v    Version
       -d    Debug mode: display raw Slurm data

The seff output is mostly self-explanatory:

> seff 3485050
Job ID: 3485050
Cluster: della
User/Group: dmcr/cses
State: COMPLETED (exit code 0)
Nodes: 2
Cores per node: 2
CPU Utilized: 00:00:01
CPU Efficiency: 0.40% of 00:04:08 core-walltime
Memory Utilized: 2.04 GB (estimated maximum)
Memory Efficiency: 86.89% of 2.34 GB (1.17 GB/node)

The smail utility is invoked automatically to process end-of-job notifications by adding the following to /etc/slurm/slurm.conf:

MailProg=/usr/bin/smail

This script parses the notification subject line, and generates the requested email with seff output as the body.

Please let me know if you have any questions.

Best regards,
Dennis
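To illustrate the subject-line parsing described above, here is a minimal Python sketch. It is not the code of the contributed smail script. It assumes the notification subject contains a "Job_id=<number>" token; the sample subject string and the function name extract_job_id are hypothetical.

```python
import re
from typing import Optional

# Assumption: the notification subject carries a "Job_id=<digits>" token.
JOB_ID_PATTERN = re.compile(r"Job_id=(\d+)")


def extract_job_id(subject: str) -> Optional[str]:
    """Return the job ID found in a notification subject, or None."""
    match = JOB_ID_PATTERN.search(subject)
    return match.group(1) if match else None


# Hypothetical sample subject, used only for illustration.
sample = "Slurm Job_id=3485050 Name=test Ended, Run time 00:01:02, COMPLETED, ExitCode 0"
print(extract_job_id(sample))
```

The extracted ID could then be passed to seff (for example "seff 3485050") and the report used as the mail body. Returning None for a subject without a job ID lets the caller fall back to sending the original notification unchanged.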
Created attachment 1831 [details] Springdale/RedHat 6 binary RPM for seff and smail utilities
I forgot to mention that this script relies on the jobs_get() functionality of the Perl API, and hence requires Slurm 15.08. Dennis
Created attachment 2127 [details] Springdale/RedHat 6 SRPM for seff and smail utilities
I've added this to the "master" branch of Slurm, which will be in version 16.05 (May 2016; we pretty much limit changes other than bug fixes to major releases, so I assume you'll manage this as a local patch for now).

I added a copyright notice to both scripts:
# Copyright 2015 Princeton University Research Computing

I also modified the smail script so that it identifies the location of seff rather than assuming it is in /usr/bin, and I put these files in their own RPM (named "slurm-seff").

Let us know if you make other changes going forward. I think this will prove very helpful to many other Slurm users. Thanks!

Commit here:
https://github.com/SchedMD/slurm/commit/93d9189c35be9d603cfedb09b55c4110a9b5779a
You're welcome Moe. Your changes look good to me. Best, Dennis
Created attachment 3179 [details]
Remove Data::Dumper dependency

seff crashes when Data::Dumper is not installed, and the slurm-seff rpm does not require that module. Since seff never actually uses Data::Dumper, it does not need to be loaded.
Created attachment 3180 [details]
Preserve original subject

Preserving the original subject has its pros, both for backward compatibility and for spotting failed jobs among thousands of successful jobs.
One of our users asked for "smail" to prepend the ClusterName to the subject line so that mails from Slurm jobs can be more easily identified or filtered. I made a small change to the smail script that seems to do the trick, see https://github.com/OleHolmNielsen/Slurm_tools/tree/master/smail. If you like it, please accept this as a contribution.
(In reply to Ole.H.Nielsen@fysik.dtu.dk from comment #8) > One of our users asked for "smail" to prepend the ClusterName to the subject > line so that mails from Slurm jobs can be more easily identified or filtered. > > I made a small change to the smail script that seems to do the trick, see > https://github.com/OleHolmNielsen/Slurm_tools/tree/master/smail. > > If you like it, please accept this as a contribution. Thanks for your contribution. It has been added to the version 17.11 code base here: https://github.com/SchedMD/slurm/commit/d26370f88e437581ed034a20f3ebf56047dad187 I added the install directory path to scontrol (not a problem in your case) here: https://github.com/SchedMD/slurm/commit/be89d5c0f944e9538c821fda299e9e9c050c0440
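The idea of prepending the ClusterName to the subject can be sketched in Python as follows. This is only an illustration, not the committed change: the bracketed prefix format, the function names, and the sample configuration text are all assumptions. The sketch assumes the cluster name is available from configuration output containing a line of the form "ClusterName = <name>", such as the text printed by "scontrol show config".

```python
import re
from typing import Optional


def parse_cluster_name(config_text: str) -> Optional[str]:
    """Find a 'ClusterName = <name>' line in configuration text."""
    match = re.search(r"^ClusterName\s*=\s*(\S+)", config_text, re.MULTILINE)
    return match.group(1) if match else None


def prefix_subject(subject: str, cluster: Optional[str]) -> str:
    """Prepend '[cluster] ' to the subject; leave it unchanged if unknown."""
    # Assumption: the bracketed prefix is a hypothetical format.
    return f"[{cluster}] {subject}" if cluster else subject


# Hypothetical sample of configuration output, for illustration only.
sample_config = "AuthType                = auth/munge\nClusterName             = della\n"
cluster = parse_cluster_name(sample_config)
print(prefix_subject("Slurm Job_id=3485050 Name=test Ended", cluster))
```

Keeping the original subject intact after the prefix means existing mail filters that match on the job state or job ID should continue to work.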
(In reply to Dennis McRitchie from comment #0)

Hi Dennis,

I am interested in your work. Which formulas did you use to compute the two efficiencies?

Thanks in advance,
Manuel
Hi Manuel,

I don't recall exactly, since this was written several years ago and I don't have the code with me at the moment. However, I believe that CPU efficiency was basically the actual CPU time divided by the core-walltime, that is, the number of CPUs multiplied by the wallclock time. Memory efficiency was the high-water mark of memory used divided by the memory requested for the job.

The code is not very complicated, so if you download it and take a look, you can probably see exactly what I did.

Best,
Dennis