Ticket 17337

Summary: Cannot report on pending time in queue
Product: Slurm
Reporter: Chris Holder <christopher.holder>
Component: Accounting
Assignee: Benjamin Witham <benjamin.witham>
Status: RESOLVED INFOGIVEN
QA Contact:
Severity: 4 - Minor Issue
Priority: ---
CC: benjamin.witham
Version: - Unsupported Older Versions
Hardware: Linux
OS: Linux
Site: Baylor College of Medicine Molecular and Human Genetics
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA Site: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed:
Target Release: ---
DevPrio: ---
Emory-Cloud Sites: ---

Description Chris Holder 2023-08-02 10:03:22 MDT
I am trying to get meaningful reporting on usage and on time spent pending in the queue, without success. None of the commands in the documentation have shown me the average time jobs spend in the pending state.
Comment 2 Benjamin Witham 2023-08-02 14:26:43 MDT
Hello Chris, 

The squeue command lets users build their own output format with the -O or --Format= option. There are a few different time fields available.
> https://slurm.schedmd.com/squeue.html#OPT_Format

I think this is the one you're looking for, but I have a few other options as well.
PendingTime - Shows the time in seconds that a job has been waiting in the queue. If the job has started, it prints the difference between the start time and submission time of the job.
> https://slurm.schedmd.com/squeue.html#OPT_PendingTime

StartTime - This is the projected time that the job will start if the job is pending, or the time that the job started if the job is running. 
> https://slurm.schedmd.com/squeue.html#OPT_StartTime

EndTime - This is the projected time that the job will end. It is estimated from the time limit of the submitted job.
> https://slurm.schedmd.com/squeue.html#OPT_EndTime

The --start option of squeue may also be a good place to begin. It prints only pending jobs and their expected start times.
> https://slurm.schedmd.com/squeue.html#OPT_start
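For jobs that are still in the queue, the PendingTime values can be averaged directly in the shell. This is only an illustrative sketch: avg_pending_seconds is a hypothetical helper name (not part of Slurm), and it assumes squeue prints PendingTime as plain integer seconds, one job per line.

```shell
# Hypothetical helper (not part of Slurm): average the integer seconds found
# in the first column of stdin, e.g. squeue's PendingTime output.
# Non-numeric lines are ignored; prints 0 when there is no input.
avg_pending_seconds() {
  awk '$1 ~ /^[0-9]+$/ { sum += $1; n++ }
       END { if (n) printf "%d\n", sum / n; else print 0 }'
}

# Usage on a live cluster (pending jobs only, no header line):
# squeue --noheader --states=PENDING -O PendingTime | avg_pending_seconds
```

This only sees jobs that squeue still knows about; it says nothing about jobs that have already finished.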

Does this answer your question?
Comment 3 Chris Holder 2023-08-02 14:31:27 MDT
Forgive my ignorance.  I have been working on the assumption that squeue only shows "live" jobs and that, once a job has completed, it is up to the accounting database to provide historical information.

I am trying to build an ongoing report of various job statistics (by account) so that I can share them with my users and assess ongoing trends in the cluster's performance.
Comment 4 Benjamin Witham 2023-08-02 14:37:52 MDT
Hello Chris, 

I believe I misunderstood your original question; I did not realize you were looking for data on completed jobs. You are correct: squeue only covers pending and running jobs, and once a job completes, its information can be retrieved with sacct.

Allow me to look there for a moment.
Comment 5 Benjamin Witham 2023-08-02 15:12:57 MDT
Hello Chris, 

I have not found anything that displays the amount of time a completed job was pending in the queue. The best option is to get the Submit and Start times from sacct and take the difference between the two. I would suggest using the -p or -P (--parsable/--parsable2) options to make the output easier to parse.
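The Submit/Start difference can be sketched as a small pipeline. This is an illustration under stated assumptions, not a SchedMD-provided tool: avg_pending_by_account is a hypothetical helper name, and setting SLURM_TIME_FORMAT=%s assumes the system strftime supports %s, so that sacct prints Submit and Start as epoch seconds and no date parsing is needed.

```shell
# Hypothetical helper (not part of Slurm): read sacct -P lines of the form
# Account|Submit|Start (epoch seconds) and print Account|Jobs|AvgPendingSec,
# sorted by account. Lines whose Submit or Start is not numeric (e.g. jobs
# that never started) are skipped.
avg_pending_by_account() {
  awk -F'|' '
    $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
      total[$1] += $3 - $2
      n[$1]++
    }
    END {
      for (acct in total) printf "%s|%d|%d\n", acct, n[acct], total[acct] / n[acct]
    }' | sort
}

# Usage (July 2023, all users, one line per job allocation, no header):
# SLURM_TIME_FORMAT=%s sacct -a -X -n -P -S 2023-07-01 -E 2023-08-01 \
#   -o Account,Submit,Start | avg_pending_by_account
```

Grouping on the Account field reports by Slurm account rather than by POSIX group, which matches the per-account report described above.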

Have you looked into sreport as well? There are tools there that could help you create your report.

> https://slurm.schedmd.com/sreport.html
Comment 6 Chris Holder 2023-08-02 15:36:27 MDT
sreport is the BANE OF MY EXISTENCE!!  I can't get that stupid thing to be even remotely useful or consistent.  I have no doubt in my mind that I am just doing it wrong, but I have yet to find a tutorial that doesn't read like a doctorate-level engineering thesis.  I mean seriously...  It's impossible to find documentation that is consumable by normal humans of only slightly above-average intelligence.  I think the last time I jumped into sreport, it was spitting out reports based on POSIX groups instead of the actual Slurm account associations.  Also, when I ran it from the CLI it spit out data, but when I added it to a cron job it produced nothing but headers with empty data tables.

If you have some human-readable documentation or recommendations, I would really appreciate it.
Comment 7 Benjamin Witham 2023-08-16 11:56:47 MDT
Hello Chris, I apologize for the late response. 

I agree that the sreport documentation is confusing. Which parts of it are you finding most difficult?

What command were you running in your cron job and how often was your cron job running?
Comment 8 Chris Holder 2023-08-21 12:52:27 MDT
Sorry for the delay.  Here’s my crontab entry:

0 0 1 * * /var/opt/Slurm_tools/slurmreportmonth/slurmreportmonth -m

When run from crontab the data tables are empty.  When run from a CLI it populates data.

Thanks,
Chris

Comment 9 Benjamin Witham 2023-08-22 11:35:01 MDT
Hello Chris, 

Are you able to send me the exact sreport command that is run in your slurmreportmonth file?
Comment 10 Chris Holder 2023-08-23 12:33:11 MDT
sreport cluster utilization Start=$START End=$END -t percent > $REPORT

sreport -t hourper --tres=cpu,gpu cluster AccountUtilizationByUser  Start=$START End=$END format=Accounts,Cluster,Login,Proper%30,TresName,Used tree >> $REPORT

Thanks,
Chris

Comment 11 Benjamin Witham 2023-09-04 16:10:20 MDT
Hello Chris, 

Where are you getting your $START and $END times from? The only way I'm able to (somewhat) reproduce the tables with no bodies is if the times I set are invalid (i.e., they have not happened yet).

Is your cron job passing sreport the dates for the next month instead of the previous one?
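If the cron job is computing next month's dates, one fix is to derive the previous calendar month from the first day of the current month. A minimal sketch, assuming GNU date (for -d); prev_month_range is a hypothetical helper name, not something the slurmreportmonth script is known to contain.

```shell
# Hypothetical helper: print "START END" for the calendar month before the
# reference date (default: today). END is the first day of the reference
# month, so a job running on the 1st reports on the month that just ended.
# Requires GNU date for -d.
prev_month_range() {
  local ref first_this first_prev
  ref="${1:-$(date +%F)}"
  first_this=$(date -d "$ref" +%Y-%m-01)
  first_prev=$(date -d "$first_this -1 month" +%F)
  echo "$first_prev $first_this"
}

# Usage inside the monthly report script (bash), e.g. run from cron on the 1st:
# read -r START END <<< "$(prev_month_range)"
# sreport cluster utilization Start="$START" End="$END" -t percent
```

Stepping back from the 1st of the month, rather than from today's date, avoids GNU date's month-end overflow, where for example "March 31 minus one month" lands in early March instead of February.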
Comment 12 Chris Holder 2023-09-05 07:34:16 MDT
Yes, it is.  Let me take a look at that.

Thanks,
Chris

Comment 13 Benjamin Witham 2023-09-14 09:32:39 MDT
Hey Chris, 

Are you still having trouble with your crontab job?
Comment 14 Benjamin Witham 2023-09-21 17:00:05 MDT
Hello Chris, 

I haven't heard from you, so I'll close this ticket now. If you're still having trouble with sreport, feel free to reopen this ticket.