I'm really not sure if this is a misunderstanding on our part or not, but we have a node where we run the following command, and it gives us output that we didn't expect. Basically we're trying to get a list of all the currently-running jobs on that node, and it shows us a job that's been cancelled, and is no longer running. Here's the specific command and output:

    # sacct -N m7-2-5 -s R
           JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
    ------------ ---------- ---------- ---------- ---------- ---------- --------
    80777        base-042-+      m7,m6   sgorrell         96    TIMEOUT      0:1
    80777.batch       batch              sgorrell          1  CANCELLED     0:15

As I said, if we're misunderstanding, and "-s R" DOESN'T mean "show running jobs", then that's fine. It just wasn't what we expected.

I'm not sure how to go about diagnosing this. If you need me to run some commands to get more info to you, I'd be happy to do that. I just don't know what other info to give you.

Lloyd Brown
Fulton Supercomputing Lab
Brigham Young University
Lloyd, I think I see what is happening. By default, sacct fills in a start time of midnight of the current day. I think job 80777 was still running on this node at midnight, so it matches the query even though it has since ended. I'll see about removing the default start time when asking for states.

In the meantime you can just add -Snow to your sacct line. It is probably safer to do that anyway, so that it is clear which time window you are asking about. Let me know if that fixes your issue.
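To illustrate the explanation above, here is a minimal Python sketch of the time-window behaviour as described in this thread. It is a simplified model, not Slurm's actual code: the `Job` class, the `was_running_in_window` helper, and all dates and times are hypothetical and chosen only to mimic job 80777.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Job:
    job_id: int
    start: datetime
    end: Optional[datetime]  # None while the job is still running
    final_state: str


def was_running_in_window(job: Job, win_start: datetime, win_end: datetime) -> bool:
    """Return True if the job was running at any point in [win_start, win_end].

    Assumed model of a state query over a time window: a job matches when
    its run interval overlaps the window, whatever its state is today.
    """
    job_end = job.end if job.end is not None else datetime.max
    return job.start <= win_end and job_end >= win_start


# Hypothetical timeline: the query is issued at 10:00, and the job hit its
# time limit at 02:00 the same day.
now = datetime(2012, 11, 6, 10, 0)
midnight = now.replace(hour=0, minute=0)
job = Job(80777, datetime(2012, 11, 5, 8, 0), datetime(2012, 11, 6, 2, 0), "TIMEOUT")

# Default window (midnight..now): the job overlaps it, so it is listed.
default_window = was_running_in_window(job, midnight, now)

# Window starting at "now" (the -Snow workaround): the job no longer matches.
now_window = was_running_in_window(job, now, now)

print(default_window, now_window)
```

Under these assumptions the job is selected by the default window and excluded once the window starts at the current time, which matches the behaviour reported above.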
Ah. A misunderstanding on my part. For our purposes, adding that parameter is an easy fix, so if you don't want to change the default behavior, that's fine with me. Thanks for the explanation.
I still think the old behavior is misleading, or at best confusing. I just changed the start time to default to now when asking for states:
https://github.com/SchedMD/slurm/commit/af29d9a7a759958a9c5653df374593577cd430d3

I also cleared up the documentation here:
https://github.com/SchedMD/slurm/commit/5d2181f4b28c783e2d88f9d5d9236f2663f8c2ec

Both of these patches will be in the next 2.5 release.