This is just an informational ticket. Is there a field one can use in sacct to see whether or not an exclusive job was run on a host? I know one can use sacct to examine jobs run against a host (or hosts) with the -N flag. Thanks, Shawn
Hi Shawn,

I've looked into what's recorded when you use the --exclusive flag, and there isn't a field that shows this particular flag being specified for a job. The way to tell whether it was requested is by looking at the number of requested/allocated CPUs. It sounds like this is what you said you are doing currently, or at least know about. For reference, you can look at the requested and allocated CPUs by specifying the format of the sacct output, like this:

sacct -j 48845 -o jobid,jobname,partition,account,reqcpus,alloccpus,state,exitcode

You would have to know the number of CPUs configured for the node to verify that all of them were requested and it wasn't just a partial-node request.

There are similar requests open to have more information about jobs saved in the database; see bug 7609 and bug 3010. As you can see in those bugs, a possible way to work around this in the meantime is to create a submit filter that looks for the --exclusive flag and adds an admin_comment to the job, which is recorded in the database and can be referenced later. This isn't a very nice workaround and can get messy fast if you add comments for multiple things.

This is probably not what you were hoping for, but I hope it helps. Feel free to let me know if you have additional questions about this.

Thanks,
Ben
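The reqcpus/alloccpus check above can be sketched as a small script. This is a minimal illustration, not a Slurm tool: the per-node CPU counts are hypothetical placeholders (on a real cluster you would populate them from scontrol show node or slurm.conf), and it assumes a simple comma-separated nodelist rather than the bracketed ranges (node[01-03]) sacct can emit.

```python
# Sketch: infer whether a job likely ran node-exclusive by comparing its
# allocated CPUs against the CPUs configured on its allocated node(s).
# NODE_CPUS is an illustrative placeholder, not real cluster data.

NODE_CPUS = {"node01": 36, "node02": 36, "node03": 40}  # hypothetical counts

def parse_sacct_line(line):
    """Parse one line of `sacct -n -P -o jobid,reqcpus,alloccpus,nodelist`."""
    jobid, reqcpus, alloccpus, nodelist = line.strip().split("|")
    return jobid, int(reqcpus), int(alloccpus), nodelist.split(",")

def looks_exclusive(alloccpus, nodes, node_cpus=NODE_CPUS):
    """True if the job's allocated CPUs equal the total CPUs of its nodes."""
    total = sum(node_cpus[n] for n in nodes)
    return alloccpus == total

# Example with a fabricated sacct output line for a job on a 36-CPU node:
jobid, req, alloc, nodes = parse_sacct_line("48845|1|36|node01")
print(jobid, looks_exclusive(alloc, nodes))  # → 48845 True
```

Note this is only a heuristic, for the reason given above: a job that happens to request every CPU on the node is indistinguishable from one submitted with --exclusive.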
Hi Ben,

Thanks for the reply. Yes, we were aware of the sacct approach with reqcpus and alloccpus, and we also have the submit filter logging the use of --exclusive. Agreed, we can see how the latter gets messy, and for the former we'll just need to write a case statement to handle the five CPU host-type variants at ESRL.

For this case, is there a way to monitor when (or whether) a host leaves the "mixed" state? Our use case for this request is to health-check hosts that stay in "mixed" as a shared resource for a prolonged period of time. We purposefully avoid running NHC via cron on hosts.

Thanks,
Shawn
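One way to watch for hosts lingering in "mixed", sketched below under stated assumptions: each poll would in practice read node states from something like sinfo -h -N -o "%N %T", but here the poll results are passed in directly so the tracking logic stands on its own. The function names and the one-hour threshold are illustrative, not a Slurm API.

```python
# Sketch: flag nodes that stay in the "mixed" state beyond a threshold.
# The caller supplies {node: state} snapshots (e.g. parsed from sinfo)
# plus a timestamp for each poll; this only tracks durations.

def update_mixed_tracker(tracker, node_states, now):
    """Record when each node entered 'mixed'; drop nodes that left it."""
    for node, state in node_states.items():
        if state == "mixed":
            tracker.setdefault(node, now)   # remember first time seen mixed
        else:
            tracker.pop(node, None)         # node left mixed; reset
    return tracker

def long_mixed(tracker, now, threshold):
    """Nodes that have been in 'mixed' for at least `threshold` seconds."""
    return sorted(n for n, t0 in tracker.items() if now - t0 >= threshold)

# Example: node02 stays mixed for an hour; node01 leaves and re-enters.
tracker = {}
update_mixed_tracker(tracker, {"node01": "mixed", "node02": "mixed"}, now=0)
update_mixed_tracker(tracker, {"node01": "allocated", "node02": "mixed"}, now=1800)
update_mixed_tracker(tracker, {"node01": "mixed", "node02": "mixed"}, now=3600)
print(long_mixed(tracker, now=3600, threshold=3600))  # → ['node02']
```

Running this from a central monitoring host keeps with the constraint above of not placing NHC cron jobs on the compute hosts themselves.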
Hi Shawn,

I think I understand what you're trying to get from this. Unfortunately there isn't a node state that distinguishes a node running a single job from one running multiple jobs. I tried to come up with some way of doing this, but currently a node with multiple jobs that is fully allocated looks the same as a node with a single job that occupies the whole node. To be able to tell whether there's an exclusive job, you would have to piece together information from the jobs on the system. Having this data tracked would be an enhancement over the current functionality.

There is a request open in bug 4161 that aligns with what you are asking. If a change like this is made, we would want to do it in a way that makes the information available both to slurmctld, through a command like scontrol, and to slurmdbd, with a command like sacct.

I'm going to mark this bug as a duplicate of bug 4161, but if you have any additional questions about this feel free to let me know.

Thanks,
Ben

*** This ticket has been marked as a duplicate of ticket 4161 ***
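The "piecing together information from the jobs on the system" step above could look roughly like the following sketch. It assumes job-to-node data of the kind squeue -h -o "%i %N" would provide, with nodelists already expanded into simple lists (real squeue output may use bracketed ranges); the helper names are illustrative, not a Slurm API.

```python
from collections import defaultdict

# Sketch: distinguish a node occupied by a single job from one shared by
# several, given (jobid, [nodes]) pairs gathered from the scheduler.

def jobs_per_node(job_nodes):
    """Map each node to the set of job IDs currently running on it."""
    per_node = defaultdict(set)
    for jobid, nodes in job_nodes:
        for node in nodes:
            per_node[node].add(jobid)
    return per_node

def single_job_nodes(per_node):
    """Nodes occupied by exactly one job (candidates for exclusive use)."""
    return sorted(n for n, jobs in per_node.items() if len(jobs) == 1)

per_node = jobs_per_node([("101", ["node01"]),
                          ("102", ["node01"]),
                          ("103", ["node02", "node03"])])
print(single_job_nodes(per_node))  # → ['node02', 'node03']
```

As the comment above explains, a single job on a node still isn't proof of --exclusive; you would additionally have to check that the job's allocation covers the whole node.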