We are attempting to shift our accounting to use the jobcomp/kafka plugin. On the whole it's working great, but there is an issue with the partition field. Our researchers often submit jobs to multiple partitions using comma-separated lists like this: --partition=day,scavenge This lets them submit to the "standard" partitions and fall back to our scavenge partition (which runs in preempt mode) if there are no available resources in `day`. It's also used to submit to condo partitions, with commons as the fallback if the condo nodes are full. While sacct reports the correct partition that the job ran in, jobcomp/kafka includes the full comma-separated list with no indication of which partition the job actually ran in. Since we treat usage in these various partition types (condo, commons, scavenge) differently in our accounting, this is a show-stopper for switching over to jobcomp/kafka. Let me know if I can provide any additional info to help y'all reproduce this issue. Thanks! Tom Langford
Hi everyone, it's been a week since I submitted this ticket and I haven't heard anything back yet. Is there any additional information I could provide to help out? These are my JobComp config settings:

rdkafka.conf
--------------
bootstrap.servers=XXX.XXX.XXX.XXX:9092
debug=broker,topic,msg
linger.ms=400
log_level=7

slurm.conf
--------------
JobCompType=jobcomp/kafka
JobCompLoc=/opt/slurm/current/etc/rdkafka.conf
JobCompParams=flush_timeout=200,poll_interval=3,requeue_on_msg_timeout,topic=slurm_accounting

Happy to provide anything else that's useful.
Hi, Sorry I didn't get back to this earlier. Historically, all jobcomp plugins (including kafka and elasticsearch, which share a common serialization code path) have sent the partition field straight from job_record_t->partition as-is. That member is defined as:
    char *partition; /* name of job partition(s) */
Changing this to suit your expectations would be a change in behavior that could disrupt other sites' expectations. We can discuss this internally and come back to you.
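Until the behavior changes, a consumer of the Kafka topic can at least detect which records are affected. The following is a minimal sketch, assuming each message is a JSON object whose "partition" key carries the comma-separated string described above. The sample message and the function names are hypothetical and are not part of Slurm.

```python
import json


def submitted_partitions(record_json):
    """Return the partition names listed in one jobcomp record.

    Assumes the record is a JSON object with a "partition" key holding
    the comma-separated string taken from job_record_t->partition.
    """
    record = json.loads(record_json)
    raw = record.get("partition") or ""
    return [name.strip() for name in raw.split(",") if name.strip()]


def is_ambiguous(record_json):
    """True when the record lists more than one partition."""
    return len(submitted_partitions(record_json)) > 1


# Hypothetical sample message, for illustration only.
message = '{"jobid": 1234, "partition": "day,scavenge"}'
print(submitted_partitions(message))
print(is_ambiguous(message))
```

Records flagged as ambiguous cannot be attributed to a partition type from the jobcomp data alone and would need to be resolved from another source.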
Thanks for the clarification. I'm surprised that there isn't a field for "partition where this job ran". That's what I expected the "partition" value to be. Perhaps the "submitted partition list" could be separated from "partition"? The corresponding field from sacct gets updated to be the "partition where the job ran", hence my expectation that this would be the same information. Do other sites not use the JobComp plugins for long-term accounting? I don't really see the value of the "submitted partition", since that list could contain all partitions. We really care about the breakdown of jobs that ran in privately owned partitions vs commons partitions, as that factors into how we report usage to our various research departments. Is there another field that I'm missing in the JobComp datastream that indicates which partition a job actually ran in? Thanks so much, -t
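As a stopgap, the partition a job ran in can be looked up from sacct, which the thread notes reports it correctly. This is only a sketch: it assumes sacct is on the PATH, uses its -X (allocations only), -n (no header), -P (pipe-delimited) and -o (format) options, and the helper names are hypothetical. Array or heterogeneous jobs may report IDs in a different form and would need extra handling.

```python
import subprocess


def parse_sacct_partitions(output):
    """Map job ID to partition from 'JobID|Partition' lines."""
    partitions = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        job_id, _, partition = line.partition("|")
        partitions[job_id] = partition
    return partitions


def run_partition(job_id):
    """Ask sacct which partition a single job ran in, or None if unknown."""
    cmd = ["sacct", "-j", str(job_id), "-X", "-n", "-P",
           "-o", "JobID,Partition"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_sacct_partitions(result.stdout).get(str(job_id))
```

Calling sacct once per job is slow at scale, so batching several job IDs into one -j argument and reusing parse_sacct_partitions on the combined output would likely be preferable.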
Hi Thomas, Just to give you an update: after internal discussion, we'll work on changing the "partition" field in jobcomp plugins for 25.05 to reflect the partition the job ran on, instead of the current comma-separated list of partitions given at submission. We'll make sure to communicate the change at release time for sites used to the previous behavior. I'll come back to you when it's ready. Thanks.
Fantastic, thanks! Looking forward to the implementation. I've been really happy with the Kafka jobcomp plugin, thanks for all the hard work!