Summary: | jobcomp not including energy | ||
---|---|---|---|
Product: | Slurm | Reporter: | Matt Ezell <ezellma> |
Component: | Accounting | Assignee: | Benjamin Witham <benjamin.witham> |
Status: | OPEN --- | QA Contact: | |
Severity: | 4 - Minor Issue | ||
Priority: | --- | CC: | benjamin.witham |
Version: | 23.11.7 | ||
Hardware: | Linux | ||
OS: | Linux | ||
Site: | ORNL-OLCF | Alineos Sites: | --- |
Atos/Eviden Sites: | --- | Confidential Site: | --- |
Coreweave sites: | --- | Cray Sites: | --- |
DS9 clusters: | --- | HPCnow Sites: | --- |
HPE Sites: | --- | IBM Sites: | --- |
NOAA Site: | --- | NoveTech Sites: | --- |
Nvidia HWinf-CS Sites: | --- | OCF Sites: | --- |
Recursion Pharma Sites: | --- | SFW Sites: | --- |
SNIC sites: | --- | Tzag Elita Sites: | --- |
Linux Distro: | --- | Machine Name: | |
CLE Version: | Version Fixed: | ||
Target Release: | --- | DevPrio: | --- |
Emory-Cloud Sites: | --- | ||
Attachments: | slurmd.conf |
Description
Matt Ezell
2024-05-23 18:52:31 MDT
Created attachment 36685
slurmd.conf
Hello Matt, I can reproduce this issue from my end. I'm looking into the cause of the issue currently. Just a few quick questions for you. Are you never seeing the energy be printed? Do only some jobs print energy? In your best estimate, what is the average length of one of your jobs?

(In reply to Benjamin Witham from comment #2)
> Are you never seeing the energy be printed? Do only some jobs print energy?
> In your best estimate, what is the average length of one of your jobs?

Reading from the Kafka stream, I'm only ever seeing these 3 fields in tres_alloc: "cpu=3584,node=32,billing=3584", and tres_alloc_raw always has 3=18446744073709551614.

This is a test system, so walltimes vary quite a bit. Some jobs are sub-minute (and I would understand if those didn't report energy if the plugin wasn't able to gather enough samples), but some are many hours (up to our default 12 hour walltime).

sacct seems to show energy for all jobs, even ones that ran sub-minute.

[root@slurm1.borg ~]# sacct -o jobid,alloctres%40,elapsed,start,end -X -S 2024-05-26
JobID                                       AllocTRES    Elapsed               Start                 End
------------ ---------------------------------------- ---------- ------------------- -------------------
129488+0     billing=56,cpu=112,energy=16952728,node+   07:35:14 2024-05-25T20:47:24 2024-05-26T04:22:38
129488+1     billing=448,cpu=448,energy=69644742,nod+   07:35:14 2024-05-25T20:47:24 2024-05-26T04:22:38
129497       billing=112,cpu=112,energy=3083790,node+   00:23:19 2024-05-26T00:04:11 2024-05-26T00:27:30
129498        billing=112,cpu=112,energy=93030,node=1   00:00:44 2024-05-26T00:27:34 2024-05-26T00:28:18

Hello Matt, I apologize for the delayed response. I can reproduce this issue, and I'm looking into it currently. I'll keep you updated.
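
A note on the tres_alloc_raw value quoted in this thread: 18446744073709551614 equals 2^64 - 2, which matches the NO_VAL64 sentinel in slurm.h, and TRES id 3 is energy in Slurm's default TRES table. That suggests, though this thread does not confirm it, that the energy count was still unset when the jobcomp record was written. The sketch below shows how a stream consumer might flag such entries. The function name parse_tres_raw, the id-to-name table, and the combined sample string are illustrative assumptions; only the cpu, node, and billing counts and the id 3 value come from the report.

```python
# Slurm's 64-bit "no value" sentinel (NO_VAL64 in slurm.h): 2**64 - 2.
NO_VAL64 = 0xFFFFFFFFFFFFFFFE

# Default Slurm TRES ids; listed here by hand for illustration only.
TRES_NAMES = {1: "cpu", 2: "mem", 3: "energy", 4: "node", 5: "billing"}


def parse_tres_raw(raw: str) -> dict:
    """Parse an 'id=count,id=count' string into {name: count}.

    Counts equal to NO_VAL64 are reported as None, meaning "not set".
    Unknown ids are kept under a generic 'tres<id>' key.
    """
    result = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        tres_id, _, count = item.partition("=")
        value = int(count)
        name = TRES_NAMES.get(int(tres_id), f"tres{tres_id}")
        result[name] = None if value == NO_VAL64 else value
    return result


# Hypothetical raw string assembled from the values quoted in the report.
sample = "1=3584,3=18446744073709551614,4=32,5=3584"
print(parse_tres_raw(sample))
```

Mapping the sentinel to None rather than dropping the key keeps "energy was present but unset" distinguishable from "energy was never tracked", which is the distinction at issue here.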