Ticket 16947

Summary: MaxRSS and other values missing from the dbd
Product: Slurm
Reporter: hpc-admin
Component: slurmdbd
Assignee: Albert Gil <albert.gil>
Status: RESOLVED INFOGIVEN
QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 22.05.9   
Hardware: Linux   
OS: Linux   
Site: Ghent
Attachments: slurm config

Description hpc-admin 2023-06-13 05:46:06 MDT
Created attachment 30743: slurm config

Hi,

We have 

JobAcctGatherFrequency=task=30
JobAcctGatherType=jobacct_gather/cgroup

set in our config, but if I request e.g., MaxRSS, it is empty (or 0):

[root@masterdb01 ~]# sacct --clusters all -j 16972183 -o MaxRSS
    MaxRSS
----------

         0
         0


I am likely missing something, but I'm not sure what. Could you provide some pointers?

Kind regards,
-- Andy
Comment 1 Albert Gil 2023-06-14 01:07:43 MDT
Hi Andy,

Your config seems correct.
Are you getting MaxRSS=0 in all jobs, or only on this one?
Could you attach the output of this command:

$ sacct -p --clusters all -j 16972183 -o JobID,Start,Elapsed,State,MaxRSS,AveRSS,ReqTRES,AllocTRES,NodeList

Also, could you attach the slurmctld logs from the Start date?
And the slurmd logs from the nodes in the NodeList for the same date?

Thanks,
Albert
Comment 2 hpc-admin 2023-06-14 06:51:38 MDT
Hi Albert,

The output:

JobID|Start|Elapsed|State|MaxRSS|AveRSS|ReqTRES|AllocTRES|NodeList|
16972183|2023-06-13T12:43:36|00:00:06|COMPLETED|||billing=2,cpu=1,mem=6G,node=1|billing=2,cpu=1,mem=6G,node=1|node3212.victini.os|
16972183.batch|2023-06-13T12:43:36|00:00:06|COMPLETED|0|0||cpu=1,mem=6G,node=1|node3212.victini.os|
16972183.extern|2023-06-13T12:43:36|00:00:06|COMPLETED|0|0||billing=2,cpu=1,mem=6G,node=1|node3212.victini.os|


I am going to try and find the logs :)

This is a random job I picked, but I have lots of them. I think I found the issue by comparing what your command gives for other jobs: I should not use --allocations, but rather grep for the .batch step.


Problem solved, thank you.

-- Andy
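
Note: a minimal sketch of the fix described in Comment 2, assuming the pipe-delimited `sacct -p` format shown there (one header line, a trailing `|` on every line). The helper names `parse_sacct` and `batch_steps` are hypothetical and not part of Slurm; the sample rows are copied from the output above. It shows that the allocation row has an empty MaxRSS field, while the per-step rows such as `.batch` are the ones that carry a value.

```python
# Sample copied from the `sacct -p` output in Comment 2.
SAMPLE = """\
JobID|Start|Elapsed|State|MaxRSS|AveRSS|ReqTRES|AllocTRES|NodeList|
16972183|2023-06-13T12:43:36|00:00:06|COMPLETED|||billing=2,cpu=1,mem=6G,node=1|billing=2,cpu=1,mem=6G,node=1|node3212.victini.os|
16972183.batch|2023-06-13T12:43:36|00:00:06|COMPLETED|0|0||cpu=1,mem=6G,node=1|node3212.victini.os|
16972183.extern|2023-06-13T12:43:36|00:00:06|COMPLETED|0|0||billing=2,cpu=1,mem=6G,node=1|node3212.victini.os|
"""


def _split(line):
    """Split one `sacct -p` line, dropping only the single trailing '|'."""
    if line.endswith("|"):
        line = line[:-1]
    return line.split("|")


def parse_sacct(text):
    """Hypothetical helper: turn `sacct -p` output into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = _split(lines[0])
    return [dict(zip(header, _split(ln))) for ln in lines[1:]]


def batch_steps(rows):
    """Hypothetical helper: keep only the .batch step rows."""
    return [r for r in rows if r["JobID"].endswith(".batch")]


if __name__ == "__main__":
    for row in batch_steps(parse_sacct(SAMPLE)):
        print(row["JobID"], row["MaxRSS"])
```

On real data, the same filtering would be applied to the output of the `sacct -p` command from Comment 1, run without --allocations so that the step rows are included.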