| Summary: | MaxRSS and other values missing from the dbd | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | hpc-ops |
| Component: | slurmdbd | Assignee: | Albert Gil <albert.gil> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 22.05.9 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Ghent | Slinky Site: | --- |
| Attachments: | slurm config | ||

Hi Andy,

Your config seems correct. Are you getting MaxRSS=0 in all jobs, or only on this one?

Could you attach the output of this command:

    $ sacct -p --clusters all -j 16972183 -o JobID,Start,Elapsed,State,MaxRSS,AveRSS,ReqTRES,AllocTRES,NodeList

Also, could you attach the slurmctld logs of the Start date? And the slurmd logs of the nodelist in the same date?

Thanks,
Albert

Hi Albert,

The output:

    JobID|Start|Elapsed|State|MaxRSS|AveRSS|ReqTRES|AllocTRES|NodeList|
    16972183|2023-06-13T12:43:36|00:00:06|COMPLETED|||billing=2,cpu=1,mem=6G,node=1|billing=2,cpu=1,mem=6G,node=1|node3212.victini.os|
    16972183.batch|2023-06-13T12:43:36|00:00:06|COMPLETED|0|0||cpu=1,mem=6G,node=1|node3212.victini.os|
    16972183.extern|2023-06-13T12:43:36|00:00:06|COMPLETED|0|0||billing=2,cpu=1,mem=6G,node=1|node3212.victini.os|

I am going to try and find the logs :)

This is a random job I picked, but I had lots.

I guess I found the issue when comparing what your command gives for other jobs. I should not use --allocations, but rather grep on .batch.

Problem solved, thank you.

-- Andy
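
For anyone landing here with the same symptom: the resolution above (read MaxRSS from the `.batch` step rows instead of the `--allocations` summary row) can be sketched as a small filter over `sacct -p` output. This is only an illustration, not something posted in the thread. The function name `batch_maxrss` is hypothetical, and it assumes the pipe-delimited output starts with a header row containing `JobID` and `MaxRSS`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): read `sacct -p` output on stdin
# and print "JobID MaxRSS" for every .batch step row.
batch_maxrss() {
    awk -F'|' '
        NR == 1 {
            # Find the JobID and MaxRSS columns from the header row.
            for (i = 1; i <= NF; i++) {
                if ($i == "JobID")  id = i
                if ($i == "MaxRSS") rss = i
            }
            next
        }
        # Keep only batch-step rows; in the output above the allocation
        # row has an empty MaxRSS field.
        id && rss && $id ~ /\.batch$/ { print $id, $rss }
    '
}
```

It would be used as `sacct -p -j 16972183 -o JobID,MaxRSS | batch_maxrss`. Fed the output quoted above, it prints `16972183.batch 0`.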

Created attachment 30743: slurm config

Hi,

We have

    JobAcctGatherFrequency=task=30
    JobAcctGatherType=jobacct_gather/cgroup

set in our config, but if I request e.g. MaxRSS, it is empty (or 0):

    [root@masterdb01 ~]# sacct --clusters all -j 16972183 -o MaxRSS
    MaxRSS
    ----------
    0
    0

I am likely missing something, but I'm not sure what. Could you provide some pointers?

Kind regards,
-- Andy