| Summary: | MaxRSS and other values missing from the dbd | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | hpc-admin |
| Component: | slurmdbd | Assignee: | Albert Gil <albert.gil> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 22.05.9 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Ghent | Alineos Sites: | --- |
| Attachments: | slurm config | ||
Hi Andy,

Your config seems correct. Are you getting MaxRSS=0 in all jobs, or only on this one?

Could you attach the output of this command:

    $ sacct -p --clusters all -j 16972183 -o JobID,Start,Elapsed,State,MaxRSS,AveRSS,ReqTRES,AllocTRES,NodeList

Also, could you attach the slurmctld logs of the Start date? And the slurmd logs of the nodelist in the same date?

Thanks,
Albert

Hi Albert,

The output:

    JobID|Start|Elapsed|State|MaxRSS|AveRSS|ReqTRES|AllocTRES|NodeList|
    16972183|2023-06-13T12:43:36|00:00:06|COMPLETED|||billing=2,cpu=1,mem=6G,node=1|billing=2,cpu=1,mem=6G,node=1|node3212.victini.os|
    16972183.batch|2023-06-13T12:43:36|00:00:06|COMPLETED|0|0||cpu=1,mem=6G,node=1|node3212.victini.os|
    16972183.extern|2023-06-13T12:43:36|00:00:06|COMPLETED|0|0||billing=2,cpu=1,mem=6G,node=1|node3212.victini.os|

I am going to try and find the logs :)

This is a random job I picked, but I had lots. I guess I found the issue when comparing what your command gives for other jobs. I should not use --allocations, but rather grep on .batch.

Problem solved, thank you.

-- Andy
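The point Andy arrives at (MaxRSS is recorded on the job steps such as `.batch`, not on the allocation row) can be illustrated with a small parser for the `sacct -p` output quoted in this thread. This is only a sketch: the function names `parse_sacct_parsable` and `max_rss_by_step` are hypothetical helpers, not part of Slurm, and the sample text is copied from the output above rather than produced by running `sacct`.

```python
# Sample copied from the sacct -p output posted in this thread.
SAMPLE = """\
JobID|Start|Elapsed|State|MaxRSS|AveRSS|ReqTRES|AllocTRES|NodeList|
16972183|2023-06-13T12:43:36|00:00:06|COMPLETED|||billing=2,cpu=1,mem=6G,node=1|billing=2,cpu=1,mem=6G,node=1|node3212.victini.os|
16972183.batch|2023-06-13T12:43:36|00:00:06|COMPLETED|0|0||cpu=1,mem=6G,node=1|node3212.victini.os|
16972183.extern|2023-06-13T12:43:36|00:00:06|COMPLETED|0|0||billing=2,cpu=1,mem=6G,node=1|node3212.victini.os|
"""


def _split_fields(line):
    """Split one 'sacct -p' line; -p terminates every line with a '|'."""
    if line.endswith("|"):
        line = line[:-1]  # drop only the single trailing delimiter
    return line.split("|")


def parse_sacct_parsable(text):
    """Hypothetical helper: turn 'sacct -p' text into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = _split_fields(lines[0])
    return [dict(zip(header, _split_fields(ln))) for ln in lines[1:]]


def max_rss_by_step(rows):
    """Hypothetical helper: map step IDs (JobID containing '.') to MaxRSS."""
    return {r["JobID"]: r["MaxRSS"] for r in rows if "." in r["JobID"]}


if __name__ == "__main__":
    rows = parse_sacct_parsable(SAMPLE)
    # The allocation row (no '.' in JobID) carries no MaxRSS at all.
    print("allocation MaxRSS:", repr(rows[0]["MaxRSS"]))
    # The step rows are where MaxRSS is reported.
    for step, rss in max_rss_by_step(rows).items():
        print(step, "MaxRSS:", rss)
```

For this particular job the steps also report 0, but the allocation row is empty regardless, which is why filtering on the `.batch` step is the useful view.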
Created attachment 30743: slurm config

Hi,

We have

    JobAcctGatherFrequency=task=30
    JobAcctGatherType=jobacct_gather/cgroup

set in our config, but if I request e.g. MaxRSS, it is empty (or 0):

    [root@masterdb01 ~]# sacct --clusters all -j 16972183 -o MaxRSS
    MaxRSS
    ----------
    0
    0

I am likely missing something, but I'm not sure what. Could you provide some pointers?

Kind regards,
-- Andy