Ticket 11069

Summary: Empty sstat
Product: Slurm Reporter: lhuang
Component: Accounting Assignee: Scott Hilton <scott>
Status: RESOLVED INFOGIVEN QA Contact:
Severity: 4 - Minor Issue    
Priority: ---    
Version: 20.11.0   
Hardware: Linux   
OS: Linux   
Site: NY Genome
Attachments: slurm conf

Description lhuang 2021-03-12 09:28:18 MST
Created attachment 18396
slurm conf

I've attached our slurm.conf.

JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/cgroup

sstat is not reporting any data.

[lhuang@pe2-login01 ~]$ scontrol show jo 11398325
JobId=11398325 JobName=tensor_encode
   UserId=awidman(50507) GroupId=dllab(9012) MCS_label=N/A
   Priority=203056 Nice=0 Account=dllab QOS=dllab
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=6-12:50:17 TimeLimit=14-00:00:00 TimeMin=N/A
   SubmitTime=2021-03-05T20:39:18 EligibleTime=2021-03-05T20:39:18
   AccrueTime=2021-03-05T20:39:18
   StartTime=2021-03-05T22:37:36 EndTime=2021-03-19T23:37:36 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-03-05T22:37:36
   Partition=pe2 AllocNode:Sid=pe2-login01:100325
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=pe2cc2-012
   BatchHost=pe2cc2-012
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=2,mem=70G,node=1,billing=2
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=70G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/gpfs/commons/home/awidman/phoenix/encode/encode_hg38.sh --bam /gpfs/commons/home/awidman/liquid_biopsy/AD_melanoma/AD-05_A/final/AD-05_A.bam --vcf /gpfs/commons/home/awidman/liquid_biopsy/AD_melanoma/AD-05_A/pileup/AD-05_A_SNV_1000000.vcf --out /gpfs/commons/home/awidman/liquid_biopsy/AD_melanoma/AD-05_A/phoenix/AD-05_A_SNV_1000000_vaf_35.vcf --VAF-threshold 0.35
   WorkDir=/gpfs/commons/home/awidman/phoenix
   StdErr=/gpfs/commons/home/ckhamnei/logs/encode_hg38.11398325
   StdIn=/dev/null
   StdOut=/gpfs/commons/home/ckhamnei/logs/encode_hg38.11398325
   Power=
   NtasksPerTRES:0



[lhuang@pe2-login01 ~]$ sstat -j 11398325
       JobID  MaxVMSize  MaxVMSizeNode  MaxVMSizeTask  AveVMSize     MaxRSS MaxRSSNode MaxRSSTask     AveRSS MaxPages MaxPagesNode   MaxPagesTask   AvePages     MinCPU MinCPUNode MinCPUTask     AveCPU   NTasks AveCPUFreq ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov ConsumedEnergy  MaxDiskRead MaxDiskReadNode MaxDiskReadTask  AveDiskRead MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask AveDiskWrite TRESUsageInAve TRESUsageInMax TRESUsageInMaxNode TRESUsageInMaxTask TRESUsageInMin TRESUsageInMinNode TRESUsageInMinTask TRESUsageInTot TRESUsageOutAve TRESUsageOutMax TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsageOutMin TRESUsageOutMinNode TRESUsageOutMinTask TRESUsageOutTot 
------------ ---------- -------------- -------------- ---------- ---------- ---------- ---------- ---------- -------- ------------ -------------- ---------- ---------- ---------- ---------- ---------- -------- ---------- ------------- ------------- ------------- -------------- ------------ --------------- --------------- ------------ ------------ ---------------- ---------------- ------------ -------------- -------------- ------------------ ------------------ -------------- ------------------ ------------------ -------------- --------------- --------------- ------------------- ------------------- --------------- ------------------- ------------------- ---------------
Comment 1 Scott Hilton 2021-03-12 16:28:38 MST
sstat gets data on job steps, not jobs in general. I believe that you are getting nothing because no steps were specified.

Please try adding the -a (--allsteps) option. I recommend using this option every time you want to get a view of the job in general. 

Here is the relevant documentation:
-j, --jobs
Format is <job(.step)>. Stat this job step or comma-separated list of job steps. This option is required. The step portion will default to the lowest numbered (not batch, extern, etc) step running if not specified, unless the --allsteps flag is set where not specifying a step will result in all running steps to be displayed.
NOTE: A step id of 'batch' will display the information about the batch step.
NOTE: A step id of 'extern' will display the information about the extern step. This step is only available when using PrologFlags=contain.
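
As a rough illustration of the rule quoted above, the sketch below builds the sstat arguments for either a specific step or all running steps. The helper name sstat_args is hypothetical and not part of Slurm, and the field list passed to --format is only an example selection. The job ID is the one from this ticket.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): print the sstat selector for a job.
# With a step name, select "<jobid>.<step>"; with none, use --allsteps so that
# the batch and extern steps are not skipped by the default step choice.
sstat_args() {
    jobid=$1
    step=${2:-}
    if [ -n "$step" ]; then
        printf '%s\n' "-j ${jobid}.${step}"
    else
        printf '%s\n' "--allsteps -j ${jobid}"
    fi
}

# Only query Slurm when sstat is actually installed on this host.
if command -v sstat >/dev/null 2>&1; then
    # Word splitting of the helper's output is intended here.
    sstat $(sstat_args 11398325) --format=JobID,MaxRSS,AveCPU,NTasks
fi
```

Calling `sstat_args 11398325 batch` instead yields the selector for the batch step alone, which matches the first NOTE in the documentation.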

-Scott
Comment 2 lhuang 2021-03-12 18:37:12 MST
Thanks, you're right. Please close the ticket.

Comment 3 Scott Hilton 2021-03-15 09:32:22 MDT
Glad I could help. Closing ticket.