Ticket 852

Summary: sstat command cannot be used
Product: Slurm Reporter: toru matsuoka <tmatsuoka>
Component: Other Assignee: David Bigagli <david>
Status: RESOLVED DUPLICATE QA Contact:
Severity: 3 - Medium Impact    
Priority: --- CC: da
Version: 2.6.2   
Hardware: Linux   
OS: Linux   
Site: CRAY Slinky Site: ---
Alineos Sites: --- Atos/Eviden Sites: ---
Confidential Site: --- Coreweave sites: ---
Cray Sites: --- DS9 clusters: ---
Google sites: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA Site: --- NoveTech Sites: ---
Nvidia HWinf-CS Sites: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Tzag Elita Sites: ---
Linux Distro: --- Machine Name:
CLE Version: Version Fixed:
Target Release: --- DevPrio: ---
Emory-Cloud Sites: ---
Attachments: slurm.conf file

Description toru matsuoka 2014-06-02 20:29:26 MDT
Created attachment 893
slurm.conf file

Hello, Slurm Support team!

I'm Toru Matsuoka, an engineer at Cray Japan.

Please advise on the following issue.

Our customer wants to use the sstat command.

However, the following error occurred.

■sstat command

Note: the sstat command requires that the jobacct_gather plugin be installed and operational.

[root@mgmt2 slurm]# sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j 27005
    AveCPU   AvePages     AveRSS  AveVMSize        JobID
---------- ---------- ---------- ---------- ------------
sstat: error: Malformed RPC of type 5020 received
sstat: error: slurm_receive_msgs: Header lengths are longer than data received
sstat: error: Malformed RPC of type 5020 received
sstat: error: slurm_receive_msgs: Header lengths are longer than data received
sstat: error: Malformed RPC of type 5020 received
sstat: error: slurm_receive_msgs: Header lengths are longer than data received
sstat: error: slurm_job_step_stat: unknown return given from e035: 9001 rc = Communication connection failure
sstat: error: slurm_job_step_stat: unknown return given from e036: 9001 rc = Communication connection failure
sstat: error: slurm_job_step_stat: unknown return given from e034: 9001 rc = Communication connection failure
sstat: error: problem getting step_layout for 27005.0: Communication connection failure

■sacct command 

The sacct command appears to work.

[root@mgmt2 slurm]# sacct --j 27061
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
27061             prog4 ye016uta7+       root         40    RUNNING      0:0
27061.0       pmi_proxy                  root          2    RUNNING      0:0


In slurm.conf:

ProctrackType=proctrack/pgid
The JobAcctGatherType parameter is not set.

Is JobAcctGatherType required in slurm.conf, or is something else causing the problem?
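
As a minimal sketch of how the JobAcctGatherType setting could be checked in a slurm.conf file: the helper name jobacct_gather_type is hypothetical, the path in the usage line is an assumption, and the check only reports the setting. It does not establish whether that setting is the cause of the errors above.

```shell
#!/bin/sh
# Hypothetical helper: print the JobAcctGatherType value found in a
# slurm.conf file, or "unset" if no uncommented line sets it.
# Simplification: only the first matching line is considered.
jobacct_gather_type() {
    conf="$1"
    # Split on "=", compare the key case-insensitively, and strip any
    # trailing comment and whitespace from the value.
    value=$(awk -F= '
        tolower($1) ~ /^[ \t]*jobacctgathertype[ \t]*$/ {
            v = $2
            sub(/#.*/, "", v)
            gsub(/[ \t]/, "", v)
            print v
            exit
        }' "$conf")
    if [ -n "$value" ]; then
        echo "$value"
    else
        echo "unset"
    fi
}

# Example usage (the path is an assumption; adjust to the site's layout):
# jobacct_gather_type /etc/slurm/slurm.conf
```

On a running cluster, `scontrol show config | grep -i JobAcctGatherType` should show the value the daemons are actually using, which may differ from the file if the configuration has not been reloaded.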

Best regards,
Toru Matsuoka
Comment 1 Moe Jette 2014-06-03 02:36:52 MDT
Duplicate of bug 853

*** This ticket has been marked as a duplicate of ticket 853 ***