Hi,

Trying to get all jobs from the dbd for injection into XDMoD. Ran into the following error:

    sacct --clusters doduo --allusers --parsable2 --noheader --allocations --duplicates \
      --format jobid,jobidraw,cluster,partition,qos,account,group,gid,user,uid,submit,eligible,start,end,elapsed,exitcode,state,nnodes,ncpus,reqcpus,reqmem,reqtres,alloctres,timelimit,nodelist,jobname \
      --state CANCELLED,COMPLETED,FAILED,NODE_FAIL,PREEMPTED,TIMEOUT,OUT_OF_MEMORY,REQUEUED \
      --starttime 2024-01-25T00:00:00 --endtime 2024-09-28T23:59:59
    sacct: error: Getting response to message type: DBD_GET_JOBS_COND
    sacct: error: DBD_GET_JOBS_COND failure: Unspecified error

Any suggestions on how to proceed? DBD was updated to 24.05.x last November.

Another ticket stated to reduce log level, but we're at:

    [root@masterdb01 ~]# cat /etc/slurm/slurmdbd.conf
    ArchiveEvents=yes
    ArchiveJobs=yes
    ArchiveResvs=yes
    ArchiveSteps=no
    ArchiveSuspend=no
    ArchiveTXN=no
    ArchiveUsage=no
    AuthInfo=socket=/run/munge/munge.socket.2
    AuthType=auth/munge
    DbdHost=masterdb01.gastly.os
    DebugLevel=info
    LogFile=/var/log/slurmdbd.log
    PidFile=/var/run/slurmdbd/slurmdbd.pid
    PrivateData=users,accounts,jobs,usage,events
    PurgeEventAfter=720hours
    PurgeJobAfter=25920hours
    PurgeResvAfter=25920hours
    PurgeStepAfter=720hours
    PurgeSuspendAfter=720hours
    PurgeTXNAfter=25920hours
    PurgeUsageAfter=25920hours
    SlurmUser=slurm
    StoragePass=<snip>
    StorageType=accounting_storage/mysql
    StorageUser=slurm

Thanks,
-- Andy
Hi Andy,

To debug this further, could you please:

1. Enable detailed logging by setting DebugFlags=DB_QUERY,DB_JOB in slurmdbd.conf.
2. Run the problematic sacct command once again.
3. While it is hanging, run "SHOW PROCESSLIST;" at the mysql> prompt to inspect database activity.
4. Submit the slurmdbd log covering the duration of the sacct run, then reset DebugFlags to stop the extra logging.

Does this behavior also occur with shorter time intervals between starttime and endtime?

Additionally, could you check the output of "sacctmgr show runawayjobs"? This will list any completed jobs that are missing an end time in the database.

Best,
Patrick
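
One way to check the shorter-interval question is to split the original date range into smaller windows and run sacct once per window. The following is only a rough sketch: the helper names split_windows and sacct_command are hypothetical, the 30-day window size is an arbitrary assumption, and the --format list is shortened for readability. It only prints the commands; it does not run them.

```python
import shlex
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S"  # timestamp layout used by --starttime / --endtime


def split_windows(start, end, days=30):
    """Yield (window_start, window_end) pairs that cover [start, end]
    in consecutive chunks of at most `days` days, one second apart."""
    step = timedelta(days=days)
    one_second = timedelta(seconds=1)
    cur = start
    while cur <= end:
        window_end = min(cur + step - one_second, end)
        yield cur, window_end
        cur = window_end + one_second


def sacct_command(window_start, window_end, cluster="doduo"):
    """Build the sacct argument list for a single window.
    The flags are taken from the command in the original post;
    the --format list is shortened here (an assumption for brevity)."""
    return [
        "sacct", "--clusters", cluster, "--allusers", "--parsable2",
        "--noheader", "--allocations", "--duplicates",
        "--format", "jobid,state,start,end",
        "--starttime", window_start.strftime(FMT),
        "--endtime", window_end.strftime(FMT),
    ]


if __name__ == "__main__":
    start = datetime(2024, 1, 25, 0, 0, 0)
    end = datetime(2024, 9, 28, 23, 59, 59)
    for w_start, w_end in split_windows(start, end):
        # Print each command so it can be reviewed before being run by hand.
        print(shlex.join(sacct_command(w_start, w_end)))
```

If only some windows fail, that narrows down which period to look at in the slurmdbd log. Note that a job spanning a window boundary may be reported in more than one window, so results would likely need de-duplicating before being loaded anywhere.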