Ticket 4633

Summary: Add more data to job_table
Product: Slurm
Reporter: Nicholas Cardo <cardo>
Component: Accounting
Assignee: Unassigned Developer <dev-unassigned>
Status: OPEN
Severity: 5 - Enhancement
CC: bsantos, fullop, maxime.martinasso, miguel.gila, sts
Version: 17.02.9
Hardware: Linux
OS: Linux
Site: CSCS - Swiss National Supercomputing Centre

Description Nicholas Cardo 2018-01-16 23:27:01 MST
We would like to be able to fully understand a job from the accounting data, so that we can see which factors influenced its scheduling. Doing this requires some minor enhancements to the data stored in the accounting tables. This is a request to add the following to the accounting data:

1. Job dependencies
2. Licenses
3. Job Topology (distribution, switches, contiguous, spread_job, geometry, ...)
4. Flags (overcommit, oversubscribe, ...)
5. Nodefile, nodelist, or Slurm-selected nodes (what did the user choose?)

The objective is to be able to "replay" the accounting data to understand what may have caused a delayed start of the job.  The ability to differentiate whether the user requested specific resources or Slurm provided them is also an important factor.
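A minimal sketch of what an extended accounting record covering items 1 to 5 might look like, and how a replay tool could tell user-chosen nodes from Slurm-chosen ones. Everything here is hypothetical: the class name, field names, and the `node_selection_source` helper are illustrative and are not actual Slurm database columns or APIs. Node lists are assumed to be already expanded into individual host names (Slurm itself stores compressed hostlist expressions).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobAccountingRecord:
    """Hypothetical extended job record; field names are illustrative only."""
    job_id: int
    node_list: List[str]  # nodes the job actually ran on (already recorded today)
    dependencies: List[str] = field(default_factory=list)  # e.g. "afterok:1234"
    licenses: List[str] = field(default_factory=list)      # e.g. "matlab:2"
    distribution: Optional[str] = None                     # e.g. "block"
    switches: Optional[int] = None                         # requested switch count
    flags: List[str] = field(default_factory=list)         # e.g. "overcommit"
    requested_nodes: List[str] = field(default_factory=list)  # hypothetical: nodes given via -w
    excluded_nodes: List[str] = field(default_factory=list)   # hypothetical: nodes given via -x


def node_selection_source(rec: JobAccountingRecord) -> str:
    """Classify who chose the job's nodes (hypothetical helper)."""
    if rec.requested_nodes:
        # With -w, the listed nodes are always part of the allocation, but Slurm
        # may add more nodes to satisfy the job's resource requirements.
        extra = set(rec.node_list) - set(rec.requested_nodes)
        return "user+slurm" if extra else "user"
    if rec.excluded_nodes:
        # Slurm picked the nodes, but the user narrowed the candidates with -x.
        return "slurm-constrained"
    return "slurm"
```

For example, a job submitted with `-w nid00010` that ended up running on `nid00010` and `nid00011` would be classified as `"user+slurm"`. Without a recorded `requested_nodes` field, the existing NodeList alone cannot make this distinction.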

Thanks
Comment 1 Nicholas Cardo 2018-01-16 23:28:06 MST
This is a request for enhancement.
Comment 2 Tim Wickberg 2018-01-16 23:59:42 MST
(In reply to Nicholas Cardo from comment #1)
> This is a request for enhancement.

We intentionally re-route everything through Sev4 so we can do some up-front triage.

(In reply to Nicholas Cardo from comment #0)
> We would like to be able to fully understand a job from the accounting data.
> In this manner we can really gain an understanding of the influencing
> factors for job scheduling.  In order to be able to do this, some minor
> enhancements to the data stored in the accounting tables is required.  This
> is a request to add the following data to the accounting data:
> 
> 1. Job dependencies
> 2. Licenses
> 3. Job Topology (distribution, switches, contiguous, spread_job, geometry,
> ...)
> 4. Flags (overcommit, oversubscribe, ...)
> 5. Nodefile or nodelist or slurm selected nodes (what did the user choose)

NodeList is already recorded, although I gather you want to track whether the user specifically set -w / -x (--nodelist / --exclude)?
Comment 3 Nicholas Cardo 2018-01-17 00:17:48 MST
> Although I gather you want to track if the user specifically set -w / -x ?

Correct: did the user request specific nodes, or did Slurm choose them?

Thanks