We would like to be able to fully understand a job from the accounting data, so that we can understand the factors that influence job scheduling. Doing this requires some minor enhancements to the data stored in the accounting tables. This is a request to add the following data to the accounting data:

1. Job dependencies
2. Licenses
3. Job topology (distribution, switches, contiguous, spread_job, geometry, ...)
4. Flags (overcommit, oversubscribe, ...)
5. Nodefile, nodelist, or Slurm-selected nodes (what did the user choose)

The objective is to be able to "replay" the accounting data to understand what may have caused a delayed start of the job. The ability to differentiate whether the user requested specific resources or Slurm provided them is also important.

Thanks
This is a request for enhancement.
(In reply to Nicholas Cardo from comment #1)
> This is a request for enhancement.

We intentionally re-route everything through Sev4 so we can do some up-front triage.

(In reply to Nicholas Cardo from comment #0)
> We would like to be able to fully understand a job from the accounting data.
> In this manner we can really gain an understanding of the influencing
> factors for job scheduling. In order to be able to do this, some minor
> enhancements to the data stored in the accounting tables is required. This
> is a request to add the following data to the accounting data:
>
> 1. Job dependencies
> 2. Licenses
> 3. Job Topology (distribution, switches, contiguous, spread_job, geometry,
> ...)
> 4. Flags (overcommit, oversubscribe, ...)
> 5. Nodefile or nodelist or slurm selected nodes (what did the user choose)

NodeList is already present. Although I gather you want to track if the user specifically set -w / -x ?
> Although I gather you want to track if the user specifically set -w / -x ?

Correct: we want to know whether the user requested specific nodes or Slurm chose them.

Thanks
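To make the requested distinction concrete, here is a minimal sketch in Python, assuming a hypothetical accounting record that stores the -w (--nodelist) and -x (--exclude) inputs alongside the allocated NodeList. The names `JobAccountingRecord`, `node_selection_source`, and `split_allocation` and their fields are illustrative only, not existing Slurm accounting columns or APIs. Node names are assumed to be already expanded from hostlist expressions such as `node[01-04]`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class JobAccountingRecord:
    """Hypothetical accounting record; field names are illustrative, not Slurm's schema."""
    job_id: int
    requested_nodes: List[str] = field(default_factory=list)  # what the user passed to -w / --nodelist
    excluded_nodes: List[str] = field(default_factory=list)   # what the user passed to -x / --exclude
    allocated_nodes: List[str] = field(default_factory=list)  # the NodeList actually allocated


def node_selection_source(rec: JobAccountingRecord) -> str:
    """Classify who chose the nodes for a job."""
    if rec.requested_nodes:
        # The user named nodes with -w; Slurm must include them
        # (it may add more nodes if more were requested than listed).
        return "user"
    if rec.excluded_nodes:
        # The user only constrained the choice; Slurm picked from the remaining nodes.
        return "constrained"
    # No node constraints were given, so the scheduler chose freely.
    return "slurm"


def split_allocation(rec: JobAccountingRecord) -> Tuple[List[str], List[str]]:
    """Split the allocated nodes into user-named and scheduler-chosen lists."""
    requested = set(rec.requested_nodes)
    user_chosen = [n for n in rec.allocated_nodes if n in requested]
    slurm_chosen = [n for n in rec.allocated_nodes if n not in requested]
    return user_chosen, slurm_chosen
```

For example, a job submitted with `-w n1` that was allocated `n1` and `n2` would be classified as `"user"`, with `split_allocation` reporting `n1` as user-named and `n2` as scheduler-chosen. Storing the raw request next to the allocation, rather than a single derived flag, keeps the original inputs available for replaying the scheduling decision.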