4633 – Add more data to job_table

Status:	OPEN

Alias:	None

Product:	Slurm
Classification:	Unclassified
Component:	Accounting
Version:	17.02.9
Hardware:	Linux Linux

Severity:	5 - Enhancement
Assignee:	Unassigned Developer
Reported:	2018-01-16 23:27 MST by Nicholas Cardo
Modified:	2019-04-09 11:39 MDT
CC List:	5 users

Site:	CSCS - Swiss National Supercomputing Centre

Description Nicholas Cardo 2018-01-16 23:27:01 MST

We would like to be able to fully understand a job from the accounting data, so that we can gain a real understanding of the factors that influence job scheduling.  To do this, some minor enhancements to the data stored in the accounting tables are required.  This is a request to add the following to the accounting data:

1. Job dependencies
2. Licenses
3. Job Topology (distribution, switches, contiguous, spread_job, geometry, ...)
4. Flags (overcommit, oversubscribe, ...)
5. Nodefile, nodelist, or Slurm-selected nodes (what did the user choose)

The objective is to be able to "replay" the accounting data to understand what may have caused a delayed start of the job.  The ability to differentiate whether the user requested specific resources or Slurm provided them is also an important factor.
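
To make the request concrete, here is a minimal Python sketch of what a record carrying the five requested items might look like. It is purely illustrative: `ExtendedJobRecord`, its field names, and `nodes_chosen_by_user` are hypothetical and do not correspond to actual job_table columns or to any Slurm API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExtendedJobRecord:
    """Hypothetical accounting record; every field name is an assumption."""
    job_id: int
    dependencies: List[str] = field(default_factory=list)   # item 1
    licenses: List[str] = field(default_factory=list)       # item 2
    topology: Dict[str, str] = field(default_factory=dict)  # item 3
    flags: List[str] = field(default_factory=list)          # item 4
    requested_nodes: Optional[str] = None   # item 5: what the user asked for
    allocated_nodes: Optional[str] = None   # what the job actually ran on


def nodes_chosen_by_user(record: ExtendedJobRecord) -> bool:
    """True if the user named specific nodes, False if Slurm selected them."""
    return record.requested_nodes is not None


# Example: the user pinned the job to two nodes and asked for one license.
pinned = ExtendedJobRecord(
    job_id=1,
    licenses=["matlab:1"],
    requested_nodes="nid[0001-0002]",
    allocated_nodes="nid[0001-0002]",
)
# Example: no node request, so Slurm picked the allocation.
free = ExtendedJobRecord(job_id=2, allocated_nodes="nid[0100-0101]")
```

Keeping the requested and allocated nodes as separate fields is what would let a replay tell a user-imposed constraint apart from a scheduler decision.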

Thanks

Comment 1 Nicholas Cardo 2018-01-16 23:28:06 MST

This is a request for enhancement.

Comment 2 Tim Wickberg 2018-01-16 23:59:42 MST

(In reply to Nicholas Cardo from comment #1)
> This is a request for enhancement.

We intentionally re-route everything through Sev4 so we can do some up-front triage.

(In reply to Nicholas Cardo from comment #0)
> We would like to be able to fully understand a job from the accounting data.
> In this manner we can really gain an understanding of the influencing
> factors for job scheduling.  In order to be able to do this, some minor
> enhancements to the data stored in the accounting tables is required.  This
> is a request to add the following data to the accounting data:
> 
> 1. Job dependencies
> 2. Licenses
> 3. Job Topology (distribution, switches, contiguous, spread_job, geometry,
> ...)
> 4. Flags (overcommit, oversubscribe, ...)
> 5. Nodefile or nodelist or slurm selected nodes (what did the user choose)

NodeList is already present. Although I gather you want to track if the user specifically set -w / -x ?

Comment 3 Nicholas Cardo 2018-01-17 00:17:48 MST

> Although I gather you want to track if the user specifically set -w / -x ?

Correct, did the user request specific nodes or did Slurm choose them.

Thanks
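
As a rough illustration of the distinction discussed above, the following Python sketch checks a submission's arguments for the node-selection options `-w`/`--nodelist`, `-x`/`--exclude`, and `-F`/`--nodefile`. The function `node_constraints` is hypothetical and not part of Slurm. It only pictures the information the request asks to have recorded, not how Slurm would store it. It also assumes all options come before the script name, so arguments meant for the job script itself would be misread.

```python
import argparse
from typing import Dict, List


def node_constraints(argv: List[str]) -> Dict[str, str]:
    """Return the node-selection options explicitly present in argv.

    Hypothetical helper: an empty result means the user left node
    selection entirely to Slurm.
    """
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("-w", "--nodelist")
    parser.add_argument("-x", "--exclude")
    parser.add_argument("-F", "--nodefile")
    # Unrelated options and the script name are ignored, not rejected.
    known, _ignored = parser.parse_known_args(argv)
    return {name: value for name, value in vars(known).items()
            if value is not None}


user_pinned = node_constraints(["-N", "2", "-w", "nid[0001-0002]", "job.sh"])
slurm_chose = node_constraints(["-N", "2", "--exclusive", "job.sh"])
```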