Ticket 3841 - Requested fields for slurmdb
Status: OPEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 17.02.2
Hardware: Linux Linux
Severity: 5 - Enhancement
Assignee: Unassigned Developer
Reported: 2017-05-26 12:31 MDT by HMS Research Computing
Modified: 2019-04-09 11:33 MDT
Site: Harvard Medical School
Description HMS Research Computing 2017-05-26 12:31:29 MDT
Hello,

We'd like to request that some additional fields be added to slurmdb so that each job's accounting record is more complete. The field names below are taken verbatim from the output of scontrol show job or squeue; we'd like them to be available in sacct (or otherwise accessible, e.g. via the API). We've taken our best guess at what each field does, so please let us know whether these additions are feasible and whether any of them is already present in some form.

Command
Reason
Dependency
Runtime
CpusPerTask
Workdir
Priority*
ARRAY_JOB_ID
ARRAY_TASK_ID
TIME_LEFT
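
For illustration, here is a minimal Python sketch of how most of the fields above might be pulled out of one-line scontrol output (as produced by `scontrol -o show job <jobid>`) until they are available in sacct. The key spellings in WANTED (for example `CPUs/Task` and `ArrayJobId`) are our assumption of how scontrol names these fields, the sample line is made up, and TIME_LEFT is omitted because it is a squeue column rather than a scontrol key.

```python
import re

# Assumed scontrol key names for the requested fields (not confirmed in this ticket).
WANTED = (
    "Command", "Reason", "Dependency", "RunTime", "CPUs/Task",
    "WorkDir", "Priority", "ArrayJobId", "ArrayTaskId",
)

# A value runs until the next " Key=" token or the end of the line,
# so values that contain spaces (e.g. a job name) stay intact.
_PAIR = re.compile(r"(\S+?)=(.*?)(?=\s+\S+?=|\s*$)")


def parse_scontrol_line(line):
    """Turn one line of Key=Value pairs into a dict of strings."""
    return {key: value for key, value in _PAIR.findall(line)}


def requested_fields(line):
    """Return only the requested fields; missing ones map to None."""
    pairs = parse_scontrol_line(line)
    return {name: pairs.get(name) for name in WANTED}


# Hypothetical sample line, for illustration only.
SAMPLE = (
    "JobId=124 ArrayJobId=123 ArrayTaskId=1 JobName=my job "
    "Priority=4294901757 Reason=None Dependency=(null) "
    "RunTime=00:01:02 CPUs/Task=2 Command=/home/u/run.sh WorkDir=/home/u"
)

if __name__ == "__main__":
    for name, value in requested_fields(SAMPLE).items():
        print(f"{name}: {value}")
```

This only captures a snapshot while the job is still known to slurmctld, which is why having the fields stored in slurmdb would be preferable.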

*Regarding the Priority field, we noticed that it shows up twice in squeue (as %p and %Q); could we get some clarification on the difference between the two? For instance, is %p just a normalized form of %Q?
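
For context, the squeue man page describes %p as the job priority converted to a floating point number between 0.0 and 1.0, and %Q as the priority as a generally very large unsigned integer. A minimal Python sketch of the relationship we are guessing at follows; the divisor (the maximum 32-bit unsigned value) is our assumption and is not confirmed anywhere in this ticket.

```python
# Assumed divisor: the largest 32-bit unsigned integer.
MAX_UINT32 = 0xFFFFFFFF


def normalized_priority(raw_priority):
    """Map a raw integer priority (as in %Q) onto the 0.0-1.0 range (as in %p)."""
    if not 0 <= raw_priority <= MAX_UINT32:
        raise ValueError("priority must fit in an unsigned 32-bit integer")
    return raw_priority / MAX_UINT32


if __name__ == "__main__":
    # Hypothetical raw priority value, for illustration only.
    print(f"{normalized_priority(4294901757):.14f}")
```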

Additionally, there are a few fields we'd like more information on. We've read their entries in the man pages, but would appreciate any further documentation you can point us to (whether or not we've already seen it) or any extra insight you can provide. Depending on the answers, we may want to add one or more of these fields to the list above, where they aren't already present in sacct. Here are the fields, along with our internal notes on what we'd like to know:

MaxRSS (already in sacct) - is this actually the total memory usage?
MCS_Label - implementation details? We found the man page, but anything else would be appreciated.
BatchHost - what is this?
Contiguous - what is this? Is it memory-related?
MinMemoryNode - implementation details (e.g. compared to MinMemoryCPU)? Is it the minimum available or the minimum total?
SuspendTime - what happens if a job is suspended a second time or multiple times? Does the new value overwrite the field entry?
SecsPreSuspend - see SuspendTime (also, is there a field that tracks how long a job has been suspended?)

Thanks!