6406 – exact TRES allocations/affinities are not stored anywhere


Summary: exact TRES allocations/affinities are not stored anywhere

Status:	RESOLVED DUPLICATE of ticket 2047

Alias:	None

Product:	Slurm
Classification:	Unclassified
Component:	Accounting
Version:	19.05.x
Hardware:	Linux Linux

Severity:	5 - Enhancement
Assignee:	Unassigned Developer
QA Contact:

URL:

Depends on:
Blocks:

Reported:	2019-01-25 15:30 MST by Sergey Meirovich
Modified:	2019-03-04 10:13 MST
CC List:	0 users

See Also:
Site:	AMAT

Description Sergey Meirovich 2019-01-25 15:30:49 MST

Hello,

For a running job we can see exactly which CPUs/GPUs/etc. are allocated, e.g.:
[root@DCALPH000 ~]# scontrol show -dd jobid=1360317 | grep IDs
     Nodes=dcalph134 CPU_IDs=24 Mem=0 GRES_IDX=gpu:p100(IDX:0)
[root@DCALPH000 ~]# 
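A minimal Python sketch of turning such a per-node detail line into structured data. It assumes the space-separated `key=value` layout shown above; the GPU field name varies (`GRES_IDX=` here, `GRES=` in other output), `Nodes=` may be a hostlist expression such as `node[1-2]` that this sketch does not expand, and `CPU_IDs` may not match the OS CPU numbering. The function names are hypothetical.

```python
import re


def expand_ranges(spec: str) -> list[int]:
    """Expand a Slurm-style index list such as "0-1,4" into [0, 1, 4]."""
    ids: list[int] = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        elif part:
            ids.append(int(part))
    return ids


def parse_detail_line(line: str) -> dict:
    """Parse one per-node line of `scontrol show -dd job` output."""
    # Collect every key=value token, e.g. CPU_IDs=24 or GRES_IDX=gpu:p100(IDX:0).
    fields = dict(re.findall(r"(\w+)=(\S+)", line))
    # Older/newer output uses GRES_IDX= or GRES=; accept either.
    gres = fields.get("GRES_IDX") or fields.get("GRES", "")
    gpus: list[int] = []
    for idx in re.findall(r"IDX:([\d,-]+)\)", gres):
        gpus.extend(expand_ranges(idx))
    return {
        "nodes": fields.get("Nodes"),
        "cpu_ids": expand_ranges(fields.get("CPU_IDs", "")),
        "gpu_idx": gpus,
    }
```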


We believe that access to the same information would also be useful for completed jobs.

1) It would allow us to reconstruct what actually happened on the cluster at any given time (or over a time interval).
1.1) For example, suppose we want to study preemptor/preemptee relationships. If we see that the (start, end) intervals of two jobs with different PriorityTier values overlap, and that these jobs shared some nodes, that is still not sufficient evidence to conclude that preemption occurred, because it is not clear whether there was any overlap in, e.g., the CPU cores allocated on each of the nodes in question.
1.2) If we are trying to calculate core-seconds usage for each user per interval (say, every 10 minutes), we need to know exactly whether each job was running or suspended (preempted) within that interval, and if it was suspended, for how long within that particular interval.

2) For GPUs, we have already set up server-level DCGM/Prometheus/Grafana dashboards, so we can now see historical metrics for each GPU on any server. However, a job-centric presentation is still impossible because Slurm does not save the allocated GPU IDs.
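For the preemption question in 1.1, the missing check is whether two time-overlapping jobs actually held the same CPU IDs on a shared node. A minimal sketch, assuming hypothetical per-job records (wall-clock interval plus CPU IDs per node) were available; Slurm does not currently store them, which is the point of this ticket. A non-empty result is stronger evidence of preemption than node overlap alone, but still not proof.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Hypothetical per-job record: wall-clock interval plus CPU IDs per node."""
    job_id: int
    start: int  # epoch seconds
    end: int    # epoch seconds
    cpus: dict[str, set[int]]  # node name -> allocated CPU IDs


def shared_cpus(a: Allocation, b: Allocation) -> dict[str, set[int]]:
    """Return the CPUs both jobs held on each node, if their intervals overlap."""
    if a.start >= b.end or b.start >= a.end:
        return {}  # no overlap in time at all
    common: dict[str, set[int]] = {}
    # Only nodes both jobs ran on can contribute a CPU-level overlap.
    for node in a.cpus.keys() & b.cpus.keys():
        overlap = a.cpus[node] & b.cpus[node]
        if overlap:
            common[node] = overlap
    return common
```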
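For the usage calculation in 1.2, once a job's running (non-suspended) intervals are known, splitting its core-seconds into fixed 10-minute buckets is straightforward. A sketch only: the `running` interval list is a hypothetical input, since obtaining those intervals is exactly the gap described in 1.2.

```python
from collections import defaultdict

BUCKET = 600  # 10-minute buckets, in seconds


def core_seconds_per_bucket(running, ncpus, bucket=BUCKET):
    """Split a job's core-seconds across fixed-width time buckets.

    running: list of (start, end) epoch-second intervals during which the job
    was actually running (suspended periods already excluded; hypothetical input).
    Returns {bucket_start: core_seconds}.
    """
    usage = defaultdict(int)
    for start, end in running:
        t = start
        while t < end:
            bucket_start = t - t % bucket
            seg_end = min(end, bucket_start + bucket)  # stop at bucket edge or interval end
            usage[bucket_start] += (seg_end - t) * ncpus
            t = seg_end
    return dict(usage)
```

For example, a 2-CPU job that ran from t=0 to t=900 but was suspended between t=300 and t=600 contributes 600 core-seconds to each of the first two buckets.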
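For point 2, if per-job GPU index allocations were stored as (job, node, GPU index, start, end) records, attributing each DCGM/Prometheus sample to a job would become a per-GPU interval lookup. A sketch under two assumptions: such records exist (hypothetical; Slurm does not save them today), and each GPU is held by at most one job at a time.

```python
from bisect import bisect_right


def build_index(allocs):
    """allocs: iterable of (job_id, node, gpu_idx, start, end) records (hypothetical)."""
    index = {}
    for job_id, node, gpu, start, end in allocs:
        index.setdefault((node, gpu), []).append((start, end, job_id))
    for spans in index.values():
        spans.sort()  # sorted by start time for binary search
    return index


def job_for_sample(index, node, gpu, ts):
    """Return the job that held (node, gpu) at time ts, or None."""
    spans = index.get((node, gpu), [])
    # Find the last span whose start is <= ts.
    i = bisect_right(spans, (ts, float("inf"), float("inf"))) - 1
    if i >= 0 and spans[i][0] <= ts < spans[i][1]:
        return spans[i][2]
    return None
```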

Of course, we could still look into writing our own custom job completion plugin(s), but we'd like to avoid any potential duplication of effort. So even if SchedMD is not planning to look into this in the foreseeable future, it would at least be interesting to understand where SchedMD thinks it would be logical to store information about exact CPU/GPU assignments: the Slurm accounting DB, the Elasticsearch job completion plugin, or somewhere else?

Comment 2 Jason Booth 2019-01-28 13:34:02 MST

Hi Sergey Meirovich,

This is an interesting idea, but at this time there are no plans to tackle these changes.

Comment 4 Sergey Meirovich 2019-01-28 15:04:27 MST

Hello Jason,

Thanks for your answer.

Could you please look into the second part of my question?

"... So even if SchedMD is not planning to look into this in a foreseeable future it would be at least interesting to understand where: slurm accounting db?/elastic search completion plugin?/somewhere else it seems logical to SchedMD to store information about exact CPU/GPU assignments."

?

Comment 5 Jason Booth 2019-01-29 16:59:41 MST

Hi Sergey Meirovich,

> "... So even if SchedMD is not planning to look into this in a foreseeable future it would be at least interesting to understand where: slurm accounting db?/elastic search completion plugin?/somewhere else it seems logical to SchedMD to store information about exact CPU/GPU assignments."


Slurm does not currently record task placement or GPU placement in the accounting database. It does record an overview of what was allocated.

e.g.

jason@nh-grey:~/slurm/master$ sacct -j 280 -o JobID,AllocGRES,AllocCPUS
       JobID    AllocGRES  AllocCPUS
------------ ------------ ----------
280                 gpu:0          2
280.batch           gpu:0          2
280.extern          gpu:0          2
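The fixed-width `sacct` output above can truncate long values; for scripting, `sacct --parsable2` (`-P`) prints `|`-delimited rows instead. A minimal Python sketch that runs and parses it (the helper names are hypothetical). Note that it still yields only summary values such as `AllocCPUS`, not CPU or GPU IDs.

```python
import subprocess


def parse_parsable2(text: str) -> list[dict[str, str]]:
    """Turn `sacct --parsable2` output (header row first) into dicts."""
    lines = [ln for ln in text.splitlines() if ln]
    if not lines:
        return []
    header = lines[0].split("|")
    return [dict(zip(header, ln.split("|"))) for ln in lines[1:]]


def sacct_alloc(job_id: str) -> list[dict[str, str]]:
    """Run sacct for one job (requires sacct on PATH)."""
    out = subprocess.run(
        ["sacct", "-j", job_id, "-o", "JobID,AllocGRES,AllocCPUS", "--parsable2"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_parsable2(out)
```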


Note that you can run "scontrol show job -d <job_id>" to query more detailed information from its output:

...
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
     Nodes=m1 CPU_IDs=0-1 Mem=0 GRES=gpu(IDX:0)
...

An EpilogSlurmctld script may be able to capture this information after the job is done.


https://slurm.schedmd.com/prolog_epilog.html
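A minimal sketch of such an EpilogSlurmctld hook in Python. It assumes that `SLURM_JOB_ID` is set in the epilog environment and that the per-node `CPU_IDs=` detail lines are still available from `scontrol` when the epilog runs (not verified here). `LOG_PATH` is a hypothetical location that must be writable by SlurmUser.

```python
#!/usr/bin/env python3
"""Hypothetical EpilogSlurmctld hook: save per-node CPU/GRES placement."""
import json
import os
import subprocess
import sys
import time

LOG_PATH = "/var/log/slurm/job_placement.jsonl"  # hypothetical location


def placement_lines(scontrol_output: str) -> list[str]:
    """Keep only the per-node detail lines (those carrying CPU_IDs=)."""
    return [ln.strip() for ln in scontrol_output.splitlines() if "CPU_IDs=" in ln]


def main() -> None:
    job_id = os.environ.get("SLURM_JOB_ID")
    if not job_id:
        print("SLURM_JOB_ID not set; not running as an epilog?", file=sys.stderr)
        return
    out = subprocess.run(
        ["scontrol", "-dd", "show", "job", job_id],
        capture_output=True, text=True, check=False,
    ).stdout
    record = {
        "job_id": job_id,
        "logged_at": int(time.time()),
        "placement": placement_lines(out),
    }
    # Append one JSON object per job for later offline analysis.
    with open(LOG_PATH, "a") as fh:
        fh.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    main()
```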

Comment 6 Jason Booth 2019-03-04 10:13:00 MST

Rolling this into one ticket.

*** This ticket has been marked as a duplicate of ticket 2047 ***