Ticket 2482 - Make it possible to track GRES TRES with and without a type
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Accounting
Version: 16.05.x
Hardware: Linux Linux
Severity: 5 - Enhancement
Assignee: Danny Auble
Reported: 2016-02-24 05:45 MST by Danny Auble
Modified: 2016-03-07 00:36 MST
Site: Stanford
Version Fixed: 16.05.0-pre2
Target Release: 16.05
DevPrio: 4 - Medium


Description Danny Auble 2016-02-24 05:45:04 MST
Related to bug 2475

From Kilian at Stanford:

If we have a QOS defined like this:

# sacctmgr list qos gpu  format=name,mintres%24
      Name                  MinTRES
---------- ------------------------
       gpu         cpu=1,gres/gpu=1

And we use GRES "types" for GPUs, in gres.conf:

NodeName=gpu-13-[1-2]   Name=gpu Type=gtx File=/dev/nvidia[0-3] CPUs=0-7
[...]
NodeName=gpu-9-[1-2]    Name=gpu Type=tesla File=/dev/nvidia[0-3] CPUs=0-7

and in the node definitions:

# scontrol show node gpu-13-1 | grep -i gres
   Gres=gpu:gtx:8
# scontrol show node gpu-9-1 | grep -i gres
   Gres=gpu:tesla:8


It turns out that the QOS MinTRES limit rejects jobs that request a specific GPU type, i.e.:
* "srun --gres gpu:1" works
* "srun --gres gpu:gtx:1" doesn't work, and results in "srun: error: Unable to allocate resources: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)"
Comment 1 Danny Auble 2016-02-24 05:45:29 MST
More pulled from bug 2475:

Since TRES are fine-grained (meaning gtx and tesla would result in 2 different TRES), I am not sure it will work like you expect when specifying the type.  I can see what you are after, though.

What I believe is happening right now is that gres/gpu is what you are tracking, but when the type gets thrown into the mix Slurm treats it as a completely different TRES.  This fine-grained implementation probably won't work for you.  I'm not sure how to change it and still support the situation where people do want the fine-grained TRES.  Perhaps we could track both gres/gpu and gres/gpu:gtx|tesla, and when a typed request comes in we match both gres/gpu and the type.  That would give us both.
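
To make the idea concrete, here is a rough sketch in Python (not Slurm's actual C code) of the suspected behaviour and of the "track both" proposal. The function names, the track_untyped flag, and the dictionary layout are all hypothetical and only model the matching logic described in this comment; the cpu part of MinTRES is left out for brevity.

```python
def gres_to_tres(request, track_untyped=True):
    """Map a --gres style string such as 'gpu:gtx:1' to TRES counts.

    Hypothetical helper: with track_untyped=False a typed request only
    produces the typed TRES (the suspected current behaviour); with
    track_untyped=True it also counts toward the untyped TRES.
    """
    parts = request.split(":")
    name = parts[0]
    count = 1
    # A trailing numeric field is the count, e.g. 'gpu:1' or 'gpu:gtx:1'.
    if len(parts) > 1 and parts[-1].isdigit():
        count = int(parts.pop())
    gres_type = parts[1] if len(parts) > 1 else None

    tres = {}
    if gres_type is None:
        tres[f"gres/{name}"] = count
    else:
        tres[f"gres/{name}:{gres_type}"] = count
        if track_untyped:
            # Proposal: a typed request also matches the plain gres/<name> TRES.
            tres[f"gres/{name}"] = count
    return tres


def meets_min_tres(job_tres, min_tres):
    """True if the job requests at least the minimum of every limited TRES."""
    return all(job_tres.get(name, 0) >= minimum
               for name, minimum in min_tres.items())


# Assumed stand-in for the QOS above (MinTRES cpu=1,gres/gpu=1), GPU part only.
QOS_MIN_TRES = {"gres/gpu": 1}

untyped = gres_to_tres("gpu:1")
typed_old = gres_to_tres("gpu:gtx:1", track_untyped=False)
typed_new = gres_to_tres("gpu:gtx:1", track_untyped=True)

print(meets_min_tres(untyped, QOS_MIN_TRES))    # untyped request
print(meets_min_tres(typed_old, QOS_MIN_TRES))  # typed, fine-grained only
print(meets_min_tres(typed_new, QOS_MIN_TRES))  # typed, tracking both
```

In this model the untyped request and the "track both" typed request satisfy the limit, while the fine-grained-only typed request does not, which mirrors the rejection reported in the description. Sites that want per-type limits could still set a minimum on gres/gpu:gtx, since the typed TRES is kept alongside the untyped one.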
Comment 2 Danny Auble 2016-03-04 09:20:25 MST
This has been added in commit 0cd692967b.  Let me know how it goes.