Ticket 11523 - Different weights for nodes depending on partition
Summary: Different weights for nodes depending on partition
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 20.02.3
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Skyler Malinowski
Reported: 2021-05-04 10:06 MDT by Renata Dart
Modified: 2021-05-05 08:34 MDT (History)
Site: SLAC
Description Renata Dart 2021-05-04 10:06:55 MDT
Hi SchedMD, we have a mix of GPU types in our cluster, weighted as follows:

# pascal 1080ti
NodeName=psc[000-009] CPUs=48 RealMemory=257336 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 Gres=gpu:geforce_gtx_1080_ti:10 Features=CPU_GEN:HSW,CPU_SKU:E5-2670v3,CPU_FRQ:2.30GHz,GPU_GEN:PSC,GPU_SKU:GTX1080TI,GPU_MEM:11GB,GPU_CC:6.1  Weight=58319   State=UNKNOWN

# tur 2080ti
NodeName=tur[000-026]   CPUs=48 RealMemory=191552 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 Gres=gpu:geforce_rtx_2080_ti:10 Features=CPU_GEN:SKX,CPU_SKU:5118,CPU_FRQ:2.30GHz,GPU_GEN:TUR,GPU_SKU:RTX2080TI,GPU_MEM:11GB,GPU_CC:7.5       Weight=56117   State=UNKNOWN

# volta v100
NodeName=volt[000-005]   CPUs=32 RealMemory=191567 Sockets=2 CoresPerSocket=8  ThreadsPerCore=2 Gres=gpu:v100:4                 Features=CPU_GEN:SKX,CPU_SKU:4110,CPU_FRQ:2.10GHz,GPU_GEN:VLT,GPU_SKU:V100,GPU_MEM:32GB,GPU_CC:7.0            Weight=127207  State=UNKNOWN

This works well for our partitions that have priority over the shared partition. We want the "owner" partitions to get the faster/newer GPUs as a first choice, but for the shared partition we would like the weights reversed so that users get the slower/older GPUs first. Is there a way to do that?

Thanks,
Renata
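
Editor's sketch: Slurm hands out the lowest-weight nodes first when all else is equal, and the Weight value belongs to the node, not to the partition. The Python below is only a model of that ordering, not Slurm code; the group names are shorthand for the three NodeName lines above and allocation_order is a hypothetical helper. It prints the single order that every partition sees today, and the reversed order the question is asking about for the shared partition.

```python
# Node weights copied from the slurm.conf excerpt in the description.
NODE_WEIGHTS = {
    "psc": 58319,    # psc[000-009], geforce_gtx_1080_ti
    "tur": 56117,    # tur[000-026], geforce_rtx_2080_ti
    "volt": 127207,  # volt[000-005], v100
}


def allocation_order(weights, reverse=False):
    """Return node groups in the order a weight-based selector would try them.

    reverse=False models lowest-weight-first selection.
    reverse=True models the per-partition reversal asked about in this ticket,
    which is hypothetical behaviour.
    """
    return sorted(weights, key=weights.get, reverse=reverse)


# Order seen by every partition with the weights as configured.
print(allocation_order(NODE_WEIGHTS))
# Order a partition would see if it could reverse the weights.
print(allocation_order(NODE_WEIGHTS, reverse=True))
```

Because the weight is a single per-node number, both orders cannot hold at once, which is why the request amounts to a per-partition weight.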
Comment 1 Skyler Malinowski 2021-05-04 13:45:42 MDT
Hello Renata,

Conditional node weights is not a feature we support at this time.

Preemption may be the only thing that will satisfy your constraints. Define the node weights with respect to the shared partition. Then create a set of partitions corresponding to the different GPU groups, and allow accounts/users to submit to those partitions with higher priority. Also use a job submission filter to append the shared partition to each job.

What do you think?

Regards,
Skyler
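
Editor's sketch of the last step in the suggestion above, appending the shared partition at submit time. In Slurm such a filter would normally live in a job_submit plugin (for example job_submit/lua); the Python below only models the string handling on a comma-separated partition list and is not plugin code. The function name and the "gpu_v100" partition name are hypothetical; "shared" is the partition named in this ticket.

```python
from typing import Optional

SHARED_PARTITION = "shared"  # partition name taken from this ticket


def append_shared_partition(requested: Optional[str]) -> str:
    """Return the job's partition list with the shared partition appended.

    `requested` is the comma-separated partition string from the job
    request, or None/empty when the user did not name a partition.
    """
    if not requested:
        return SHARED_PARTITION
    # Split the list, dropping stray whitespace and empty entries.
    partitions = [p.strip() for p in requested.split(",") if p.strip()]
    if SHARED_PARTITION not in partitions:
        partitions.append(SHARED_PARTITION)
    return ",".join(partitions)


# "gpu_v100" is a hypothetical per-GPU-group partition name.
print(append_shared_partition("gpu_v100"))
print(append_shared_partition(None))
```

With a list such as "gpu_v100,shared", the job can start in whichever listed partition is able to run it first, so the higher-priority partition is preferred and the shared partition acts as a fallback.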
Comment 2 Renata Dart 2021-05-04 14:05:02 MDT
Hi Skyler, I should have included this information to begin with. We have owner partitions
set up that can preempt shared:

PartitionName=shared      Nodes=allrome,all_tur,all_volt,all_psc  Default=YES           Priority=10   MaxTime=5-00:00:00     DefaultTime=30          PreemptMode=CANCEL State=UP
PartitionName=supercdms   Nodes=allrome          Default=no    AllowAccounts=supercdms  Priority=50   MaxTime=5-00:00:00     DefaultTime=1-00:00:00  PreemptMode=OFF   State=UP Qos=supercdms
PartitionName=cryoem      Nodes=allrome,all_psc,all_tur,all_volt  Default=no    AllowAccounts=cryoem     Priority=50   MaxTime=10-00:00:00    DefaultTime=1-00:00:00  PreemptMode=OFF   State=UP Qos=cryoem


and we have these nodelists:

NodeSet=allrome    Nodes=rome[0001-0004,0011-0014,0021-0024,0031-0034,0041-0044,0051-0054,0061-0064,0071-0074,0081-0084,0091-0094,0101-0104,0111-0114,0121-0124,0131-0134,0141-0144,0151-0154,0161-0164,0171-0174,0181-0184,0191-0194,0201-0204,0211-0214,0221-0224,0231-0234,0241-0244,0251-0254,0261-0264]
NodeSet=all_tur    Nodes=tur[000-026]
NodeSet=all_volt   Nodes=volt[000-005]
NodeSet=all_psc    Nodes=psc[000-009]

And we have a partition qos set up for each partition that spells out the number of each kind of host:

 supercdms                                                                                              cpu=256 
    cryoem            cpu=1424,gres/gpu:geforce_gtx_1080_ti=90,gres/gpu:geforce_rtx_2080_ti=110,gres/gpu:v100=4 

I don't have a partition qos for shared since they have access to all the hosts but can
be preempted.
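
Editor's sketch: the cryoem QOS limits can be compared with the GPUs actually installed, using the node counts and Gres values from the NodeName lines in the description. This assumes the QOS column shown above is a TRES limit string; the helper names are hypothetical and the code is only an arithmetic check, not a Slurm tool.

```python
# Node counts and per-node GPU counts from the NodeName lines in the description.
GPU_NODES = {
    "geforce_gtx_1080_ti": {"nodes": 10, "gpus_per_node": 10},  # psc[000-009]
    "geforce_rtx_2080_ti": {"nodes": 27, "gpus_per_node": 10},  # tur[000-026]
    "v100": {"nodes": 6, "gpus_per_node": 4},                   # volt[000-005]
}

# TRES string copied from the cryoem QOS line above.
CRYOEM_TRES = (
    "cpu=1424,gres/gpu:geforce_gtx_1080_ti=90,"
    "gres/gpu:geforce_rtx_2080_ti=110,gres/gpu:v100=4"
)


def parse_gpu_limits(tres):
    """Extract {gpu_type: limit} from a comma-separated TRES string."""
    prefix = "gres/gpu:"
    limits = {}
    for item in tres.split(","):
        name, _, value = item.partition("=")
        if name.startswith(prefix):
            limits[name[len(prefix):]] = int(value)
    return limits


def gpu_headroom(nodes, limits):
    """Map each GPU type to (QOS limit, GPUs installed)."""
    return {
        gpu_type: (limits.get(gpu_type, 0), spec["nodes"] * spec["gpus_per_node"])
        for gpu_type, spec in nodes.items()
    }


headroom = gpu_headroom(GPU_NODES, parse_gpu_limits(CRYOEM_TRES))
for gpu_type, (limit, installed) in headroom.items():
    print(f"{gpu_type}: QOS limit {limit} of {installed} installed")
```

By this count the cryoem QOS covers 90 of 100 GTX 1080 Ti, 110 of 270 RTX 2080 Ti, and 4 of 24 V100 GPUs.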

Is there a way we can set it up to allow shared to schedule first on the slower gpus while the
other partitions schedule first on the fastest?

Thanks,
Renata


Comment 3 Skyler Malinowski 2021-05-05 07:45:08 MDT
> Is there a way we can set it up to allow shared to schedule first on
> the slower gpus while the other partitions schedule first on the fastest?
Perhaps I was not clear. No, this is not supported. I was calling this non-existent feature 'conditional node weights' for convenience of reference (although that may not be the actual name of it).

You may create a feature request if you feel strongly about it.

You seem to have a configuration that works around the lack of this feature or something like it.
Comment 4 Renata Dart 2021-05-05 08:11:40 MDT
Thanks Skyler, I just wanted to clarify what our configuration looked
like in case there might be some way to change the weighting for
shared.  Thanks for the details, we'll consider whether we want to
make a feature request.  Feel free to close this one out.

Thanks,
Renata

Comment 5 Skyler Malinowski 2021-05-05 08:34:20 MDT
Changing ticket status to `resolved: info given`.