| Summary: | Different weights for nodes depending on partition | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Renata Dart <renata> |
| Component: | Scheduling | Assignee: | Skyler Malinowski <skyler> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | ||
| Version: | 20.02.3 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | SLAC | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Renata Dart
2021-05-04 10:06:55 MDT
Hello Renata,

Conditional node weights is not a feature we support at this time.

Preemption may be the only thing that will satisfy your constraints. Define the node weights with respect to the shared partition. Then have sets of partitions that correspond to the different GPU groups. Then allow accounts/users to submit to those partitions with higher priority. Also use a submission filter to append the shared partition.

What do you think?

Regards,
Skyler

Hi Skyler,

I should have included this information to begin with, we have owner partitions
set up that can preempt shared:
PartitionName=shared Nodes=allrome,all_tur,all_volt,all_psc Default=YES Priority=10 MaxTime=5-00:00:00 DefaultTime=30 PreemptMode=CANCEL State=UP
PartitionName=supercdms Nodes=allrome Default=no AllowAccounts=supercdms Priority=50 MaxTime=5-00:00:00 DefaultTime=1-00:00:00 PreemptMode=OFF State=UP Qos=supercdms
PartitionName=cryoem Nodes=allrome,all_psc,all_tur,all_volt Default=no AllowAccounts=cryoem Priority=50 MaxTime=10-00:00:00 DefaultTime=1-00:00:00 PreemptMode=OFF State=UP Qos=cryoem
and we have these nodelists:
NodeSet=allrome Nodes=rome[0001-0004,0011-0014,0021-0024,0031-0034,0041-0044,0051-0054,0061-0064,0071-0074,0081-0084,0091-0094,0101-0104,0111-0114,0121-0124,0131-0134,0141-0144,0151-0154,0161-0164,0171-0174,0181-0184,0191-0194,0201-0204,0211-0214,0221-0224,0231-0234,0241-0244,0251-0254,0261-0264]
NodeSet=all_tur Nodes=tur[000-026]
NodeSet=all_volt Nodes=volt[000-005]
NodeSet=all_psc Nodes=psc[000-009]
And we have a partition qos set up for each partition that spells out the number of each kind of host:
supercdms cpu=256
cryoem cpu=1424,gres/gpu:geforce_gtx_1080_ti=90,gres/gpu:geforce_rtx_2080_ti=110,gres/gpu:v100=4
I don't have a partition qos for shared since they have access to all the hosts but can
be preempted.
Is there a way we can set it up to allow shared to schedule first on the slower gpus while the
other partitions schedule first on the fastest?
Thanks,
Renata
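For reference, the approach suggested in comment #1 (weights chosen for the shared partition, plus higher-priority partitions per GPU group) might look roughly like the slurm.conf sketch below. This is only an illustration under stated assumptions: the ticket does not say which node set holds which GPU model, so the mapping psc = GTX 1080 Ti, tur = RTX 2080 Ti, volt = V100 is a guess; the weight values and the partition names `cryoem_volt`, `cryoem_tur` and `cryoem_psc` are hypothetical; and the real NodeName lines would carry their existing CPU, memory and Gres attributes as well.

```
# Sketch only; values and the GPU-to-nodeset mapping are assumptions.
# Lower Weight is allocated first, so shared jobs land on the slower GPUs first.
NodeName=psc[000-009]  Weight=10   # assumed slowest GPUs (GTX 1080 Ti)
NodeName=tur[000-026]  Weight=20   # assumed mid-range GPUs (RTX 2080 Ti)
NodeName=volt[000-005] Weight=30   # assumed fastest GPUs (V100)

# Hypothetical per-GPU-group owner partitions; a higher Priority means
# the faster group is considered before the slower ones.
PartitionName=cryoem_volt Nodes=all_volt Default=no AllowAccounts=cryoem Priority=70 PreemptMode=OFF State=UP
PartitionName=cryoem_tur  Nodes=all_tur  Default=no AllowAccounts=cryoem Priority=60 PreemptMode=OFF State=UP
PartitionName=cryoem_psc  Nodes=all_psc  Default=no AllowAccounts=cryoem Priority=50 PreemptMode=OFF State=UP
```

An owner job would then be submitted to the list of group partitions (with the shared partition appended by a submission filter, as comment #1 suggests), while the node weights alone decide the order for shared jobs. Limits such as MaxTime, DefaultTime and the partition QOS are left out here and would have to be carried over from the existing cryoem partition.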
> Is there a way we can set it up to allow shared to schedule first on
> the slower gpus while the other partitions schedule first on the fastest?
Perhaps I was not clear. No, this is not supported. I was calling this non-existent feature 'conditional node weights' for convenience of reference (although that may not be the actual name of it).
You may create a feature request if you feel strongly about it.
You seem to have a configuration that works around the lack of this feature or something like it.
Thanks Skyler,

I just wanted to clarify what our configuration looked like in case there might be some way to change the weighting for shared. Thanks for the details, we'll consider whether we want to make a feature request. Feel free to close this one out.

Thanks,
Renata

On Wed, 5 May 2021, bugs@schedmd.com wrote:
>https://bugs.schedmd.com/show_bug.cgi?id=11523
>
>--- Comment #3 from Skyler Malinowski <malinowski@schedmd.com> ---
>> Is there a way we can set it up to allow shared to schedule first on
>> the slower gpus while the other partitions schedule first on the fastest?
>Perhaps I was not clear. No, this is not supported. I was calling this
>non-existent feature 'conditional node weights' for convenience of reference
>(although that may not be the actual name of it).
>
>You may create a feature request if you feel strongly about it.
>
>You seem to have a configuration that works around the lack of this feature or
>something like it.

Changing ticket status to `resolved: info given`. |