| Summary: | Reservations and cpus-per-task | | |
|---|---|---|---|
| Product: | Slurm | Reporter: | lhuang |
| Component: | reservations | Assignee: | Oriol Vilarrubi <jvilarru> |
| Status: | RESOLVED INVALID | QA Contact: | |
| Severity: | 4 - Minor Issue | | |
| Priority: | --- | | |
| Version: | 20.11.0 | | |
| Hardware: | Linux | | |
| OS: | Linux | | |
| Site: | NY Genome | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | | CLE Version: | |
| Version Fixed: | | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
Hello,

No, this is not the expected behavior. Could you please send me the output of the following commands:

```
scontrol show config
scontrol show partitions
scontrol show nodes
sacctmgr show qos rescomp
```

That way I will be able to see whether something in the configuration is making this happen.

---

I found the issue. That test node has the OverSubscribe=FORCE:4 option enabled, which is why we were able to request more resources than are available. Closing this out. Thanks.
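As a rough sketch of the resolution (plain arithmetic, not ticket output): with OverSubscribe=FORCE:4, Slurm lets each allocated resource be shared by up to 4 jobs, so the 20-CPU node effectively accepts up to 80 CPUs' worth of requests.

```python
# Illustrative arithmetic only; the numbers come from the ticket above.
physical_cpus = 20          # CPUTot from "scontrol show node pe2cc2-068"
oversubscribe_factor = 4    # the ":4" in OverSubscribe=FORCE:4
effective_cpus = physical_cpus * oversubscribe_factor

requested = 3 * 20          # three srun jobs, each with --cpus-per-task=20
print(effective_cpus)                # 80
print(requested <= effective_cpus)   # True: all three jobs fit
```

This is why three concurrent 20-CPU jobs on a 20-CPU node did not conflict with the reservation.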
We've created a reservation with one node. The node only has 20 CPUs, yet we found that we can request --cpus-per-task=20 many times with srun. Is this the expected behavior?

```
ReservationName=rescomp_4 StartTime=2021-07-16T10:03:00 EndTime=2021-07-16T20:03:00 Duration=10:00:00
   Nodes=pe2cc2-068 NodeCnt=1 CoreCnt=20 Features=(null) PartitionName=(null) Flags=SPEC_NODES
   TRES=cpu=20
   Users=(null) Groups=(null) Accounts=rescomp Licenses=(null) State=ACTIVE BurstBuffer=(null)
   Watts=n/a MaxStartDelay=(null)
```

```
[lhuang@pe2cc2-068 ~]$ squeue -u lhuang
   JOBID PARTITION  NAME    USER ST  TIME TIME_LIMIT NODES CPUS MIN_MEMORY     QOS PRIORITY NODELIST(REASON)
14372647       dev  bash  lhuang  R  2:34   10:00:00     1   20      1000M rescomp   766290 pe2cc2-068
14372640       dev  bash  lhuang  R  7:03   10:00:00     1   20      1000M rescomp   766290 pe2cc2-068
14372639       dev  bash  lhuang  R  7:12   10:00:00     1   20      1000M rescomp   766290 pe2cc2-068
```

```
[lhuang@pe2cc2-068 ~]$ scontrol show node pe2cc2-068
NodeName=pe2cc2-068 Arch=x86_64 CpuBind=cores CoresPerSocket=10
   CPUAlloc=20 CPUTot=20 CPULoad=0.01
   AvailableFeatures=v2
   ActiveFeatures=v2
   Gres=(null)
   NodeAddr=pe2cc2-068 NodeHostName=pe2cc2-068 Version=20.11.0
   OS=Linux 3.10.0-1160.15.2.el7.x86_64 #1 SMP Wed Feb 3 15:06:38 UTC 2021
   RealMemory=230000 AllocMem=60000 FreeMem=248713 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=2 TmpDisk=0 Weight=3 Owner=N/A MCS_label=N/A
   Partitions=dev
   BootTime=2021-07-13T13:36:40 SlurmdStartTime=2021-07-13T13:37:31
   CfgTRES=cpu=20,mem=230000M,billing=20
   AllocTRES=cpu=20,mem=60000M
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0 ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Comment=(null)
```
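For reference, the behavior diagnosed in the resolution typically comes from a partition definition along these lines (a hypothetical slurm.conf excerpt, not taken from this site's actual configuration; only the OverSubscribe=FORCE:4 value is confirmed by the ticket):

```
# Hypothetical slurm.conf fragment: OverSubscribe=FORCE:4 makes the
# scheduler allocate every CPU to up to 4 jobs at once, regardless of
# what the jobs request.
PartitionName=dev Nodes=pe2cc2-068 OverSubscribe=FORCE:4 State=UP
```

Removing the FORCE:4 setting (or using OverSubscribe=NO) would make the 20-CPU limit of the reserved node enforceable again.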