Ticket 7908 - Reservation creation fails when using exact node list
Summary: Reservation creation fails when using exact node list
Status: RESOLVED DUPLICATE of ticket 7458
Alias: None
Product: Slurm
Classification: Unclassified
Component: slurmctld
Version: 19.05.3
Hardware: Linux
OS: Linux
Severity: 4 - Minor Issue
Assignee: Nate Rini
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2019-10-10 05:53 MDT by Ville Ahlgren
Modified: 2019-11-13 12:54 MST

See Also:
Site: CSC - IT Center for Science


Attachments
slurm.conf (5.18 KB, text/plain) - 2019-10-10 05:53 MDT, Ville Ahlgren
partition.conf (2.78 KB, text/plain) - 2019-10-10 05:54 MDT, Ville Ahlgren
patch (CSC only) (602 bytes, patch) - 2019-10-15 10:23 MDT, Nate Rini
patch (CSC only) (1.03 KB, patch) - 2019-10-17 12:33 MDT, Nate Rini

Description Ville Ahlgren 2019-10-10 05:53:59 MDT
Created attachment 11897 [details]
slurm.conf

Hi

It seems that reservation creation sometimes fails without a valid reason.

Here is an example which illustrates the issue:

Trying to create a reservation which starts ~3.5 days in the future. The maximum run time on these nodes is 3 days by partition configuration, and the latest-ending job currently running on them has EndTime=2019-10-13T14:16:23.

# scontrol create reservation user=robertse starttime=2019-10-14T09:00:00 duration=08:00:00 ReservationName=openACC_course_mon nodes=r01g[02-03]

Error creating the reservation: Requested nodes are busy

However, the same nodes get selected when the reservation is requested by partition and node count instead:

# scontrol create reservation user=robertse starttime=2019-10-14T09:00:00 duration=08:00:00 ReservationName=openACC_course_mon nodecnt=2 partition=gpu

Reservation created: openACC_course_mon

# scontrol show res openACC_course_mon
ReservationName=openACC_course_mon StartTime=2019-10-14T09:00:00 EndTime=2019-10-14T17:00:00 Duration=08:00:00
   Nodes=r01g[02-03] NodeCnt=2 CoreCnt=80 Features=(null) PartitionName=gpu Flags=
   TRES=cpu=80
   Users=robertse Accounts=(null) Licenses=(null) State=INACTIVE BurstBuffer=(null) Watts=n/a

But then again:

# scontrol delete ReservationName=openACC_course_mon

# scontrol create reservation user=robertse starttime=2019-10-14T09:00:00 duration=08:00:00 ReservationName=openACC_course_mon nodes=r01g[02-03]

Error creating the reservation: Requested nodes are busy

These actions produced only these log lines:

[2019-10-10T13:57:16.926] Reservation request overlaps jobs
[2019-10-10T13:57:16.926] _slurm_rpc_resv_create reservation=openACC_course_mon: Requested nodes are busy

[2019-10-10T14:00:05.641] sched: Created reservation=openACC_course_mon users=robertse nodes=r01g[02-03] cores=80 licenses=(null) tres=cpu=80 watts=4294967294 start=2019-10-14T09:00:00 end=2019-10-14T17:00:00

[2019-10-10T14:00:32.335] _slurm_rpc_delete_reservation complete for openACC_course_mon usec=401

[2019-10-10T14:00:49.152] Reservation request overlaps jobs
[2019-10-10T14:00:49.152] _slurm_rpc_resv_create reservation=openACC_course_mon: Requested nodes are busy
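The reporter's point can be restated as a simple time comparison. This is an editorial sketch (not Slurm code), using the timestamps quoted above: a job can only conflict with a reservation window if it is still running when the window opens.

```shell
# Editorial sketch: a job overlaps the reservation only if it ends after
# the reservation starts. Times are the ones from the report above.
resv_start=$(date -u -d '2019-10-14T09:00:00' +%s)   # requested reservation start
job_end=$(date -u -d '2019-10-13T14:16:23' +%s)      # latest job EndTime on the nodes
if [ "$job_end" -le "$resv_start" ]; then
  overlap=no
else
  overlap=yes
fi
echo "overlap=$overlap"
```

By this check the nodes are free at the reservation start, so "Requested nodes are busy" is unexpected.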
Comment 1 Ville Ahlgren 2019-10-10 05:54:25 MDT
Created attachment 11898 [details]
partition.conf
Comment 2 Nate Rini 2019-10-10 16:46:49 MDT
(In reply to Ville Ahlgren from comment #0)
> It seems like reservation creation some times fails without valid reason.
Reservations use the same code path as job node selection. Could you run the following before trying your reservation a few times, to get it to fail?

> scontrol setdebugflags +Reservation
> scontrol setdebugflags +SelectType
-- create reservation here --

> scontrol setdebugflags -Reservation
> scontrol setdebugflags -SelectType

Please upload your slurmctld log after this testing. If you can get it to fail, that would be a lot more helpful.
 
Thanks,
--Nate
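The debug-flag sequence above can be scripted end to end. This is a dry-run sketch: the run() wrapper only prints each command, since scontrol needs a live slurmctld; replace the echo with "$@" to execute for real. The reservation parameters are the reporter's, repeated here for illustration.

```shell
# Dry-run wrapper: prints each command instead of executing it.
run() { echo "+ $*"; }
run scontrol setdebugflags +Reservation
run scontrol setdebugflags +SelectType
run scontrol create reservation user=robertse starttime=2019-10-14T09:00:00 \
    duration=08:00:00 ReservationName=openACC_course_mon "nodes=r01g[02-03]"
run scontrol setdebugflags -Reservation
run scontrol setdebugflags -SelectType
```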
Comment 3 CSC sysadmins 2019-10-14 01:27:01 MDT
Hi,

I enabled the SelectType and Reservation debug flags, but according to the logs slurmctld refuses to create the reservation immediately:

[root@mslurm1 ~]# scontrol create reservation user=robertse starttime=2019-10-17T11:00:00 duration=08:00:00 ReservationName=bug_7908_test nodes=r04g[01-02]

Jobs are not overlapping:

[root@mslurm1 ~]# squeue -w r04g[01-02] -O endtime
END_TIME            
2019-10-16T04:16:08 
2019-10-15T05:02:38 
2019-10-15T10:13:42 
2019-10-16T23:16:34 
2019-10-16T23:16:34 


[2019-10-14T10:14:51.153] create_resv: Name=bug_7908_test StartTime=2019-10-17T11:00:00 EndTime=-1 Duration=480 Flags=(null) NodeCnt=(null) CoreCnt=(null) NodeList=r04g[01-02] Features=(null) PartitionName=(null) Users=robertse Accounts=(null) Licenses=(null) BurstBuffer=(null) TRES=(null) Watts=n/a
[2019-10-14T10:14:51.153] Reservation request overlaps jobs
[2019-10-14T10:14:51.153] _slurm_rpc_resv_create reservation=bug_7908_test: Requested nodes are busy
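The "jobs are not overlapping" claim above can be checked mechanically. This sketch feeds the quoted squeue END_TIME values through sort (ISO-8601 timestamps sort correctly as strings) and compares the latest one to the requested reservation start; on a live system you would pipe in `squeue -w r04g[01-02] -h -O endtime` instead of the here-document.

```shell
# Latest END_TIME among jobs on the target nodes (from the output above).
latest=$(sort <<'EOF' | tail -n 1
2019-10-16T04:16:08
2019-10-15T05:02:38
2019-10-15T10:13:42
2019-10-16T23:16:34
2019-10-16T23:16:34
EOF
)
resv_start=2019-10-17T11:00:00
# Lexicographic comparison is valid for ISO-8601 timestamps.
if [ "$latest" \< "$resv_start" ]; then
  echo "all jobs end before the reservation starts"
fi
```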
Comment 4 Nate Rini 2019-10-14 10:37:58 MDT
(In reply to Tommi Tervo from comment #3)
> I enabled selecttype and reservation debug flags but according to logs
> slurmctld refuses to create reservation immediately?
Looks like the nodes are not available for the reservation (the same logic as selecting nodes for a job). Are there jobs on them? Can you please run `scontrol show node $NODE` on them?
Comment 5 CSC sysadmins 2019-10-14 23:49:55 MDT
(In reply to Nate Rini from comment #4)
> (In reply to Tommi Tervo from comment #3)
> > I enabled selecttype and reservation debug flags but according to logs
> > slurmctld refuses to create reservation immediately?
> Looks like the nodes are not available for the reservation (same logic as a
> selecting for a job). Are there jobs on them? 

As you can see from comment #3, all jobs had endtime < reservation starttime.

> Can you please take a
> 'scontrol show nodes $NODE` of them?

sure:

[root@mslurm1 ~]# scontrol show node r04g01
NodeName=r04g01 Arch=x86_64 CoresPerSocket=20 
   CPUAlloc=11 CPUTot=40 CPULoad=6.75
   AvailableFeatures=type_gpu
   ActiveFeatures=type_gpu
   Gres=gpu:v100:4(S:0-1),nvme:3600
   NodeAddr=r04g01 NodeHostName=r04g01 
   OS=Linux 3.10.0-957.27.2.el7.x86_64 #1 SMP Tue Jul 9 16:53:14 UTC 2019 
   RealMemory=382000 AllocMem=69000 FreeMem=2607 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=400 Owner=N/A MCS_label=N/A
   Partitions=All,gpu 
   BootTime=2019-09-19T16:55:38 SlurmdStartTime=2019-10-08T16:20:19
   CfgTRES=cpu=40,mem=382000M,billing=40,gres/gpu:v100=4,gres/nvme=3600
   AllocTRES=cpu=11,mem=69000M,gres/gpu:v100=3,gres/nvme=3200
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   

[root@mslurm1 ~]# scontrol show node r04g02
NodeName=r04g02 Arch=x86_64 CoresPerSocket=20 
   CPUAlloc=1 CPUTot=40 CPULoad=3.88
   AvailableFeatures=type_gpu
   ActiveFeatures=type_gpu
   Gres=gpu:v100:4(S:0-1),nvme:3600
   NodeAddr=r04g02 NodeHostName=r04g02 
   OS=Linux 3.10.0-957.27.2.el7.x86_64 #1 SMP Tue Jul 9 16:53:14 UTC 2019 
   RealMemory=382000 AllocMem=8192 FreeMem=359467 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=400 Owner=N/A MCS_label=N/A
   Partitions=All,gpu 
   BootTime=2019-09-19T16:55:38 SlurmdStartTime=2019-10-08T16:20:19
   CfgTRES=cpu=40,mem=382000M,billing=40,gres/gpu:v100=4,gres/nvme=3600
   AllocTRES=cpu=1,mem=8G,gres/gpu:v100=4,gres/nvme=500
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Comment 8 Nate Rini 2019-10-15 10:23:01 MDT
Created attachment 11956 [details]
patch (CSC only)

(In reply to Tommi Tervo from comment #5)
> As you can see from comment #3, all jobs had endtime < reservation starttime.

Tommi

Can you please apply this patch to Slurm? Only slurmctld will need to be restarted. Can you then please make sure the Reservation debug flag is active and try your reservation again?

It will dump the conflicting job's id. Can you then please run `scontrol show job $JOBID` on that job and provide the slurmctld log for that time period?

Thanks,
--Nate
Comment 9 CSC sysadmins 2019-10-16 01:10:10 MDT
Here it is, something is really wrong:

[root@mslurm1 tmp]# scontrol create reservation user=robertse starttime=2019-10-19T13:00:00 duration=08:00:00 ReservationName=bug_7908_test nodes=r04g[01-02]
Error creating the reservation: Requested nodes are busy

[2019-10-16T10:05:52.306] _job_overlap: reservation (null) would overlap with JobId=318338
[2019-10-16T10:05:52.306] Reservation request overlaps jobs


[root@mslurm1 tmp]# scontrol show job 318338
JobId=318338 JobName=j290_prod_3
   UserId=kronenbe(10005615) GroupId=kronenbe(10005615) MCS_label=N/A
   Priority=1253 Nice=0 Account=project_2001197 QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=1-22:05:59 TimeLimit=3-00:00:00 TimeMin=N/A
   SubmitTime=2019-10-14T12:00:05 EligibleTime=2019-10-14T12:00:05
   AccrueTime=2019-10-14T12:00:05
   StartTime=2019-10-14T12:00:06 EndTime=2019-10-17T12:00:06 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2019-10-14T12:00:06
   Partition=gpu AllocNode:Sid=puhti-login1.bullx:8003
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=r04g01
   BatchHost=r04g01
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=N/A ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,mem=5000M,node=1,billing=1,gres/gpu:v100=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=5000M MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/users/kronenbe/.schrodinger/.jobdb2/files/puhti-login1-5da43/puhti-login1-0-5da43915.batch
   WorkDir=/scratch/project_2001197/SurA/290_prod
   Comment=SchrodingerJobId=puhti-login1-0-5da43915 
   StdErr=/users/kronenbe/.schrodinger/.jobdb2/files/puhti-login1-5da43/puhti-login1-0-5da43915.qlog
   StdIn=/dev/null
   StdOut=/users/kronenbe/.schrodinger/.jobdb2/files/puhti-login1-5da43/puhti-login1-0-5da43915.qlog
   Power=
   TresPerNode=gpu:v100:1
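The job dump above makes the false positive concrete: JobId=318338 has EndTime=2019-10-17T12:00:06, while the reservation was requested to start 2019-10-19T13:00:00. This editorial sketch computes the gap between the two.

```shell
# Editorial sketch: how long before the reservation start does the
# "conflicting" job actually end? Times from the ticket above.
job_end=$(date -u -d '2019-10-17T12:00:06' +%s)
resv_start=$(date -u -d '2019-10-19T13:00:00' +%s)
gap_hours=$(( (resv_start - job_end) / 3600 ))
echo "job ends ${gap_hours}h before the reservation starts"
```

The job ends roughly two days before the reservation opens, so _job_overlap should not have flagged it.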
Comment 10 Nate Rini 2019-10-16 11:44:44 MDT
(In reply to Tommi Tervo from comment #9)
> [2019-10-16T10:05:52.306] _job_overlap: reservation (null) would overlap
> with JobId=318338
The "(null)" here is expected for a new reservation. I'm going to try to recreate this internally.

Thanks for providing the log. You can leave the patch in and submit more reservations that have this issue, or revert it if you want.
--Nate
Comment 12 Nate Rini 2019-10-17 12:33:54 MDT
Created attachment 11997 [details]
patch (CSC only)

Tommi,

Please try this patch.

Thanks,
--Nate
Comment 13 CSC sysadmins 2019-10-18 00:54:44 MDT
Hi,

With the second patch applied, I'm not able to reproduce this bug.

-Tommi
Comment 14 Nate Rini 2019-10-18 10:15:41 MDT
(In reply to Tommi Tervo from comment #13)
> With second patch applied I'm not able to reproduce this bug.

The patch is currently undergoing QA review in bug#7458 and should be in 19.05.5+. Marking this as a duplicate. Please reply if you have any questions.

Thanks,
--Nate

*** This ticket has been marked as a duplicate of ticket 7458 ***
Comment 16 Nate Rini 2019-11-13 12:54:06 MST
Tommi,

This fix is now upstream for 19.05.4, which should be released soon.

Thanks,
--Nate