# squeue -j 44289341
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          44289341       hov     wrap hepperla PD       0:00      4 (JobHeldUser)
[root@longleaf-sched slurm_utils]# scontrol show job 44289341
JobId=44289341 JobName=wrap
   UserId=hepperla(214234) GroupId=its_graduate_psx(203) MCS_label=N/A
   Priority=0 Nice=0 Account=rc_ijdavis_pi QOS=normal
   JobState=PENDING Reason=JobHeldUser Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=10-12:00:00 TimeMin=N/A
   SubmitTime=2019-12-12T09:33:38 EligibleTime=2019-12-12T09:33:38
   AccrueTime=Unknown
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2019-12-12T10:41:49
   Partition=hov AllocNode:Sid=longleaf-login2.its.unc.edu:13492
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null) SchedNodeList=c0308
   NumNodes=1-1 NumCPUs=12 NumTasks=12 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=12,mem=200G,node=1,billing=12
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=50G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/proj/dllab/Austin/TCGA/BRCA/ATAC_bams
   StdErr=/proj/dllab/Austin//SLURM_logs//2019/12/12/20191212_09-33-38-508
   StdIn=/dev/null
   StdOut=/proj/dllab/Austin//SLURM_logs/2019/12/12/20191212_09-33-38-508
   Power=
Hi

Does this occur only for held jobs? If yes, I can recreate this.

Dominik
No. I held it after.
Hi

squeue displays node counts based on the value returned from the select plugin, and this can take some time to be reevaluated. Could you check whether this value is updated if you wait a few minutes?

Dominik
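To make the failure mode concrete, here is a small illustrative Python sketch (not Slurm source code) of the pattern described above: squeue's NODES column comes from a cached estimate, while scontrol reads the job record directly, so if the cache is never refreshed after `scontrol update`, the two views diverge. The class and method names are hypothetical, for illustration only.

```python
class JobRecord:
    """Authoritative job state, analogous to what `scontrol show job` reports."""
    def __init__(self, job_id, num_nodes):
        self.job_id = job_id
        self.num_nodes = num_nodes

class NodeCountCache:
    """Stale-prone cached view, analogous to what squeue was displaying."""
    def __init__(self):
        self._cache = {}

    def get(self, job):
        # Bug-like behavior: the estimate is computed once and never
        # refreshed when the underlying job record changes.
        if job.job_id not in self._cache:
            self._cache[job.job_id] = job.num_nodes
        return self._cache[job.job_id]

    def invalidate(self, job_id):
        # The fix amounts to dropping (or recomputing) the cached
        # estimate whenever the job's node request is updated.
        self._cache.pop(job_id, None)

job = JobRecord(22279, 3)
cache = NodeCountCache()
print(cache.get(job))         # 3 -- both views agree initially

job.num_nodes = 1             # like: scontrol update jobid=22279 numnodes=1-1
print(job.num_nodes)          # 1 -- scontrol shows the new value
print(cache.get(job))         # 3 -- squeue-style cached view is stale

cache.invalidate(job.job_id)  # behavior once the cache is invalidated
print(cache.get(job))         # 1 -- the two views agree again
```

The point of the sketch is that waiting does not help if nothing ever invalidates the cached entry; the stale value persists until explicit invalidation or recomputation.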
We're seeing this bug at LLNL too. The job waiting on resources gets updated in the squeue output, but jobs lower in the queue don't. I waited at least five minutes and it never updated. We're using the cons_res select plugin and Slurm 19.05.5, fwiw. Here's what I see:

[day36@ipa15:~]$ srun -N3 sleep 600 &
[1] 126473
[day36@ipa15:~]$ squeue -p pall
 JOBID PARTITION  NAME  USER ST  TIME NODES NODELIST(REASON)
 22277      pall sleep day36  R  0:06     3 ipa[4-5,7]
[day36@ipa15:~]$ srun -N3 sleep 600 &
[2] 126492
[day36@ipa15:~]$ srun: job 22278 queued and waiting for resources
srun -N3 sleep 600 &
[3] 126498
[day36@ipa15:~]$ srun: job 22279 queued and waiting for resources
squeue -p pall
 JOBID PARTITION  NAME  USER ST  TIME NODES NODELIST(REASON)
 22278      pall sleep day36 PD  0:00     3 (Resources)
 22279      pall sleep day36 PD  0:00     3 (Priority)
 22277      pall sleep day36  R  0:14     3 ipa[4-5,7]
[day36@ipa15:~]$ scontrol update jobid=22279 numnodes=1-1
[day36@ipa15:~]$ squeue -p pall
 JOBID PARTITION  NAME  USER ST  TIME NODES NODELIST(REASON)
 22278      pall sleep day36 PD  0:00     3 (Resources)
 22279      pall sleep day36 PD  0:00     3 (Priority)
 22277      pall sleep day36  R  0:49     3 ipa[4-5,7]
[day36@ipa15:~]$ scontrol show job 22279 | grep -i numnodes
   NumNodes=1-1 NumCPUs=3 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
[day36@ipa15:~]$ squeue -p pall
 JOBID PARTITION  NAME  USER ST  TIME NODES NODELIST(REASON)
 22278      pall sleep day36 PD  0:00     3 (Resources)
 22279      pall sleep day36 PD  0:00     3 (Priority)
 22277      pall sleep day36  R  1:12     3 ipa[4-5,7]
[day36@ipa15:~]$ scontrol update jobid=22278 numnodes=1-1
[day36@ipa15:~]$ squeue -p pall
 JOBID PARTITION  NAME  USER ST  TIME NODES NODELIST(REASON)
 22278      pall sleep day36 PD  0:00     1 (Resources)
 22279      pall sleep day36 PD  0:00     3 (Priority)
 22277      pall sleep day36  R  1:23     3 ipa[4-5,7]
[day36@ipa15:~]$ scontrol show job 22278 | grep -i numnodes
   NumNodes=1-1 NumCPUs=3 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
[day36@ipa15:~]$ squeue -p pall
 JOBID PARTITION  NAME  USER ST  TIME NODES NODELIST(REASON)
 22278      pall sleep day36 PD  0:00     1 (Resources)
 22279      pall sleep day36 PD  0:00     3 (Priority)
 22277      pall sleep day36  R  5:28     3 ipa[4-5,7]
[day36@ipa15:~]$ squeue -p pall
 JOBID PARTITION  NAME  USER ST  TIME NODES NODELIST(REASON)
 22278      pall sleep day36 PD  0:00     1 (Resources)
 22279      pall sleep day36 PD  0:00     3 (Priority)
 22277      pall sleep day36  R  6:43     3 ipa[4-5,7]
[day36@ipa15:~]$
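For anyone reproducing this, a small shell sketch (assuming a live Slurm cluster; job id 22279 is taken from the transcript above) to poll both views of a pending job's node count side by side:

```shell
# Compare squeue's cached NODES column (%D) against the authoritative
# job record from scontrol, once a minute for five minutes.
for i in 1 2 3 4 5; do
    squeue -j 22279 --noheader -o "squeue sees %D node(s)"
    scontrol show job 22279 | grep -o "NumNodes=[0-9-]*"
    sleep 60
done
```

Before the fix, the two lines keep disagreeing after an `scontrol update ... numnodes=1-1` no matter how long you wait.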
Hi

This commit should fix this issue:
https://github.com/SchedMD/slurm/commit/623574431d545b2ff0

We are still waiting for review of additional patches (bug 8110) that will let us return a more precise estimate of the node count required by jobs.

I'll go ahead and close this ticket, but feel free to let me know if you have any additional questions about the fix.

Dominik
(In reply to Dominik Bartkiewicz from comment #9)

Thanks Dominik. At first glance, it looks like it shouldn't be any problem to apply this patch back to 19.05 as well. Do you know of a reason that wouldn't work?
Hi

This patch was prepared for 19.05 and will work correctly with it.

Dominik
*** Ticket 8747 has been marked as a duplicate of this ticket. ***