We are still trying to troubleshoot the scheduling issue here: low-priority jobs aren't being considered even though the job size is small. One of the jobs we tried to re-nice also produced an error:

# grep 11349763 slurmctld.log
[2019-09-23T15:20:38.702] Recovered JobId=11349763 Assoc=2955
[2019-09-23T16:37:53.716] Recovered JobId=11349763 Assoc=2955
[2019-09-23T18:28:12.116] ignore nice set request on JobId=11349763
[2019-09-23T18:28:12.116] _slurm_rpc_update_job: complete JobId=11349763 uid=0 usec=149744
[2019-09-23T18:29:37.447] ignore nice set request on JobId=11349763
[2019-09-23T18:29:37.448] _slurm_rpc_update_job: complete JobId=11349763 uid=0 usec=3447
[2019-09-23T18:30:27.861] ignore nice set request on JobId=11349763
[2019-09-23T18:30:27.861] _slurm_rpc_update_job: complete JobId=11349763 uid=0 usec=224725

18:19:43 m3-login2:~ ctan$ sjob 11349763
JobId=11349763 JobName=MyJob
   UserId=abut0011(12029) GroupId=monashuniversity(10025) MCS_label=N/A
   Priority=18000 Nice=0 Account=of33 QOS=normal
   JobState=PENDING Reason=Priority Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=01:00:00 TimeMin=N/A
   SubmitTime=2019-09-19T15:00:44 EligibleTime=2019-09-19T15:00:44
   AccrueTime=2019-09-19T15:00:44
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2019-09-23T18:34:22
   Partition=comp AllocNode:Sid=m3-login1:15075
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,mem=4G,node=1,billing=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=4G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/scratch/of33/Alana/sbatch_script_test.sh
   WorkDir=/scratch/of33/Alana
   StdErr=/scratch/of33/Alana/slurm-11349763.out
   StdIn=/dev/null
   StdOut=/scratch/of33/Alana/slurm-11349763.out
   Power=

18:42:05 m3-login2:~ ctan$ sudo `which scontrol` update JobID=11349763 Nice=-10000
18:42:08 m3-login2:~ ctan$ echo $?
0

The command completes successfully, but the job's priority is not updated. Any ideas?
The error message "ignore nice set request on JobId" occurs when the job's priority has already been manually set, either at submission or via scontrol. In other words, if the job's priority was already adjusted by an administrator then adjusting the "nice" value will have no effect and is ignored. Do you know if this job's priority was manually adjusted?
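In case it helps, here is a minimal sketch for checking those fields. It parses a sample captured from the sjob output above (on a live system you would pipe `scontrol show job 11349763` instead); the hold/release step at the end is the usual way to let the scheduler recompute a priority that was set explicitly, assuming the multifactor priority plugin is in use.

```shell
# Sketch: inspect the Priority and Nice fields of a pending job.
# The heredoc-style sample below is copied from this ticket so the
# snippet is self-contained; replace it with the live command:
#   scontrol show job 11349763
sample='JobId=11349763 JobName=MyJob
   Priority=18000 Nice=0 Account=of33 QOS=normal'

printf '%s\n' "$sample" | grep -o 'Priority=[0-9]*'   # -> Priority=18000
printf '%s\n' "$sample" | grep -o 'Nice=-\?[0-9]*'    # -> Nice=0

# If the priority was assigned explicitly by an administrator, nice
# changes are ignored. Holding and then releasing the job makes the
# scheduler recompute the priority from the priority plugin:
#   sudo scontrol hold 11349763
#   sudo scontrol release 11349763
```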
We haven't changed the priority for this job; the renice was our attempt to do so. Furthermore, when we try to see the priority of his jobs with sprio, it returns nothing:

09:16:00 m3-login1:~ ctan$ sprio -u abut0011
  JOBID PARTITION     USER   PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION        QOS
09:16:20 m3-login1:~ ctan$ man sprio
09:16:46 m3-login1:~ ctan$ squeue -u abut0011
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          11349763      comp    MyJob abut0011 PD       0:00      1 (Priority)
          11349781      comp    MyJob abut0011 PD       0:00      1 (Priority)

His jobs do appear in the queue with the squeue and scontrol commands. According to the man page, sprio by default returns information for all pending jobs. Is there a reason why jobs 11349763 and 11349781 aren't shown?

09:19:59 m3-login1:~ ctan$ sprio -j 11349763
Unable to find jobs matching user/id(s) specified
09:20:01 m3-login1:~ ctan$ sprio -j 11349781
Unable to find jobs matching user/id(s) specified
A job with a manually set priority also will not show up in sprio. It looks like job 11349763's priority was explicitly set to 18000 by an administrator (a suspiciously round number, too).
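As a side note, even when sprio omits a job, its effective priority is still visible through squeue's `%Q` format field or `scontrol show job`. A small self-contained sketch, filtering a captured sample (the priority shown for 11349781 is hypothetical, for illustration only):

```shell
# Sketch: list pending jobs with their priorities via squeue.
# On a live system:
#   squeue -u abut0011 -o '%.12i %.10Q'
# Here we filter a sample so the snippet runs anywhere.
sample='JOBID PARTITION NAME USER ST PRIORITY
11349763 comp MyJob abut0011 PD 18000
11349781 comp MyJob abut0011 PD 18000'

# Skip the header, print job id and priority columns.
printf '%s\n' "$sample" | awk 'NR > 1 { print $1, $6 }'
```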
We will be updating the documentation to note that jobs with a manually set priority are not displayed in sprio. Is there anything else we can help with on this ticket?
Info given. sprio output will be addressed with bug 4757.