| Summary: | how to know if the job is waiting for license or machine in slurm? | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Openfive Support <it_support> |
| Component: | User Commands | Assignee: | Jason Booth <jbooth> |
| Status: | RESOLVED TIMEDOUT | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | ||
| Version: | - Unsupported Older Versions | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Alphawave | Alineos Sites: | --- |
Please attach your current slurm.conf and slurmctld.log. Also, how many jobs are in the queue scheduled to run before this one?

Normally, once a job is considered for scheduling, the reason is applied to that job. For example:

> JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
> 5512 debug hostname jason PD 0:00 1 (Licenses)
> 5511 debug wrap jason R 0:26 1 n1

> $ scontrol show jobs 5512
> JobId=5512 JobName=hostname
> JobState=PENDING Reason=Licenses
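To check the reason for many jobs without reading the full record by eye, the `scontrol show job` text can be split into key/value pairs. The following is a minimal sketch, not an official Slurm tool: the function name `parse_scontrol_record` is hypothetical, and it assumes that no field value contains whitespace (a job name or command with spaces would be truncated).

```python
def parse_scontrol_record(text):
    """Split `scontrol show job` output into a {key: value} dict.

    Assumption: values contain no whitespace. Tokens without '='
    (for example 'NtasksPerTRES:0') are skipped.
    """
    fields = {}
    for token in text.split():
        # Split on the first '=' only, so 'TRES=cpu=1,mem=4000M' keeps its value intact.
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


# Sample text copied from the example in this ticket.
SAMPLE = """JobId=5512 JobName=hostname
   JobState=PENDING Reason=Licenses"""

job = parse_scontrol_record(SAMPLE)
print(job["JobId"], job["JobState"], job["Reason"])
```

In practice the text would come from running `scontrol show job <jobid>` (for instance through `subprocess.run`); the sample string here only stands in for that output.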
Hi Slurm Support Team,

We see that when a job is waiting for a license, the Reason field still shows Priority.

Example:

[debajitd@osvnc001 ~]$ srun -p normal -L Innovus_Impl_System --pty /bin/tcsh
srun: job 2377664 queued and waiting for resources

License status:

[debajitd@tmpxon014 ~]$ scontrol show lic Innovus_Impl_System
LicenseName=Innovus_Impl_System Total=160 Used=160 Free=0 Reserved=0 Remote=no

Here the job above is waiting for a license, but scontrol reports "Reason" as "Priority".

scontrol output:

[root@hpcmaster ~]# scontrol show job 2377664
JobId=2377664 JobName=tcsh
UserId=debajitd(3403) GroupId=engr(500) MCS_label=N/A
Priority=4444 Nice=0 Account=(null) QOS=normal WCKey=*
JobState=PENDING Reason=Priority Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=15-00:00:00 TimeMin=N/A
SubmitTime=2023-09-28T12:01:56 EligibleTime=2023-09-28T12:01:56
AccrueTime=2023-09-28T12:01:56
StartTime=Unknown EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-09-28T12:02:11
Partition=normal AllocNode:Sid=osvnc001:17136
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=4000M,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=4000M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=Innovus_Impl_System Network=(null)
Command=/bin/tcsh
WorkDir=/home/debajitd
Power=
NtasksPerTRES:0
[root@hpcmaster ~]#

The problem is that, from the Slurm side, we cannot tell whether the job is waiting for CPUs/RAM or for licenses, because in either case it shows Reason=Priority. Is there a command or option that shows exactly what the job is waiting for?

Regards,
Debajit Dutta
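This ticket timed out without a documented answer, so what follows is only a possible workaround, not SchedMD's recommendation. One way to approximate the missing information is to cross-check the job's `Licenses` field against the `Free` count from `scontrol show lic`. The sketch below does that on the text shown in this ticket. The function names are hypothetical, and it assumes that field values contain no whitespace and that the `Licenses` field uses the `name[:count]` comma-separated form. A result that lists a license only suggests that the license is the likely bottleneck; the scheduler's `Reason` remains the authoritative state.

```python
def parse_fields(text):
    """Whitespace-split KEY=VALUE tokens into a dict (values with spaces are not handled)."""
    return dict(tok.split("=", 1) for tok in text.split() if "=" in tok)


def requested_licenses(job_text):
    """Return {license_name: count} from the job's Licenses field (assumed 'name[:count],...')."""
    raw = parse_fields(job_text).get("Licenses", "(null)")
    if raw in ("", "(null)"):
        return {}
    wanted = {}
    for item in raw.split(","):
        name, _, count = item.partition(":")
        wanted[name] = int(count) if count else 1
    return wanted


def free_counts(license_text):
    """Return {license_name: free} from `scontrol show lic` text, regardless of line layout."""
    free, current = {}, None
    for tok in license_text.split():
        key, sep, value = tok.partition("=")
        if not sep:
            continue
        if key == "LicenseName":
            current = value
        elif key == "Free" and current is not None:
            free[current] = int(value)
    return free


def exhausted_licenses(job_text, license_text):
    """List requested licenses whose free count is below what the job asks for."""
    free = free_counts(license_text)
    wanted = requested_licenses(job_text)
    # Licenses missing from the listing are treated as unknown and not reported.
    return [name for name, count in wanted.items() if name in free and free[name] < count]


# Sample text abridged from the outputs in this ticket.
JOB_TEXT = "JobId=2377664 JobName=tcsh JobState=PENDING Reason=Priority Licenses=Innovus_Impl_System"
LIC_TEXT = "LicenseName=Innovus_Impl_System Total=160 Used=160 Free=0 Reserved=0 Remote=no"

short = exhausted_licenses(JOB_TEXT, LIC_TEXT)
print(short or "no requested license is exhausted")
```

On a live cluster the two strings would be replaced by the output of `scontrol show job <jobid>` and `scontrol show lic`. An empty result would point toward CPUs, memory, or queue position rather than licenses, although the free count can change between the two commands.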