| Summary: | Heterogeneous allocation randomness | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Amit Kumar <ahkumar> |
| Component: | Scheduling | Assignee: | Danny Auble <da> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 2 - High Impact | ||
| Priority: | --- | CC: | ahkumar |
| Version: | 16.05.8 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | SMU | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA Site: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | 16.05.08 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: | slurm.conf | ||
Amit, could you send your slurm.conf? At first glance it is hard to tell if there is an error or not.

Created attachment 4547 [details]
slurm.conf

Conf file attached.
Amit,

When I emulate your system I can see it grabbing an extra node just like you do, but I always see all the nodes being completely allocated (never partially). I believe your problem lies with DefMemPerCPU=2048. You will notice your knl nodes don't have that much memory per CPU (each has 256 CPUs, as we count threads as CPUs). This means that by default you have 96000/2048 = 46 CPUs allocatable on your knls (which is why you have to grab other nodes). On your other nodes you get 256000/2048 = 125, which is more than the number of CPUs you have. I always get the correct result when I add --mem=0, as that allocates all the memory on the node.

Does this make sense? Let me know if you need anything else on this.

---

Danny,

Believe me, while we were setting up the configuration file I made a note that I needed to change DefMemPerCPU because it was going to bite me at some point. Thank you for pointing this one out!! This has resolved my mistake!!

Regards,
Amit

---

No problem Amit, glad it was an easy one. In the future, could you mark bugs with the severities laid out in https://schedmd.com/support.php? A Severity 2 issue is a high-impact problem that is causing sporadic outages or is consistently encountered by end users with adverse impact to end user interaction with the system. I am not sure I would classify this bug as a sev 2. It helps with our SLAs. I do understand it was confusing and annoying, though. You will find we are fairly fast to respond no matter the severity though ;).

Thanks!
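The arithmetic in Danny's explanation can be sketched as follows. This is an illustrative helper, not Slurm code; the per-node CPU and memory figures are the ones quoted in this thread, and the node labels are assumptions for the sketch:

```python
# Sketch of how DefMemPerCPU caps the CPUs Slurm will allocate per node:
# effective CPUs = min(physical CPUs, node memory / DefMemPerCPU).
DEF_MEM_PER_CPU = 2048  # MB, value from the attached slurm.conf

# Node specs as quoted in the thread (knl counts threads as CPUs: 64 cores x 4).
nodes = {
    "knl":       {"cpus": 256, "mem_mb": 96000},
    "broadwell": {"cpus": 36,  "mem_mb": 256000},
}

for name, spec in nodes.items():
    mem_limited = spec["mem_mb"] // DEF_MEM_PER_CPU
    allocatable = min(spec["cpus"], mem_limited)
    print(f"{name}: {allocatable} of {spec['cpus']} CPUs allocatable")
# knl:       46 of 256 CPUs allocatable  (memory-limited)
# broadwell: 36 of 36 CPUs allocatable   (125 > 36, so not limited)
```

With --mem=0 the per-CPU memory cap no longer applies, so all CPUs on each node become allocatable again.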
Dear SchedMD,

We are requesting three different types of nodes from a partition that includes all the nodes in our cluster. The node types we have are Broadwell nodes (36 CPU cores), KNL nodes (64 cores), and P100 nodes (36 CPU cores and 1 P100). Here is a sample allocation request that explicitly requests a set of 6 nodes with a task count (-n) equal to their combined core count. The result is that I am seeing random idle nodes allocated beyond what I had requested:

```
# salloc -J hybrid -n 272 --exclusive -p defq -w b001,b002,p035,p036,k003,k015 -x admin[01-03]
salloc: Pending job allocation 11534
salloc: job 11534 queued and waiting for resources
salloc: job 11534 has been allocated resources
salloc: Granted job allocation 11534
~]# squeue
 JOBID PARTITION   NAME USER ST  TIME NODES NODELIST(REASON)
 11534      defq hybrid root  R  0:05    10 b[001-002],k[003,015],login[01-04],p[035-036]
```

If you notice above, I got allocations on all the nodes I requested, although not all CPU cores on them. Instead, I was randomly given a few cores off of login[01-04]. I think this is odd.
If I specify the -N flag, then I get BadConstraints as the reason and the allocation remains pending:

```
JobId=11535 JobName=hybrid
   UserId=root(0) GroupId=root(0) MCS_label=N/A
   Priority=0 Nice=0 Account=root QOS=normal
   JobState=PENDING Reason=BadConstraints Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=2-00:00:00 TimeMin=N/A
   SubmitTime=2017-05-12T11:03:04 EligibleTime=2017-05-12T11:03:04
   StartTime=Unknown EndTime=Unknown Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=defq AllocNode:Sid=cm1:12429
   ReqNodeList=b[001-002],k[003,015],p[035-036] ExcNodeList=admin[01-03]
   NodeList=(null)
   NumNodes=6-6 NumCPUs=272 NumTasks=272 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=272,mem=557056,node=6
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=2G MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
   Command=(null) WorkDir=/root
   Power=
```

How can I get around this? Any help here is greatly appreciated.

Thank you,
Amit
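The spillover onto the login nodes follows directly from the DefMemPerCPU cap explained earlier in the thread. A small sketch of the shortfall, using the core counts from this report and the 46-CPU knl cap from Danny's reply (node names and the helper itself are illustrative, not Slurm internals):

```python
# Why a 272-task request on these 6 nodes spills onto extra nodes when
# DefMemPerCPU=2048 caps each knl node at 46 usable CPUs.
requested_tasks = 272

# Physical core counts from the report (Broadwell/P100: 36, KNL: 64).
cores = {"b001": 36, "b002": 36, "p035": 36, "p036": 36,
         "k003": 64, "k015": 64}

# knl nodes are memory-limited to 46 CPUs (96000 MB / 2048 MB per CPU).
usable = {n: min(c, 46) if n.startswith("k") else c
          for n, c in cores.items()}

total_usable = sum(usable.values())
shortfall = requested_tasks - total_usable
print(f"usable CPUs on the requested nodes: {total_usable}")  # 236
print(f"tasks placed on other nodes:        {shortfall}")     # 36
```

Those 36 leftover tasks are what landed on login[01-04]; requesting --mem=0 (or correcting DefMemPerCPU) removes the cap, so the 6 requested nodes cover all 272 tasks.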