Hello, we are in the process of integrating a new storage system called WekaIO.
I accidentally pressed enter... so here it goes: the Weka storage system uses a client running on the compute nodes in our cluster. We mount over UDP and tell Weka how many cores to use, like this:

mount -t wekafs -o net=udp -o num_cores=2 weka-1/wekafs2 /mnt/wekafs

How can we be sure that Slurm will not use the cores that have been assigned to the wekafs client? We found this page: https://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re46.html which says we can use the "isolcpus" kernel parameter. Will that work with Slurm?
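For reference, the way we would set isolcpus (just a sketch based on the generic kernel documentation; the CPU IDs here are placeholders) is to add it to the kernel command line and regenerate the GRUB config:

# /etc/default/grub
GRUB_CMDLINE_LINUX="... isolcpus=0,1"

$ grub2-mkconfig -o /boot/grub2/grub.cfg    # or update-grub on Debian-based systems
$ reboot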
CpuSpecList is what you want. I talked with my colleague Michael about his comment in bug 12674 that CpuSpecList doesn't work with task/affinity. We found that he was mistaken: task/affinity works just fine with CpuSpecList. The only requirement is that you have task/cgroup, but you can also have task/affinity, like this:

TaskPlugin=task/cgroup,task/affinity

From the slurm.conf man page[1]:

"A comma-delimited list of Slurm abstract CPU IDs reserved for system use. The list will be expanded to include all other CPUs, if any, on the same cores. These cores will not be available for allocation to user jobs. Depending upon the TaskPluginParam option of SlurmdOffSpec[2], Slurm daemons (i.e. slurmd and slurmstepd) may either be confined to these resources (the default) or prevented from using these resources."

There are some important caveats, though.

(1) The CPUs specified by CpuSpecList (or CoreSpecCount) are still shown as available in "scontrol show node":

# slurm.conf
NodeName=DEFAULT RealMemory=8000 Sockets=1 CoresPerSocket=8 ThreadsPerCore=2
NodeName=n1-[1-10] NodeAddr=localhost Port=36001-36010 CpuSpecList=0-1

$ scontrol show nodes n1-1 |egrep -i 'nodename|cpu'
NodeName=n1-1 Arch=x86_64 CoresPerSocket=8
   CPUAlloc=0 CPUTot=16 CPULoad=1.28
   CoreSpecCount=1 CPUSpecList=0-1
   CfgTRES=cpu=16,mem=8000M,billing=16,gres/gpu=4

This can be confusing to users who think they can be allocated all the CPUs on a node, but they actually can't:

$ srun -N1 -n16 whereami
srun: error: Unable to allocate resources: Requested node configuration is not available

$ srun -N2 -n16 whereami
0000 n1-1 - Cpus_allowed: 0002 Cpus_allowed_list: 1
0001 n1-1 - Cpus_allowed: 0200 Cpus_allowed_list: 9
0006 n1-1 - Cpus_allowed: 0010 Cpus_allowed_list: 4
0007 n1-1 - Cpus_allowed: 1000 Cpus_allowed_list: 12
0003 n1-1 - Cpus_allowed: 0400 Cpus_allowed_list: 10
0004 n1-1 - Cpus_allowed: 0008 Cpus_allowed_list: 3
0005 n1-1 - Cpus_allowed: 0800 Cpus_allowed_list: 11
0002 n1-1 - Cpus_allowed: 0004 Cpus_allowed_list: 2
0014 n1-2 - Cpus_allowed: 0010 Cpus_allowed_list: 4
0012 n1-2 - Cpus_allowed: 0008 Cpus_allowed_list: 3
0010 n1-2 - Cpus_allowed: 0004 Cpus_allowed_list: 2
0011 n1-2 - Cpus_allowed: 0400 Cpus_allowed_list: 10
0008 n1-2 - Cpus_allowed: 0002 Cpus_allowed_list: 1
0009 n1-2 - Cpus_allowed: 0200 Cpus_allowed_list: 9
0013 n1-2 - Cpus_allowed: 0800 Cpus_allowed_list: 11
0015 n1-2 - Cpus_allowed: 1000 Cpus_allowed_list: 12

We would like to improve this by not showing the CpuSpecList/CoreSpecCount CPUs in the TRES list, but that is more involved than simply changing the output of "scontrol show node". Since CpuSpecList still works as designed, we consider this an enhancement, and unless the development is sponsored we won't be able to commit to it. Just make sure your users are aware of this.

(2) There are two bugs with CpuSpecList in 21.08:

(i) On some systems, jobs won't run with CpuSpecList in 21.08. See bug 12393. I uploaded a fix in bug 12393 comment 54; however, since that patch changes the plugin ABI, we can't put it into 21.08, and we haven't found a fix for 21.08. If you upgrade to 21.08, first test this (before upgrading the production cluster) and, if you see the bug, apply the patch from bug 12393 comment 54.

(ii) In 21.08, CpuSpecList does not confine slurmd to those CPUs. SlurmdOffSpec is also broken. We are tracking this in an internal bug (12477). However, slurmd is lightweight, so you may not really notice a performance difference.
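If you want to check whether slurmd is actually being confined on your system, one quick way (my suggestion, not something Slurm itself provides; it assumes the util-linux taskset tool is installed) is to inspect the daemon's affinity list:

$ taskset -cp $(pidof slurmd)
pid 1234's current affinity list: 0-1

The pid and output here are illustrative. With CpuSpecList=0-1 and the default behavior you would expect to see 0-1 (slurmd confined to the spec CPUs); with SlurmdOffSpec set you would expect everything except 0-1. On 21.08, because of the bug above, you will likely see all CPUs either way.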
In 21.08, with CpuSpecList configured, Slurm will still not allocate those CPUs to jobs, so that basic behavior still works.

Does this answer your question? Do you have any follow-up questions?

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_CpuSpecList
[2] https://slurm.schedmd.com/slurm.conf.html#OPT_SlurmdOffSpec
I made a typo: bug 12674 was supposed to be bug 10674. But I also now see that bug is private, so I guess it's irrelevant to you anyway.
Hi, thank you for your answer.

But do you know if the "isolcpus" kernel parameter can be used as well?

We need to have the Weka client use the CPU IDs that Slurm has in CpuSpecList. I am unsure how to do this, but that is of course unrelated to Slurm :) I will try to get answers from Weka support about that.

I see that you mention 21.08 a couple of times in your answer, but we have not started using 21.08; we are using 20.11.8.

Do I only need to configure CpuSpecList and not SlurmdOffSpec?

If we have two sockets, how do we know what the CPU IDs are if there are e.g. 2x24-core CPUs in our system?
(In reply to Hjalti Sveinsson from comment #8)
> Hi, thank you for your answer.
>
> But do you know if the "isolcpus" kernel parameter can be used as well?

I have never heard of isolcpus. For Slurm, I recommend using the built-in tools (CpuSpecList and CoreSpecCount).

> We need to have the Weka client use the CPU IDs that Slurm has in
> CpuSpecList. I am unsure how to do this, but that is of course unrelated to
> Slurm :) I will try to get answers from Weka support about that.

I have no idea. I agree with getting support from Weka, or see if you can find an answer with a Google search.

> I see that you mention 21.08 a couple of times in your answer, but we have
> not started using 21.08; we are using 20.11.8.

I know; I just wanted you to be aware of some issues we found in 21.08 in case you decide to upgrade at some point.

> Do I only need to configure CpuSpecList and not SlurmdOffSpec?

CpuSpecList is the important part. SlurmdOffSpec is optional, and it depends on how you want it to work: do you want slurmd/slurmstepd running on the CPUs that are allocated to Slurm jobs, or on the CPUs that are running Weka? Set SlurmdOffSpec accordingly.

> If we have two sockets, how do we know what the CPU IDs are if there are
> e.g. 2x24-core CPUs in our system?

lstopo is a good tool. But personally, I just use experimentation: I set CpuSpecList, start Slurm, then run a job using all the remaining CPUs on a node and see which ones are missing. For example, on my box with 8 cores and 2 threads per core:

Without CpuSpecList:

$ srun -n16 --cpu-bind=verbose sleep 1 |sort
cpu-bind=MASK - n1-1, task 0 0 [5954]: mask 0x1 set
cpu-bind=MASK - n1-1, task 1 1 [5955]: mask 0x100 set
cpu-bind=MASK - n1-1, task 2 2 [5956]: mask 0x2 set
cpu-bind=MASK - n1-1, task 3 3 [5957]: mask 0x200 set
cpu-bind=MASK - n1-1, task 4 4 [5958]: mask 0x4 set
cpu-bind=MASK - n1-1, task 5 5 [5959]: mask 0x400 set
cpu-bind=MASK - n1-1, task 6 6 [5960]: mask 0x8 set
cpu-bind=MASK - n1-1, task 7 7 [5961]: mask 0x800 set
cpu-bind=MASK - n1-1, task 8 8 [5962]: mask 0x10 set
cpu-bind=MASK - n1-1, task 9 9 [5963]: mask 0x1000 set
cpu-bind=MASK - n1-1, task 10 10 [5964]: mask 0x20 set
cpu-bind=MASK - n1-1, task 11 11 [5965]: mask 0x2000 set
cpu-bind=MASK - n1-1, task 12 12 [5966]: mask 0x40 set
cpu-bind=MASK - n1-1, task 13 13 [5967]: mask 0x4000 set
cpu-bind=MASK - n1-1, task 14 14 [5968]: mask 0x80 set
cpu-bind=MASK - n1-1, task 15 15 [5969]: mask 0x8000 set

With CpuSpecList:

# slurm.conf
NodeName=<name> ... CpuSpecList=0-1

$ srun -n14 --cpu-bind=verbose sleep 1
cpu-bind=MASK - n1-1, task 0 0 [5317]: mask 0x2 set
cpu-bind=MASK - n1-1, task 1 1 [5318]: mask 0x200 set
cpu-bind=MASK - n1-1, task 2 2 [5319]: mask 0x4 set
cpu-bind=MASK - n1-1, task 3 3 [5320]: mask 0x400 set
cpu-bind=MASK - n1-1, task 4 4 [5321]: mask 0x8 set
cpu-bind=MASK - n1-1, task 5 5 [5322]: mask 0x800 set
cpu-bind=MASK - n1-1, task 6 6 [5323]: mask 0x10 set
cpu-bind=MASK - n1-1, task 7 7 [5324]: mask 0x1000 set
cpu-bind=MASK - n1-1, task 8 8 [5325]: mask 0x20 set
cpu-bind=MASK - n1-1, task 9 9 [5326]: mask 0x2000 set
cpu-bind=MASK - n1-1, task 10 10 [5327]: mask 0x40 set
cpu-bind=MASK - n1-1, task 11 11 [5328]: mask 0x4000 set
cpu-bind=MASK - n1-1, task 12 12 [5329]: mask 0x80 set
cpu-bind=MASK - n1-1, task 13 13 [5330]: mask 0x8000 set

Comparing the two runs, masks 0x1 and 0x100 (CPUs 0 and 8) are the ones missing, so CpuSpecList=0-1 covered both hardware threads of the first core.
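If you would rather look the IDs up than experiment, lscpu can print the CPU-to-core-to-socket mapping directly. Note that this shows the OS numbering, while CpuSpecList takes Slurm's abstract CPU IDs, which may not match the OS numbering on every system; that is another reason I prefer the experiment above. Illustrative output for a two-socket machine:

$ lscpu -e=CPU,CORE,SOCKET
CPU CORE SOCKET
0   0    0
1   1    0
...
24  24   1
25  25   1

On a 2x24-core system the CPUs may be numbered 0-23 on socket 0 and 24-47 on socket 1, or interleaved between the sockets, so it is worth verifying on your hardware.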
Is there anything else I can help with for this ticket?
Hi, you can close this issue. I think I have my answer; I already tested this and it works as I expected. Thank you!
Sounds good. Closing as infogiven.