Created attachment 3016 [details]
Slurm.conf

We are testing task affinity on our Cray XC30 (Ivy Bridge nodes, 2 sockets per node, 12 cores per socket, 2 threads per core) with both the task/affinity and task/cgroup plugins enabled (TaskPlugin = affinity,cgroup,cray). See our configuration file, attached. We would like to have good default settings so that most users can just run with srun -n <# of tasks> ./a.out. One of the things we want to do is make --hint=nomultithread the default, as most of our workload does not benefit from hyperthreading on our Cray XC30. Could you please let us know if there is any way we can set this as the default for all jobs?

In addition, the --ntasks-per-socket option still does not work; I wonder if you could point us to how we can fix it.

Thanks,
Zhengji
Hi Zhengji - You marked this as a "Sev 2 - High Impact" issue. Is this actively preventing jobs from running on your system, or would you mind changing this to a lower priority?

> We are testing the task affinity on our Cray XC30 (Ivy bridge node, 2
> sockets per node, 12 cores per socket, 2 threads per core) with both
> task/affinity and task/cgroup plugins enabled (TaskPlugin =
> affinity,cgroup,cray). See our configuration file attached. We would like to
> have a good default setting so that most of the users just run with srun -n
> <# of tasks> ./a.out. One of the things we want to do is to make the
> --hint=nomultithread to default as most of the workload does not get benefit
> from using hyperthreading on our Cray XC30. Could you please let us know if
> there is any way we can set this to default for all jobs?

There's no way to set a default hint through slurm.conf. You could potentially set an appropriate SLURM_HINT environment variable for your users, which may accomplish what you want. There's also a SelectTypeParameters option, CR_ONE_TASK_PER_CORE, that may do what you're looking for. Please see http://slurm.schedmd.com/slurm.conf.html for some notes on how it works.

> In addition, the --ntasks-per-socket option still does not work, I wonder if
> you could point to us how we can fix it.

Can you elaborate on "does not work"? I don't see an obvious issue with it, and the regression suite does cover that option. One thing that can help highlight how the CPU affinity is being set up is --cpu_bind=verbose.
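To illustrate the environment-variable approach: a site could export SLURM_HINT for all users from a login script. The path and mechanism below are assumptions for illustration, not a SchedMD-recommended setup:

```shell
# Hypothetical site-wide profile snippet (e.g. /etc/profile.d/slurm_defaults.sh;
# the path is an assumption -- use whatever your site's login shells source).
# srun and sbatch pick up SLURM_HINT from the environment, which approximates
# a cluster-wide --hint=nomultithread default.
export SLURM_HINT=nomultithread

# A user who does want hyperthreading can still override it per job:
#   srun --hint=multithread -n 48 ./a.out
echo "$SLURM_HINT"
```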
Dear Tim,

Thanks very much for your prompt help. Sure, it is OK to set this bug to a lower priority. I had a different understanding of "High Impact": we are working on what we should set as defaults on our systems in the near future, and it will affect all of our users, so I considered this bug high impact. I did not find something like an "urgency level" in your bug system, which could also be a useful metric for calculating the priority of a ticket (I thought this was not urgent, but had high impact).

So you have answered my first two questions. It looks like the way to set some sbatch and srun default options is to set the corresponding environment variables. I will try those.

By --ntasks-per-socket does not work, I mean:

zz217@nid00033:~/tests/affinity> srun -n 8 --ntasks-per-socket=4 --cpu_bind=cores xthi.intel
Hello from rank 0, thread 0, on nid00033. (core affinity = 0,24)
Hello from rank 1, thread 0, on nid00033. (core affinity = 1,25)
Hello from rank 6, thread 0, on nid00033. (core affinity = 6,30)
Hello from rank 7, thread 0, on nid00033. (core affinity = 7,31)
Hello from rank 2, thread 0, on nid00033. (core affinity = 2,26)
Hello from rank 3, thread 0, on nid00033. (core affinity = 3,27)
Hello from rank 4, thread 0, on nid00033. (core affinity = 4,28)
Hello from rank 5, thread 0, on nid00033. (core affinity = 5,29)

I wanted the first 4 tasks bound to the first socket and the remaining 4 tasks bound to the second socket, but it does not do that. All the tasks were bound to the first socket.
Here is the output of --cpu_bind=verbose:

zz217@nid00033:~/tests/affinity> srun -n 8 --ntasks-per-socket=4 --cpu_bind=cores,verbose xthi.intel
cpu_bind=MASK - nid00033, task 6 6 [41890]: mask 0x40000040 set
cpu_bind=MASK - nid00033, task 5 5 [41889]: mask 0x20000020 set
cpu_bind=MASK - nid00033, task 0 0 [41884]: mask 0x1000001 set
cpu_bind=MASK - nid00033, task 3 3 [41887]: mask 0x8000008 set
cpu_bind=MASK - nid00033, task 4 4 [41888]: mask 0x10000010 set
cpu_bind=MASK - nid00033, task 2 2 [41886]: mask 0x4000004 set
cpu_bind=MASK - nid00033, task 1 1 [41885]: mask 0x2000002 set
cpu_bind=MASK - nid00033, task 7 7 [41891]: mask 0x80000080 set
Hello from rank 0, thread 0, on nid00033. (core affinity = 0,24)
Hello from rank 1, thread 0, on nid00033. (core affinity = 1,25)
Hello from rank 4, thread 0, on nid00033. (core affinity = 4,28)
Hello from rank 5, thread 0, on nid00033. (core affinity = 5,29)
Hello from rank 6, thread 0, on nid00033. (core affinity = 6,30)
Hello from rank 7, thread 0, on nid00033. (core affinity = 7,31)
Hello from rank 2, thread 0, on nid00033. (core affinity = 2,26)
Hello from rank 3, thread 0, on nid00033. (core affinity = 3,27)

We currently have TaskPlugin = cgroup,cray on our production systems, but we wanted to use the options that come with the task/affinity plugin, so we are testing TaskPlugin = affinity,cgroup,cray now. One of the options that we wanted to use is --ntasks-per-socket.

In addition, currently we have:

SelectTypeParameters = CR_SOCKET_MEMORY,OTHER_CONS_RES,CR_CORE_DEFAULT_DIST_BLOCK

If we want to try out CR_ONE_TASK_PER_CORE as you suggested, what would the SelectTypeParameters be? As you may already know, we need to support a shared partition as well on our system (which may need CR_SOCKET_MEMORY).

Thanks,
Zhengji
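A side note on reading the masks above: each hex mask is a bitmap over the node's 48 logical CPUs, and on this node CPU N and CPU N+24 are the two hardware threads of the same core, which is why every task reports two CPUs. A minimal decoding sketch (the helper name is ours, for illustration; it is not part of Slurm):

```shell
# Decode a Slurm cpu_bind mask into the logical CPUs it selects.
# decode_mask is a hypothetical helper, not a Slurm command.
decode_mask() {
  local mask=$(( $1 )) cpu list=""
  for (( cpu = 0; cpu < 48; cpu++ )); do   # 48 logical CPUs on this node
    if (( (mask >> cpu) & 1 )); then
      list+="${list:+,}$cpu"
    fi
  done
  echo "$list"
}

decode_mask 0x1000001   # task 0: CPUs 0,24 -> core 0 plus its HT sibling
decode_mask 0x2000002   # task 1: CPUs 1,25 -> core 1 plus its HT sibling
```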
I just tried to use SLURM_HINT=nomultithread, but it does not seem to work as expected (if it worked, I should not see the high-numbered CPUs (>24) in the program output), while the command-line option --hint=nomultithread seems to work only when the --hint option appears as the last option on the srun command line. Could you please let me know what the issue could be?

Thanks,
Zhengji

zz217@nid00033:~/tests/affinity> export SLURM_HINT=nomultithread; srun -n 8 --ntasks-per-socket=4 --cpu_bind=cores,verbose xthi.intel
cpu_bind=MASK - nid00033, task 0 0 [43237]: mask 0x1000001 set
cpu_bind=MASK - nid00033, task 3 3 [43240]: mask 0x8000008 set
cpu_bind=MASK - nid00033, task 4 4 [43241]: mask 0x10000010 set
cpu_bind=MASK - nid00033, task 6 6 [43243]: mask 0x40000040 set
cpu_bind=MASK - nid00033, task 7 7 [43244]: mask 0x80000080 set
cpu_bind=MASK - nid00033, task 2 2 [43239]: mask 0x4000004 set
cpu_bind=MASK - nid00033, task 5 5 [43242]: mask 0x20000020 set
cpu_bind=MASK - nid00033, task 1 1 [43238]: mask 0x2000002 set
Hello from rank 2, thread 0, on nid00033. (core affinity = 2,26)
Hello from rank 3, thread 0, on nid00033. (core affinity = 3,27)
Hello from rank 0, thread 0, on nid00033. (core affinity = 0,24)
Hello from rank 1, thread 0, on nid00033. (core affinity = 1,25)
Hello from rank 4, thread 0, on nid00033. (core affinity = 4,28)
Hello from rank 5, thread 0, on nid00033. (core affinity = 5,29)
Hello from rank 6, thread 0, on nid00033. (core affinity = 6,30)
Hello from rank 7, thread 0, on nid00033. (core affinity = 7,31)

# --hint=nomultithread does not work if it appears in front of other srun options.
zz217@nid00033:~/tests/affinity> unset SLURM_HINT; srun -n 8 --ntasks-per-socket=4 --hint=nomultithread --cpu_bind=cores,verbose xthi.intel
cpu_bind=MASK - nid00033, task 7 7 [43331]: mask 0x80000080 set
cpu_bind=MASK - nid00033, task 0 0 [43324]: mask 0x1000001 set
cpu_bind=MASK - nid00033, task 1 1 [43325]: mask 0x2000002 set
cpu_bind=MASK - nid00033, task 5 5 [43329]: mask 0x20000020 set
cpu_bind=MASK - nid00033, task 2 2 [43326]: mask 0x4000004 set
cpu_bind=MASK - nid00033, task 3 3 [43327]: mask 0x8000008 set
cpu_bind=MASK - nid00033, task 6 6 [43330]: mask 0x40000040 set
cpu_bind=MASK - nid00033, task 4 4 [43328]: mask 0x10000010 set
Hello from rank 0, thread 0, on nid00033. (core affinity = 0,24)
Hello from rank 4, thread 0, on nid00033. (core affinity = 4,28)
Hello from rank 1, thread 0, on nid00033. (core affinity = 1,25)
Hello from rank 2, thread 0, on nid00033. (core affinity = 2,26)
Hello from rank 3, thread 0, on nid00033. (core affinity = 3,27)
Hello from rank 5, thread 0, on nid00033. (core affinity = 5,29)
Hello from rank 6, thread 0, on nid00033. (core affinity = 6,30)
Hello from rank 7, thread 0, on nid00033. (core affinity = 7,31)

# --hint works if it appears at the end of the other srun options:
zz217@nid00033:~/tests/affinity> unset SLURM_HINT; srun -n 8 --ntasks-per-socket=4 --cpu_bind=cores,verbose --hint=nomultithread xthi.intel
cpu_bind=MASK - nid00033, task 2 2 [43508]: mask 0x4 set
cpu_bind=MASK - nid00033, task 0 0 [43506]: mask 0x1 set
cpu_bind=MASK - nid00033, task 4 4 [43510]: mask 0x10 set
cpu_bind=MASK - nid00033, task 7 7 [43513]: mask 0x80 set
cpu_bind=MASK - nid00033, task 6 6 [43512]: mask 0x40 set
cpu_bind=MASK - nid00033, task 5 5 [43511]: mask 0x20 set
cpu_bind=MASK - nid00033, task 3 3 [43509]: mask 0x8 set
cpu_bind=MASK - nid00033, task 1 1 [43507]: mask 0x2 set
Hello from rank 0, thread 0, on nid00033. (core affinity = 0)
Hello from rank 1, thread 0, on nid00033. (core affinity = 1)
Hello from rank 2, thread 0, on nid00033. (core affinity = 2)
Hello from rank 3, thread 0, on nid00033. (core affinity = 3)
Hello from rank 4, thread 0, on nid00033. (core affinity = 4)
Hello from rank 5, thread 0, on nid00033. (core affinity = 5)
Hello from rank 6, thread 0, on nid00033. (core affinity = 6)
Hello from rank 7, thread 0, on nid00033. (core affinity = 7)
zz217@nid00033:~/tests/affinity>
Dear Tim,

Could you please let me know what actions SchedMD would like to take on the SLURM_HINT=nomultithread problem (I reported in my last update to this bug that this environment variable failed to make the srun command use only physical cores)? It is very important for us to know whether SchedMD will be fixing this bug soon, so we can decide our next step. I would appreciate it if you could update us at your earliest convenience.

I would like to let you know that what we really need is the capability of enabling hyperthreading on demand only (e.g., using --hint=multithread on the srun command line). We hope the srun command works with the physical cores only by default (given that hyperthreading is enabled in the BIOS all the time). If you could make SLURM_HINT=nomultithread work for us so that we can use that environment variable to set our default, that would be great, but we are happy to pursue other approaches if available as well. Actually, I am wondering if it would be a good idea to add something like SbatchDefaultCommand to the Slurm config support, so that we could use it to set the default srun command line for batch jobs?

Thanks,
Zhengji
(In reply to Zhengji Zhao from comment #2)
> Dear Tim,
>
> Thanks very much for your prompt help. Sure it is OK to set this bug to a
> lower priority. I had different understanding about the "High Impact". We
> are working on what we should set as default on our systems in a short
> future, and it will affect all of our users, so I considered this bug as a
> high impact. I did not find something like "urgency level" in your bug
> system, which could be also a useful metric to calculate the priority of a
> ticket. (I thought this is not urgent, but has high impact).

Importance is something that can be set, but we don't actively sort based on that. We try to respond to everything promptly, but the Impact levels are tied to specific contractual obligations and have specific response times.

> So you have answered my first two questions. Looks like the way to set some
> sbatch and srun default options is to set the corresponding environment
> variables. I will try those.
>
> By --ntasks-per-socket does not work, I mean
> I wanted to make the first 4 tasks bind to first socket, and the rest 4
> tasks bind to the second socket, but it does not do that. All the tasks were
> bound to the first socket.

--ntasks-per-socket does not impact the layout, just the calculation of how many tasks to launch in total; the task distribution is what impacts the layout. For block distribution this appears to be working correctly. Launching fewer tasks than the number of allocated cores does introduce some ambiguity into the result, although this is still working as defined.

> We currently have (TaskPlugin = cgroup,cray) in our production systems, but
> we wanted to use the options that come with the task/affinity plugin, so we
> are testing (TaskPlugin = affinity,cgroup,cray) now. One of the options that
> we wanted to use is the --ntasks-per-socket.

Again, --ntasks-per-socket does not directly influence affinity.
> In addition, Currently we have
>
> SelectTypeParameters =
> CR_SOCKET_MEMORY,OTHER_CONS_RES,CR_CORE_DEFAULT_DIST_BLOCK
>
> if we want to try out CR_ONE_TASK_PER_CORE as you suggested, what would be
> the SelectTypeParameters ? As you may already know we need support shared
> partition as well on our system (which may need CR_SOCKET_MEMORY).

It's an additional flag you can add:

SelectTypeParameters=CR_SOCKET_MEMORY,OTHER_CONS_RES,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE
(In reply to Zhengji Zhao from comment #3)
> I just tried to use SLURM_HINT=nomultithread but it seems not work as
> expected (if this works, I should not see the high number cores (>24) in the
> program output), while the command line option --hint=nomultithread seems to
> work only when the --hint option appears as the last option of the srun
> command line option. Could you please let me know what could be the issue?
>
> zz217@nid00033:~/tests/affinity> export SLURM_HINT=nomultithread; srun -n 8
> --ntasks-per-socket=4 --cpu_bind=cores,verbose xthi.intel
> cpu_bind=MASK - nid00033, task 0 0 [43237]: mask 0x1000001 set
> cpu_bind=MASK - nid00033, task 3 3 [43240]: mask 0x8000008 set
> cpu_bind=MASK - nid00033, task 4 4 [43241]: mask 0x10000010 set
> cpu_bind=MASK - nid00033, task 6 6 [43243]: mask 0x40000040 set
> cpu_bind=MASK - nid00033, task 7 7 [43244]: mask 0x80000080 set
> cpu_bind=MASK - nid00033, task 2 2 [43239]: mask 0x4000004 set
> cpu_bind=MASK - nid00033, task 5 5 [43242]: mask 0x20000020 set
> cpu_bind=MASK - nid00033, task 1 1 [43238]: mask 0x2000002 set
> Hello from rank 2, thread 0, on nid00033. (core affinity = 2,26)
> Hello from rank 3, thread 0, on nid00033. (core affinity = 3,27)
> Hello from rank 0, thread 0, on nid00033. (core affinity = 0,24)
> Hello from rank 1, thread 0, on nid00033. (core affinity = 1,25)
> Hello from rank 4, thread 0, on nid00033. (core affinity = 4,28)
> Hello from rank 5, thread 0, on nid00033. (core affinity = 5,29)
> Hello from rank 6, thread 0, on nid00033. (core affinity = 6,30)
> Hello from rank 7, thread 0, on nid00033. (core affinity = 7,31)
>
> #--hint=nomultithread does not work if appears infront of other srun options.
> zz217@nid00033:~/tests/affinity> unset SLURM_HINT; srun -n 8
> --ntasks-per-socket=4 --hint=nomultithread --cpu_bind=cores,verbose
> xthi.intel
> cpu_bind=MASK - nid00033, task 7 7 [43331]: mask 0x80000080 set
> cpu_bind=MASK - nid00033, task 0 0 [43324]: mask 0x1000001 set
> cpu_bind=MASK - nid00033, task 1 1 [43325]: mask 0x2000002 set
> cpu_bind=MASK - nid00033, task 5 5 [43329]: mask 0x20000020 set
> cpu_bind=MASK - nid00033, task 2 2 [43326]: mask 0x4000004 set
> cpu_bind=MASK - nid00033, task 3 3 [43327]: mask 0x8000008 set
> cpu_bind=MASK - nid00033, task 6 6 [43330]: mask 0x40000040 set
> cpu_bind=MASK - nid00033, task 4 4 [43328]: mask 0x10000010 set
> Hello from rank 0, thread 0, on nid00033. (core affinity = 0,24)
> Hello from rank 4, thread 0, on nid00033. (core affinity = 4,28)
> Hello from rank 1, thread 0, on nid00033. (core affinity = 1,25)
> Hello from rank 2, thread 0, on nid00033. (core affinity = 2,26)
> Hello from rank 3, thread 0, on nid00033. (core affinity = 3,27)
> Hello from rank 5, thread 0, on nid00033. (core affinity = 5,29)
> Hello from rank 6, thread 0, on nid00033. (core affinity = 6,30)
> Hello from rank 7, thread 0, on nid00033. (core affinity = 7,31)
>
> #--hint works if it appears in the end of the other srun options:
> zz217@nid00033:~/tests/affinity> unset SLURM_HINT; srun -n 8
> --ntasks-per-socket=4 --cpu_bind=cores,verbose --hint=nomultithread
> xthi.intel
> cpu_bind=MASK - nid00033, task 2 2 [43508]: mask 0x4 set
> cpu_bind=MASK - nid00033, task 0 0 [43506]: mask 0x1 set
> cpu_bind=MASK - nid00033, task 4 4 [43510]: mask 0x10 set
> cpu_bind=MASK - nid00033, task 7 7 [43513]: mask 0x80 set
> cpu_bind=MASK - nid00033, task 6 6 [43512]: mask 0x40 set
> cpu_bind=MASK - nid00033, task 5 5 [43511]: mask 0x20 set
> cpu_bind=MASK - nid00033, task 3 3 [43509]: mask 0x8 set
> cpu_bind=MASK - nid00033, task 1 1 [43507]: mask 0x2 set
> Hello from rank 0, thread 0, on nid00033. (core affinity = 0)
> Hello from rank 1, thread 0, on nid00033. (core affinity = 1)
> Hello from rank 2, thread 0, on nid00033. (core affinity = 2)
> Hello from rank 3, thread 0, on nid00033. (core affinity = 3)
> Hello from rank 4, thread 0, on nid00033. (core affinity = 4)
> Hello from rank 5, thread 0, on nid00033. (core affinity = 5)
> Hello from rank 6, thread 0, on nid00033. (core affinity = 6)
> Hello from rank 7, thread 0, on nid00033. (core affinity = 7)
> zz217@nid00033:~/tests/affinity>

Are all three of these commands being executed within an existing allocation, or are they making the allocation request separately? If within an allocation, can you provide the 'salloc' command used to acquire the resources? Settings from there are inherited by the srun command, and may explain some of the behavior seen here.

(In reply to Zhengji Zhao from comment #4)
> Dear Tim,
>
> Could you please let me know about the actions that SchedMD would like to
> take with the problem of the SLURM_HINT=nomultithread (I reported in the
> last update to this bug that this env failed to allow the srun command to
> use only physical cores)? It is very important for us to know if SchedMD
> will be fixing this bug or not soon, so we can decide our next step. I
> appreciate if you could update us at your earliest convenience.

I'm still deciphering some of the output you sent; something does appear to be working oddly, at least on my test system, and I need to work through whether it's intentional behavior or a bug.

> I would like to let you know that what we really need is the capability of
> enabling hyperthreading by demand only (e.g., using --hint=multithread on
> the srun command line). We hope the srun command works with the physical
> cores only by default (provided the hypreading is enabled in BIOS all the
> time). If you could make SLURM_HINT=nomultithread work for us so that we can
> use that env set our default, that would be great, but we are happy to
> pursue other approaches if available as well. Actually I am wondering if it
> is a good idea to add something like SbatchDefaultCommand into the Slurm
> config support so that we can use it to set the default srun command line
> for the batch jobs?

We're unlikely to pursue that; such defaults can be set elsewhere, and the interaction between various settings with such a command would further confuse things. I believe CR_ONE_TASK_PER_CORE gets you most of what you're after, although I'm still working through the examples you've sent to try to understand if there's a bug there or I'm misinterpreting the output.
Hi Tim,

Thanks for getting back to me promptly. I am attaching the code, xthi.c (compiled with an Intel compiler like this: mpiicc -openmp xthi.c), which I used to print out the CPU bindings in my tests, in case it is helpful. It is a code provided by Cray, and we have been using it to test CPU affinity on our Cray systems, as its output is easier to read than the raw CPU masks or looking directly into the /proc/self/status file. I will test CR_ONE_TASK_PER_CORE and will update you. Meanwhile, if you have any update about SLURM_HINT, please let me know.

Thanks,
Zhengji
Created attachment 3025 [details]
xthi.c file
I am able to achieve the desired behaviour by adding the SelectTypeParameters value of "CR_ONE_TASK_PER_CORE" to those you already have set (as Tim suggested in comment #6), PLUS either setting the environment variable SLURM_CPU_BIND=cores or using the job option --cpu_bind=cores.

Given that environment, if a user wants to run one task per thread, he would need to use the following two options:

--cpu_bind=thread --ntasks-per-core=2

What I would like to propose for our next release (16.05, due out in May) is: if the cluster is configured with CR_ONE_TASK_PER_CORE and the user does not specify --ntasks-per-core=# with a value larger than 1, then by default bind tasks to cores. That would eliminate the need for the SLURM_CPU_BIND=cores environment variable or the --cpu_bind=cores option. In order to bind to threads, the user would only need to specify --ntasks-per-core=2 (--cpu_bind=thread would then become the default). Does that sound acceptable?
We've come up with what I believe is a better solution. The first part, which you can do with Slurm version 15.08, is to configure CR_ONE_TASK_PER_CORE as described in prior comments.

The second part required making changes to RPCs to send the ntasks-per-socket information to the task binding plugin. The task/affinity plugin was modified to support the --ntasks-per-socket option (that information isn't available to the plugin in Slurm version 15.08). The commit with that change is here:
https://github.com/SchedMD/slurm/commit/31aa3244b55bcf6fafe8d76a2c3b8047afeac6e3

Finally, you should be aware that if the tasks to be launched can be "nicely" mapped onto the allocated resources, all tasks by default get bound to all allocated resources. For example, if a job is allocated an entire node or socket and wants to launch 2 tasks, then each task would get bound to one of the sockets on a 2-socket node. If there are 3 tasks to be launched on that same 2-socket node, then each task can access all threads by default. There is a --cpu_bind option to override the default, by binding tasks to cores for example, even if that leaves cores idle.
Here are some logs demonstrating some of this on a node with 2 sockets, 6 cores per socket, and 2 threads per core:

$ srun --cpu_bind=verbose -n4 --ntasks-per-socket=2 -m block:block hostname
cpu_bind=MASK - smd-server, task 0 0 [24774]: mask 0x555555 set
cpu_bind=MASK - smd-server, task 1 1 [24775]: mask 0x555555 set
cpu_bind=MASK - smd-server, task 3 3 [24777]: mask 0xaaaaaa set
cpu_bind=MASK - smd-server, task 2 2 [24776]: mask 0xaaaaaa set
smd-server
smd-server
smd-server
smd-server

$ srun --cpu_bind=verbose -n4 --ntasks-per-socket=3 -m block:block hostname
cpu_bind=MASK - smd-server, task 0 0 [24790]: mask 0x555555 set
cpu_bind=MASK - smd-server, task 1 1 [24791]: mask 0x555555 set
cpu_bind=MASK - smd-server, task 2 2 [24792]: mask 0x555555 set
cpu_bind=MASK - smd-server, task 3 3 [24793]: mask 0xaaaaaa set
smd-server
smd-server
smd-server
smd-server

$ srun --cpu_bind=verbose -n4 --ntasks-per-socket=4 -m block:block hostname
cpu_bind=MASK - smd-server, task 0 0 [24816]: mask 0x555555 set
cpu_bind=MASK - smd-server, task 1 1 [24817]: mask 0x555555 set
cpu_bind=MASK - smd-server, task 2 2 [24818]: mask 0x555555 set
cpu_bind=MASK - smd-server, task 3 3 [24819]: mask 0x555555 set
smd-server
smd-server
smd-server
smd-server

$ srun --cpu_bind=verbose,core -n4 --ntasks-per-socket=4 -m block:block hostname
cpu_bind=MASK - smd-server, task 0 0 [24847]: mask 0x1001 set
cpu_bind=MASK - smd-server, task 1 1 [24848]: mask 0x4004 set
cpu_bind=MASK - smd-server, task 2 2 [24849]: mask 0x10010 set
cpu_bind=MASK - smd-server, task 3 3 [24850]: mask 0x40040 set
smd-server
smd-server
smd-server
smd-server

$ srun --cpu_bind=verbose,core -n4 --ntasks-per-socket=3 -m block:block hostname
cpu_bind=MASK - smd-server, task 0 0 [24873]: mask 0x1001 set
cpu_bind=MASK - smd-server, task 1 1 [24874]: mask 0x4004 set
cpu_bind=MASK - smd-server, task 2 2 [24875]: mask 0x10010 set
cpu_bind=MASK - smd-server, task 3 3 [24876]: mask 0x2002 set
smd-server
smd-server
smd-server
smd-server

$ srun --cpu_bind=verbose,thread -n4 --ntasks-per-socket=3 -m block:block hostname
cpu_bind=MASK - smd-server, task 0 0 [25039]: mask 0x1 set
cpu_bind=MASK - smd-server, task 1 1 [25040]: mask 0x4 set
cpu_bind=MASK - smd-server, task 2 2 [25041]: mask 0x10 set
cpu_bind=MASK - smd-server, task 3 3 [25042]: mask 0x2 set
smd-server
smd-server
smd-server
smd-server

$ srun --cpu_bind=verbose,thread -n4 --ntasks-per-socket=2 -m block:block hostname
cpu_bind=MASK - smd-server, task 0 0 [25077]: mask 0x1 set
cpu_bind=MASK - smd-server, task 1 1 [25078]: mask 0x4 set
cpu_bind=MASK - smd-server, task 2 2 [25079]: mask 0x2 set
cpu_bind=MASK - smd-server, task 3 3 [25080]: mask 0x8 set
smd-server
smd-server
smd-server
smd-server
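One way to sanity-check the socket masks in these logs: on this test node the logical CPUs alternate between the sockets, so socket 0 owns the even CPUs and socket 1 the odd ones. A sketch that rebuilds the expected masks from that numbering (the helper name and the numbering assumption are ours, for illustration):

```shell
# Build the CPU mask for one socket, assuming logical CPUs are numbered
# cyclically across sockets (CPU i belongs to socket i % nsock).
# socket_mask is a hypothetical helper used only to check the logs above.
socket_mask() {   # usage: socket_mask <socket> <nsockets> <total_cpus>
  local sock=$1 nsock=$2 ncpu=$3 cpu mask=0
  for (( cpu = sock; cpu < ncpu; cpu += nsock )); do
    (( mask |= 1 << cpu ))
  done
  printf '0x%x\n' "$mask"
}

socket_mask 0 2 24   # matches tasks 0-1 in the first log: 0x555555
socket_mask 1 2 24   # matches tasks 2-3 in the first log: 0xaaaaaa
```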
Thanks a lot for the comment. I will reply to your comment 17 shortly. My update crashed for some reason; this is what I wanted to send in response to your comment 16.

Dear Moe,

Yes, this sounds great! Your proposed change for the next Slurm release appears to be exactly what we want for our Cray XC30 (Ivy Bridge, 2 threads/core) and Cray XC40 (Haswell, 2 threads/core). For our next Cray XC40 (KNL nodes, 4 threads per core), it is possible that we may want the default to be 4 threads/core, so I hope to make sure that in the May release (with your proposed change in place) we will still be able to set threads as the default and just use --ntasks-per-core=1 to request not to use the hyperthreads (i.e., to work with cores only). Our goal is to make the default easy for most of the workload/users (as easy as just doing srun -n #tasks ./a.out), while the non-default is not too inconvenient (e.g., needing at most one extra flag/option). In addition, the default setting should not restrict us from, or fail on, extra/complicated task/thread/memory bindings beyond the default.

***********

Just to confirm, the following config/settings (only the relevant ones) were what you used to achieve what I want now:

TaskPlugin = affinity,cgroup,cray
SelectTypeParameters=CR_SOCKET_MEMORY,OTHER_CONS_RES,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE
export SLURM_CPU_BIND=cores (or srun --cpu_bind=cores ...) for now, but in the May release this will be removed.

I will test this setting with our current Slurm install, try to reproduce the CPU bindings you observed first, and update you. Just to give you a heads-up, we may need your help with memory binding as well, which is very important for us to be able to work with the high-bandwidth memory on KNL.

Thanks a lot!
Zhengji
Just noticed I was actually addressing your comment 13...
(In reply to Zhengji Zhao from comment #17)
> Thanks a lot for the comment. I will reply to your comments 17 shortly.
>
> Just to confirm, the following config/setting (only the relevant
> config/setting) was what you used to achieve what I want now:
>
> TaskPlugin = affinity,cgroup,cray
>
> SelectTypeParameters=CR_SOCKET_MEMORY,OTHER_CONS_RES,
> CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE
>
> export SLURM_CPU_BIND=cores (or srun --cpu_bind=cores ...) for now but in
> the May release this will be removed.

This is correct.

I did just make another change to Slurm version 16.05 to support this. Previously, TaskPluginParams would specify the one and only task binding that the system would support. I changed that so the configuration parameter specifies the default task binding: it will only be used if the user fails to specify the binding, and any user CPU binding option will now override the configuration parameter rather than generate an error. This additional change is here:
https://github.com/SchedMD/slurm/commit/a01e6562edc1040bc3cee37fd96cade269b12ff4

We plan to tag a pre-release of Slurm version 16.05 on Thursday if you have a test system to work with.
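For reference, with those changes in place the configuration might look like the fragment below. This is a sketch assembled from this thread, not a tested configuration; check the exact parameter values against the 16.05 slurm.conf man page:

```
# slurm.conf fragment (sketch, assuming the 16.05 behavior described above)
TaskPlugin=affinity,cgroup,cray
# With the a01e656 change, this specifies the *default* binding; a user's
# --cpu_bind option overrides it instead of generating an error.
TaskPluginParams=Cores
SelectTypeParameters=CR_SOCKET_MEMORY,OTHER_CONS_RES,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE
```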
(In reply to Comment 19)

Yes, we have a test system, so we can test the pre-release of Slurm 16.05. Please let us know how to get it (perhaps our system admin, Doug, already knows the place to download it, just in case). I am looking forward to testing your new changes, which appear to be exactly what I asked for when I opened this bug! I will get back to you after testing (it may take some days).

(In reply to Comment 16)

It is great to see that the change you have made makes --ntasks-per-socket work as I wanted! This will definitely meet our needs on the Ivy Bridge and Haswell systems.

However, for KNL nodes, we need to be able to achieve the same or similar control over the NUMA nodes, where the number of sockets does not equal the number of NUMA nodes; i.e., on a KNL node there is only a single socket but multiple NUMA domains/nodes. Just to give you a heads-up, I am attaching two files which contain the numactl --hardware output for the two (flat) memory configurations (Quadrant and Sub-NUMA Cluster (SNC) modes) on KNL nodes. I hope we can use a --ntasks-per-numanode to control the number of tasks bound to each NUMA node, and also bind memory to the NUMA node of choice. Note that the HBM appears as multiple NUMA nodes (SNC mode) or a single NUMA node (Quadrant mode).

Thanks,
Zhengji
Created attachment 3036 [details]
Numactl --hardware output for the Quadrant Flat memory configuration on a KNL node
Created attachment 3037 [details]
Numactl --hardware output for the SNC Flat memory mode on a KNL node
(In reply to Zhengji Zhao from comment #20)
> (In reply to Comment 19)
>
> Yes, we have a test system, so we can test the pre-release of Slurm 16.5.
> Please let us know how to get it (perhaps our system admin, Doug know the
> place to download already, just in case). I am looking forward to testing
> your new changes, which appear to be exactly what I asked for when I opened
> this bug! I will get back to you after testing (it may take some days).

See: http://www.schedmd.com/#repos
Then look in the second section: "Download the latest development version of Slurm"

> (In reply to Comment 16)
> It is great to see that the change you have made makes the
> --ntasks-per-socket work as I wanted! This will definitely meet our needs on
> Ivey Bridge, and Haswell systems.
>
> However, for KNL nodes, we need to be able to achieve the same or similar
> control over the numa nodes where the number of sockets does not equal to
> the number of numa nodes, i.e., on a KNL node, there is only a single socket
> but multiple numa domains/nodes. Just to give you a heads-up, I am attaching
> two files, which contain the numactl --hardware command output for the two
> (flat) memory configurations (Qudrant and Sub NUMA Cluster (SNC) modes) on
> KNL nodes. I hope we can do --ntasks-per-numanode to control the number of
> tasks bound to each numa node and also can bind memory to the numa node of
> choice. Note that the HBM appears as multiple numa nodes (SNC mode) or a
> single numa node (in Quadrant mode).

I would recommend opening a separate ticket for KNL, as there is more development work required. Slurm manages processor allocation layouts at the level of baseboards, NUMA nodes, sockets, cores, and threads. Slurm is also dependent upon the number of cores per NUMA node being uniform within a single node, which is not the case in KNL quad mode. Slurm lacks a concept of core pairs as on KNL. Slurm also lacks a --ntasks-per-numanode option.
Slurm is recording the KNL NUMA nodes as sockets, which seems to work best right now. It's a work in progress...
Thanks a lot for the link. We will test it on our test system.

Yes, it makes sense to open a separate ticket for KNL. I will open a new ticket once our needs/requirements for memory binding (along with CPU binding) become more solid and specific. We have just gotten access to early KNL nodes, so we should be able to gain some experience soon. It sounds great that Slurm treats the KNL NUMA nodes as sockets.

Thanks,
Zhengji
(In reply to Zhengji Zhao from comment #24)
> Thanks a lot for the link. We will test it on our test system.

Do you have any updated information on this?
Thanks a lot for following up; I really appreciate it. Our system admin has been completely overbooked with many other duties recently (also conferences, etc.), so I have been waiting for him to install the new version on our test system. Once he is back to his regular work (he is still out of town), we can resume the testing. We now have some experience on the KNL white-box nodes as well, so I will soon get back to you with a more specific idea of what we want to do with task/thread/memory affinity. Thanks, Zhengji
Do you have any updates on this ticket?
Thanks for checking on this. I was on vacation the last two weeks. I will check the status and get back to you as soon as I can. Thanks again, I really appreciate your help with this. Zhengji
(In reply to Zhengji Zhao from comment #28) > Thanks for checking on this. I was on vacation last two weeks. I will check > the status, and will get back to you as soon as I can. Any update on this?
I am really sorry for the long delay in getting back to you, and I really appreciate you following up with us on this bug. Unfortunately, since we have to keep our XC30 test system (called Alva, where we have been doing our task affinity testing) in sync with our production system, Edison, for the moment (per our system admin Doug Jacobson, I suppose this is to support an ongoing system upgrade), we have not been able to experiment with the new Slurm version yet. I think our testing has to wait until our Cori test system (called Gerty) comes back with an upgraded CLE version (Rhine/Redwood). I am now gathering requirements from our application readiness team members (who are working on the KNL white boxes) about their task/thread/memory affinity needs. For now we are basically using KMP_AFFINITY to control affinity on a single KNL white box. At a very high level, we hope to have srun manage task/thread/memory affinity without needing KMP_AFFINITY on our Cori KNL system later. I believe we will need srun to support all KMP_AFFINITY options (compact, scatter, balanced, none, explicit). While we use srun to control task/thread/memory affinity, we still want to be able to use KMP_AFFINITY if we choose to. I will get back to you with further updates. Zhengji
(In reply to Zhengji Zhao from comment #30) > I am really sorry for the long delay in getting back to you and really > appreciate you for following up with us on this bug. No problem. I'd rather be waiting for you than the other way around. > I am gathering the requirement from our application readiness team members > (who are working on the KNL white boxes) about their task/thread/memory > affinity need now. We are basically using the KMP_AFFINITY to control the > affinity for now on single KNL white box now. At a very high level, we hope > to have srun to manage the task/thread/memory affinity without needing to > use the KMP_AFFINITY on our Cori KNL system later. I believe we will need > srun to support all KMP_AFFINITY options (compact, scatter, balanced, none, > explicit). While we use the srun to control the task/thread/memory > affinity, we still hope to be able to use KMP_AFFINITY if we want to. Slurm's -m/--distribution option supports all of these options and more. It can be controlled via the command line option or environment variables. The user can specify task distribution options of cyclic, block, fcyclic (cyclic task IDs, filling the resource), plus blocks of user controlled sizes (plane). These options are available to control layout at the node, socket/NUMA, core, and thread levels. More information here: http://slurm.schedmd.com/mc_support.html
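As a rough illustration of what the block and cyclic distributions above do, consider mapping task IDs onto sockets. This is a simplified model for intuition only, not Slurm's actual implementation (which also handles fcyclic, plane sizes, and lower levels):

```python
def distribute(ntasks, nsockets, mode):
    # Simplified model of srun -m/--distribution at the socket level:
    # "block" fills one socket completely before moving to the next;
    # "cyclic" hands out tasks round-robin across sockets.
    if mode == "block":
        per_socket = ntasks // nsockets
        return [task // per_socket for task in range(ntasks)]
    if mode == "cyclic":
        return [task % nsockets for task in range(ntasks)]
    raise ValueError("unknown distribution: " + mode)

print(distribute(8, 2, "block"))   # [0, 0, 0, 0, 1, 1, 1, 1]
print(distribute(8, 2, "cyclic"))  # [0, 1, 0, 1, 0, 1, 0, 1]
```

The same choice can be made per level with, e.g., `srun -m block:cyclic` (node level, then socket level), or via the SLURM_DISTRIBUTION environment variable.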
Created attachment 3402 [details] Slurm configuration file
Dear Moe,

We finally have a test system that is set up with Slurm 16.05 with task/affinity enabled and your suggested SelectTypeParameters:

CR_SOCKET_MEMORY,OTHER_CONS_RES,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE

See the slurm.conf file attached at 2016-08-10 15:57 PDT. I immediately ran into two problems on our dual-socket Haswell nodes (16 cores per socket, 32 cores in total, 64 logical cores or threads (CPUs) in total).

1) When running with only 32 tasks per node (i.e., not using hyperthreads), Slurm binds a single task to two CPUs from two different physical cores, while we want it to bind each task to the two CPUs that belong to the same physical core. This is a demonstration of the problem:

srun --cpu_bind=verbose,cores --mem_bind=verbose,local -n32 ./xthi.intel 2>&1 |sort -nk4,6
Hello from rank 0 thread 0 on nid00021 (core affinity = 0,1)
cpu_bind=MASK - nid00021, task 0 0 [34296]: mask 0x3 set
cpu_bind=MASK - nid00021, task 1 1 [34297]: mask 0xc set
cpu_bind=MASK - nid00021, task 2 2 [34298]: mask 0x30 set
cpu_bind=MASK - nid00021, task 3 3 [34299]: mask 0xc0 set
cpu_bind=MASK - nid00021, task 4 4 [34300]: mask 0x300 set
cpu_bind=MASK - nid00021, task 5 5 [34301]: mask 0xc00 set
cpu_bind=MASK - nid00021, task 6 6 [34302]: mask 0x3000 set
cpu_bind=MASK - nid00021, task 7 7 [34303]: mask 0xc000 set
cpu_bind=MASK - nid00021, task 8 8 [34304]: mask 0x30000 set
cpu_bind=MASK - nid00021, task 9 9 [34305]: mask 0xc0000 set
cpu_bind=MASK - nid00021, task 10 10 [34306]: mask 0x300000 set
cpu_bind=MASK - nid00021, task 11 11 [34307]: mask 0xc00000 set
cpu_bind=MASK - nid00021, task 12 12 [34308]: mask 0x3000000 set
cpu_bind=MASK - nid00021, task 13 13 [34309]: mask 0xc000000 set
cpu_bind=MASK - nid00021, task 14 14 [34310]: mask 0x30000000 set
cpu_bind=MASK - nid00021, task 15 15 [34311]: mask 0xc0000000 set
cpu_bind=MASK - nid00021, task 16 16 [34312]: mask 0x300000000 set
cpu_bind=MASK - nid00021, task 17 17 [34313]: mask 0xc00000000 set
cpu_bind=MASK - nid00021, task 18 18 [34314]: mask 0x3000000000 set
cpu_bind=MASK - nid00021, task 19 19 [34315]: mask 0xc000000000 set
cpu_bind=MASK - nid00021, task 20 20 [34316]: mask 0x30000000000 set
cpu_bind=MASK - nid00021, task 21 21 [34317]: mask 0xc0000000000 set
cpu_bind=MASK - nid00021, task 22 22 [34318]: mask 0x300000000000 set
cpu_bind=MASK - nid00021, task 23 23 [34319]: mask 0xc00000000000 set
cpu_bind=MASK - nid00021, task 24 24 [34320]: mask 0x3000000000000 set
cpu_bind=MASK - nid00021, task 25 25 [34321]: mask 0xc000000000000 set
cpu_bind=MASK - nid00021, task 26 26 [34322]: mask 0x30000000000000 set
cpu_bind=MASK - nid00021, task 27 27 [34323]: mask 0xc0000000000000 set
cpu_bind=MASK - nid00021, task 28 28 [34324]: mask 0x300000000000000 set
cpu_bind=MASK - nid00021, task 29 29 [34325]: mask 0xc00000000000000 set
cpu_bind=MASK - nid00021, task 30 30 [34326]: mask 0x3000000000000000 set
cpu_bind=MASK - nid00021, task 31 31 [34327]: mask 0xc000000000000000 set
mem_bind=LOC - nid00021, task 0 0 [34296]: mask 0x1 set
mem_bind=LOC - nid00021, task 1 1 [34297]: mask 0x1 set
mem_bind=LOC - nid00021, task 2 2 [34298]: mask 0x1 set
mem_bind=LOC - nid00021, task 3 3 [34299]: mask 0x1 set
mem_bind=LOC - nid00021, task 4 4 [34300]: mask 0x1 set
mem_bind=LOC - nid00021, task 5 5 [34301]: mask 0x1 set
mem_bind=LOC - nid00021, task 6 6 [34302]: mask 0x1 set
mem_bind=LOC - nid00021, task 7 7 [34303]: mask 0x1 set
mem_bind=LOC - nid00021, task 8 8 [34304]: mask 0x2 set
mem_bind=LOC - nid00021, task 9 9 [34305]: mask 0x2 set
mem_bind=LOC - nid00021, task 10 10 [34306]: mask 0x2 set
mem_bind=LOC - nid00021, task 11 11 [34307]: mask 0x2 set
mem_bind=LOC - nid00021, task 12 12 [34308]: mask 0x2 set
mem_bind=LOC - nid00021, task 13 13 [34309]: mask 0x2 set
mem_bind=LOC - nid00021, task 14 14 [34310]: mask 0x2 set
mem_bind=LOC - nid00021, task 15 15 [34311]: mask 0x2 set
mem_bind=LOC - nid00021, task 16 16 [34312]: mask 0x1 set
mem_bind=LOC - nid00021, task 17 17 [34313]: mask 0x1 set
mem_bind=LOC - nid00021, task 18 18 [34314]: mask 0x1 set
mem_bind=LOC - nid00021, task 19 19 [34315]: mask 0x1 set
mem_bind=LOC - nid00021, task 20 20 [34316]: mask 0x1 set
mem_bind=LOC - nid00021, task 21 21 [34317]: mask 0x1 set
mem_bind=LOC - nid00021, task 22 22 [34318]: mask 0x1 set
mem_bind=LOC - nid00021, task 23 23 [34319]: mask 0x1 set
mem_bind=LOC - nid00021, task 24 24 [34320]: mask 0x2 set
mem_bind=LOC - nid00021, task 25 25 [34321]: mask 0x2 set
mem_bind=LOC - nid00021, task 26 26 [34322]: mask 0x2 set
mem_bind=LOC - nid00021, task 27 27 [34323]: mask 0x2 set
mem_bind=LOC - nid00021, task 28 28 [34324]: mask 0x2 set
mem_bind=LOC - nid00021, task 29 29 [34325]: mask 0x2 set
mem_bind=LOC - nid00021, task 30 30 [34326]: mask 0x2 set
mem_bind=LOC - nid00021, task 31 31 [34327]: mask 0x2 set
Hello from rank 1 thread 0 on nid00021 (core affinity = 2,3)
Hello from rank 2 thread 0 on nid00021 (core affinity = 4,5)
Hello from rank 3 thread 0 on nid00021 (core affinity = 6,7)
Hello from rank 4 thread 0 on nid00021 (core affinity = 8,9)
Hello from rank 5 thread 0 on nid00021 (core affinity = 10,11)
Hello from rank 6 thread 0 on nid00021 (core affinity = 12,13)
Hello from rank 7 thread 0 on nid00021 (core affinity = 14,15)
Hello from rank 8 thread 0 on nid00021 (core affinity = 16,17)
Hello from rank 9 thread 0 on nid00021 (core affinity = 18,19)
Hello from rank 10 thread 0 on nid00021 (core affinity = 20,21)
Hello from rank 11 thread 0 on nid00021 (core affinity = 22,23)
Hello from rank 12 thread 0 on nid00021 (core affinity = 24,25)
Hello from rank 13 thread 0 on nid00021 (core affinity = 26,27)
Hello from rank 14 thread 0 on nid00021 (core affinity = 28,29)
Hello from rank 15 thread 0 on nid00021 (core affinity = 30,31)
Hello from rank 16 thread 0 on nid00021 (core affinity = 32,33)
Hello from rank 17 thread 0 on nid00021 (core affinity = 34,35)
Hello from rank 18 thread 0 on nid00021 (core affinity = 36,37)
Hello from rank 19 thread 0 on nid00021 (core affinity = 38,39)
Hello from rank 20 thread 0 on nid00021 (core affinity = 40,41)
Hello from rank 21 thread 0 on nid00021 (core affinity = 42,43)
Hello from rank 22 thread 0 on nid00021 (core affinity = 44,45)
Hello from rank 23 thread 0 on nid00021 (core affinity = 46,47)
Hello from rank 24 thread 0 on nid00021 (core affinity = 48,49)
Hello from rank 25 thread 0 on nid00021 (core affinity = 50,51)
Hello from rank 26 thread 0 on nid00021 (core affinity = 52,53)
Hello from rank 27 thread 0 on nid00021 (core affinity = 54,55)
Hello from rank 28 thread 0 on nid00021 (core affinity = 56,57)
Hello from rank 29 thread 0 on nid00021 (core affinity = 58,59)
Hello from rank 30 thread 0 on nid00021 (core affinity = 60,61)
Hello from rank 31 thread 0 on nid00021 (core affinity = 62,63)

We need the binding to look like this:

Hello from rank 0 thread 0 on nid00021 (core affinity = 0,32)
Hello from rank 1 thread 0 on nid00021 (core affinity = 1,33)
Hello from rank 2 thread 0 on nid00021 (core affinity = 2,34)
Hello from rank 3 thread 0 on nid00021 (core affinity = 3,35)
...

instead of the above:

Hello from rank 0 thread 0 on nid00021 (core affinity = 0,1)
Hello from rank 1 thread 0 on nid00021 (core affinity = 2,3)
Hello from rank 2 thread 0 on nid00021 (core affinity = 4,5)
Hello from rank 3 thread 0 on nid00021 (core affinity = 6,7)

2) It seems I can no longer run with hyperthreads; the following srun commands all return the error "More processors requested than permitted":

srun --cpu_bind=verbose -n64 --ntasks-per-core=2 ./xthi.intel
srun: error: Unable to create job step: More processors requested than permitted

srun --cpu_bind=verbose,threads -n64 --ntasks-per-core=2 ./xthi.intel
srun: error: Unable to create job step: More processors requested than permitted

srun --cpu_bind=verbose,threads -n64 --ntasks-per-core=2 --hint=multithread ./xthi.intel
srun: error: Unable to create job step: More processors requested than permitted

Could you please take a look?
I appreciate any advice that helps us fix these two problems. Thanks, Zhengji
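The difference between the observed and desired bindings can be written as mask arithmetic. This is an illustrative Python sketch, not Slurm code; it assumes the CPU numbering from this system, where a physical core's two hyperthreads are CPUs i and i+32:

```python
def observed_mask(task):
    # What Slurm produced: two adjacent CPU IDs (e.g. 0,1 for task 0),
    # which are two *different* physical cores under this numbering.
    return 0b11 << (2 * task)

def desired_mask(task, ncores=32):
    # What we want: both hyperthreads of one physical core,
    # i.e. CPU `task` and its sibling CPU `task + ncores`.
    return (1 << task) | (1 << (task + ncores))

print(hex(observed_mask(0)))  # 0x3           (CPUs 0,1)
print(hex(desired_mask(0)))   # 0x100000001   (CPUs 0,32)
```

Task 0's observed mask 0x3 matches the cpu_bind=MASK output above, while the desired mask covers CPU 0 and its sibling CPU 32.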
The first problem (not binding to the proper threads) should be fixed in the following commit from a few days ago:

https://github.com/SchedMD/slurm/commit/f36c4ee53763689c822ad524fa7b3f1853f5f9e6

This bug fix will be in Slurm version 16.05.4, which we plan to release tomorrow. I will investigate the second problem soon.
Thanks so much for the quick reply! I am glad the first problem already has a fix. I will ask our system admin to install it. In case you prefer the scontrol show config output (I did not see SelectTypeParameters in the slurm.conf file), I am including it here. Looking forward to hearing from you soon. Zhengji

zz217@gert01:~/affinity/hsw> scontrol show config
Configuration data as of 2016-08-10T16:14:31
AccountingStorageBackupHost = gert01-144
AccountingStorageEnforce = associations,limits,qos,safe
AccountingStorageHost = gertque01-144
AccountingStorageLoc = N/A
AccountingStoragePort = 6819
AccountingStorageTRES = cpu,mem,energy,node,bb/cray
AccountingStorageType = accounting_storage/slurmdbd
AccountingStorageUser = N/A
AccountingStoreJobComment = Yes
AcctGatherEnergyType = acct_gather_energy/cray
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInfinibandType = acct_gather_infiniband/none
AcctGatherNodeFreq = 0 sec
AcctGatherProfileType = acct_gather_profile/none
AllowSpecResourcesUsage = 1
AuthInfo = (null)
AuthType = auth/munge
BackupAddr = 128.55.144.82
BackupController = gertque01
BatchStartTimeout = 10 sec
BOOT_TIME = 2016-08-10T11:40:38
BurstBufferType = burst_buffer/cray
CacheGroups = 0
CheckpointType = checkpoint/none
ChosLoc = (null)
ClusterName = gerty
CompleteWait = 300 sec
ControlAddr = ctlnet1
ControlMachine = ctlnet1
CoreSpecPlugin = core_spec/cray
CpuFreqDef = Unknown
CpuFreqGovernors = Performance,OnDemand
CryptoType = crypto/munge
DebugFlags = Backfill,BurstBuffer
DefMemPerNode = UNLIMITED
DisableRootJobs = Yes
EioTimeout = 60
EnforcePartLimits = ANY
Epilog = (null)
EpilogMsgTime = 2000 usec
EpilogSlurmctld = (null)
ExtSensorsType = ext_sensors/none
ExtSensorsFreq = 0 sec
FairShareDampeningFactor = 1
FastSchedule = 1
FirstJobId = 1
GetEnvTimeout = 2 sec
GresTypes = craynetwork,hbm
GroupUpdateForce = 1
GroupUpdateTime = 600 sec
HASH_VAL = Different Ours=0x6d885c88 Slurmctld=0xeb6d56cd
HealthCheckInterval = 0 sec
HealthCheckNodeState = ANY
HealthCheckProgram = (null)
InactiveLimit = 600 sec
JobAcctGatherFrequency = 0
JobAcctGatherType = jobacct_gather/cgroup
JobAcctGatherParams = (null)
JobCheckpointDir = /var/slurm/checkpoint
JobCompHost = localhost
JobCompLoc = /var/log/slurm_jobcomp.log
JobCompPort = 0
JobCompType = jobcomp/none
JobCompUser = root
JobContainerType = job_container/cncu
JobCredentialPrivateKey = (null)
JobCredentialPublicCertificate = (null)
JobFileAppend = 0
JobRequeue = 0
JobSubmitPlugins = cray,lua
KeepAliveTime = SYSTEM_DEFAULT
KillOnBadExit = 1
KillWait = 30 sec
LaunchParameters = (null)
LaunchType = launch/slurm
Layouts =
Licenses = SCRATCH:1000000,gscratch1:1000000,project:1000000,projecta:1000000,projectb:1000000,dna:1000000
LicensesUsed = dna:0/1000000,projectb:0/1000000,projecta:0/1000000,project:0/1000000,gscratch1:0/1000000,SCRATCH:0/1000000
MailProg = /bin/mail
MaxArraySize = 65000
MaxJobCount = 500000
MaxJobId = 2147418112
MaxMemPerNode = UNLIMITED
MaxStepCount = 40000
MaxTasksPerNode = 512
MCSPlugin = mcs/none
MCSParameters = (null)
MemLimitEnforce = Yes
MessageTimeout = 60 sec
MinJobAge = 300 sec
MpiDefault = openmpi
MpiParams = ports=63001-64000
MsgAggregationParams = (null)
NEXT_JOB_ID = 187
NodeFeaturesPlugins = knl_cray
OverTimeLimit = 0 min
PluginDir = /usr/lib64/slurm
PlugStackConfig = /etc/slurm/plugstack.conf
PowerParameters = (null)
PowerPlugin =
PreemptMode = REQUEUE
PreemptType = preempt/qos
PriorityParameters = (null)
PriorityDecayHalfLife = 7-00:00:00
PriorityCalcPeriod = 00:05:00
PriorityFavorSmall = No
PriorityFlags =
PriorityMaxAge = 128-00:00:00
PriorityUsageResetPeriod = NONE
PriorityType = priority/multifactor
PriorityWeightAge = 184320
PriorityWeightFairShare = 1440
PriorityWeightJobSize = 0
PriorityWeightPartition = 0
PriorityWeightQOS = 253440
PriorityWeightTRES = (null)
PrivateData = none
ProctrackType = proctrack/cray
Prolog = (null)
PrologEpilogTimeout = 65534
PrologSlurmctld = (null)
PrologFlags = Alloc,Contain
PropagatePrioProcess = 0
PropagateResourceLimits = ALL
PropagateResourceLimitsExcept = (null)
RebootProgram = (null)
ReconfigFlags = (null)
RequeueExit = (null)
RequeueExitHold = (null)
ResumeProgram = /usr/sbin/capmc_resume
ResumeRate = 300 nodes/min
ResumeTimeout = 1800 sec
ResvEpilog = (null)
ResvOverRun = 0 min
ResvProlog = (null)
ReturnToService = 1
RoutePlugin = route/default
SallocDefaultCommand = srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --gres=craynetwork:0 --mpi=none --cpu_bind=none $SHELL
SchedulerParameters = no_backup_scheduling,bf_window=5760,bf_resolution=120,bf_max_job_array_resv=20,default_queue_depth=400,bf_max_job_test=1000000,bf_continue,nohold_on_prolog_fail,kill_invalid_depend,sched_min_interval=2,bf_interval=120,bf_min_age_reserve=600,bf_max_job_user=30,bf_min_prio_reserve=69120
SchedulerPort = 7321
SchedulerRootFilter = 1
SchedulerTimeSlice = 30 sec
SchedulerType = sched/backfill
SelectType = select/cray
SelectTypeParameters = CR_SOCKET_MEMORY,OTHER_CONS_RES,CR_ONE_TASK_PER_CORE,CR_CORE_DEFAULT_DIST_BLOCK
SlurmUser = root(0)
SlurmctldDebug = debug
SlurmctldLogFile = /var/tmp/slurm/slurmctld.log
SlurmctldPort = 6817
SlurmctldTimeout = 120 sec
SlurmdDebug = info
SlurmdLogFile = /var/spool/slurmd/%h.log
SlurmdPidFile = /var/run/slurmd.pid
SlurmdPlugstack = (null)
SlurmdPort = 6818
SlurmdSpoolDir = /var/spool/slurmd
SlurmdTimeout = 300 sec
SlurmdUser = root(0)
SlurmSchedLogFile = (null)
SlurmSchedLogLevel = 0
SlurmctldPidFile = /var/run/slurmctld.pid
SlurmctldPlugstack = (null)
SLURM_CONF = /etc/slurm/slurm.conf
SLURM_VERSION = 16.05.3
SrunEpilog = (null)
SrunPortRange = 60001-63000
SrunProlog = (null)
StateSaveLocation = /global/syscom/gerty/sc/nsg/var/gerty-slurm-state
SuspendExcNodes = (null)
SuspendExcParts = (null)
SuspendProgram = /usr/sbin/capmc_suspend
SuspendRate = 60 nodes/min
SuspendTime = 30000000 sec
SuspendTimeout = 30 sec
SwitchType = switch/cray
TaskEpilog = (null)
TaskPlugin = task/affinity,task/cgroup,task/cray
TaskPluginParam = (null type)
TaskProlog = (null)
TCPTimeout = 2 sec
TmpFS = /tmp
TopologyParam = NoInAddrAny
TopologyPlugin = topology/none
TrackWCKey = No
TreeWidth = 50
UsePam = 0
UnkillableStepProgram = (null)
UnkillableStepTimeout = 60 sec
VSizeFactor = 0 percent
WaitTime = 0 sec
Slurmctld(primary/backup) at ctlnet1/gertque01 are UP/DOWN
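As an aside, scontrol show config output in `Key = Value` form can be turned into a lookup table with a few lines. This is a generic sketch, not a Slurm-provided tool:

```python
def parse_config(text):
    # Split "Key = Value" pairs, one per line, into a dict.
    conf = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            conf[key.strip()] = value.strip()
    return conf

sample = """SelectType = select/cray
SLURM_VERSION = 16.05.3
TaskPlugin = task/affinity,task/cgroup,task/cray"""

cfg = parse_config(sample)
print(cfg["SLURM_VERSION"])  # 16.05.3
```

This makes it easy to diff the running configuration against the expected slurm.conf settings.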
Dear Moe,

I noticed from the srun man page (quoted below) that --ntasks-per-core applies only to the job allocation (not to job step allocations), so I tried #SBATCH --ntasks-per-core=2, and that seems to allow me to use the hyperthreads (see the output attached below)! I still need to do further testing to check whether other use cases work as expected, though. I have a few questions regarding this:

1) We prefer to request a number of nodes with #SBATCH -N <# of nodes> and then use an srun command line option to indicate whether to use hyperthreads, and if so, how many logical cores (CPUs) to use per physical core. This means I would prefer an srun command line option, something like --ncpus-per-core, to indicate how many CPUs to use per core. Could you please take a look at this option?

2) If we have to use --ntasks-per-core=2 as an SBATCH flag, is there a way we can set it as the default for all jobs, so that users do not have to set it in each of their job scripts? I tested that #SBATCH --ntasks-per-core=2 and the Slurm configuration CR_ONE_TASK_PER_CORE work together fine in my limited tests.

3) When I read the srun/sbatch man pages, regarding the --ntasks-per-core option I saw the following note: "NOTE: This option is not supported unless SelectTypeParameters=CR_Core or SelectTypeParameters=CR_Core_Memory is configured. This option applies to job allocations." But we do not have CR_Core or CR_Core_Memory configured. Could you please let me know if this has changed? What we have is SelectTypeParameters=CR_SOCKET_MEMORY,OTHER_CONS_RES,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE, as you suggested, and the option --ntasks-per-core seems to work at job allocation time.

Thanks, Zhengji

--ntasks-per-core=<ntasks>
Request the maximum ntasks be invoked on each core. This option applies to the job allocation, but not to step allocations. Meant to be used with the --ntasks option.
Related to −−ntasks−per−node except at the core level instead of the node level. Masks will automatically be generated to bind the tasks to specific core unless −−cpu_bind=none is specified. NOTE: This option is not supported unless SelectTypeParameters=CR_Core or SelectTypeParame- ters=CR_Core_Memory is configured. This option applies to job allocations. export OMP_NUM_THREADS=1 srun --cpu_bind=verbose -n64 ./xthi.intel 2>&1 |sort -nk4,6 Hello from rank 0 thread 0 on nid00021 (core affinity = 0) cpu_bind=MASK - nid00021, task 0 0 [23010]: mask 0x1 set cpu_bind=MASK - nid00021, task 1 1 [23011]: mask 0x2 set cpu_bind=MASK - nid00021, task 2 2 [23012]: mask 0x4 set cpu_bind=MASK - nid00021, task 3 3 [23013]: mask 0x8 set cpu_bind=MASK - nid00021, task 4 4 [23014]: mask 0x10 set cpu_bind=MASK - nid00021, task 5 5 [23015]: mask 0x20 set cpu_bind=MASK - nid00021, task 6 6 [23016]: mask 0x40 set cpu_bind=MASK - nid00021, task 7 7 [23017]: mask 0x80 set cpu_bind=MASK - nid00021, task 8 8 [23018]: mask 0x100 set cpu_bind=MASK - nid00021, task 9 9 [23019]: mask 0x200 set cpu_bind=MASK - nid00021, task 10 10 [23020]: mask 0x400 set cpu_bind=MASK - nid00021, task 11 11 [23021]: mask 0x800 set cpu_bind=MASK - nid00021, task 12 12 [23022]: mask 0x1000 set cpu_bind=MASK - nid00021, task 13 13 [23023]: mask 0x2000 set cpu_bind=MASK - nid00021, task 14 14 [23024]: mask 0x4000 set cpu_bind=MASK - nid00021, task 15 15 [23025]: mask 0x8000 set cpu_bind=MASK - nid00021, task 16 16 [23026]: mask 0x10000 set cpu_bind=MASK - nid00021, task 17 17 [23027]: mask 0x20000 set cpu_bind=MASK - nid00021, task 18 18 [23028]: mask 0x40000 set cpu_bind=MASK - nid00021, task 19 19 [23029]: mask 0x80000 set cpu_bind=MASK - nid00021, task 20 20 [23030]: mask 0x100000 set cpu_bind=MASK - nid00021, task 21 21 [23031]: mask 0x200000 set cpu_bind=MASK - nid00021, task 22 22 [23032]: mask 0x400000 set cpu_bind=MASK - nid00021, task 23 23 [23033]: mask 0x800000 set cpu_bind=MASK - nid00021, task 24 24 
[23034]: mask 0x1000000 set cpu_bind=MASK - nid00021, task 25 25 [23035]: mask 0x2000000 set cpu_bind=MASK - nid00021, task 26 26 [23036]: mask 0x4000000 set cpu_bind=MASK - nid00021, task 27 27 [23037]: mask 0x8000000 set cpu_bind=MASK - nid00021, task 28 28 [23038]: mask 0x10000000 set cpu_bind=MASK - nid00021, task 29 29 [23039]: mask 0x20000000 set cpu_bind=MASK - nid00021, task 30 30 [23040]: mask 0x40000000 set cpu_bind=MASK - nid00021, task 31 31 [23041]: mask 0x80000000 set cpu_bind=MASK - nid00021, task 32 32 [23042]: mask 0x100000000 set cpu_bind=MASK - nid00021, task 33 33 [23043]: mask 0x200000000 set cpu_bind=MASK - nid00021, task 34 34 [23044]: mask 0x400000000 set cpu_bind=MASK - nid00021, task 35 35 [23045]: mask 0x800000000 set cpu_bind=MASK - nid00021, task 36 36 [23046]: mask 0x1000000000 set cpu_bind=MASK - nid00021, task 37 37 [23047]: mask 0x2000000000 set cpu_bind=MASK - nid00021, task 38 38 [23048]: mask 0x4000000000 set cpu_bind=MASK - nid00021, task 39 39 [23049]: mask 0x8000000000 set cpu_bind=MASK - nid00021, task 40 40 [23050]: mask 0x10000000000 set cpu_bind=MASK - nid00021, task 41 41 [23051]: mask 0x20000000000 set cpu_bind=MASK - nid00021, task 42 42 [23052]: mask 0x40000000000 set cpu_bind=MASK - nid00021, task 43 43 [23053]: mask 0x80000000000 set cpu_bind=MASK - nid00021, task 44 44 [23054]: mask 0x100000000000 set cpu_bind=MASK - nid00021, task 45 45 [23055]: mask 0x200000000000 set cpu_bind=MASK - nid00021, task 46 46 [23056]: mask 0x400000000000 set cpu_bind=MASK - nid00021, task 47 47 [23057]: mask 0x800000000000 set cpu_bind=MASK - nid00021, task 48 48 [23058]: mask 0x1000000000000 set cpu_bind=MASK - nid00021, task 49 49 [23059]: mask 0x2000000000000 set cpu_bind=MASK - nid00021, task 50 50 [23060]: mask 0x4000000000000 set cpu_bind=MASK - nid00021, task 51 51 [23061]: mask 0x8000000000000 set cpu_bind=MASK - nid00021, task 52 52 [23062]: mask 0x10000000000000 set cpu_bind=MASK - nid00021, task 53 53 [23063]: mask 
0x20000000000000 set cpu_bind=MASK - nid00021, task 54 54 [23064]: mask 0x40000000000000 set cpu_bind=MASK - nid00021, task 55 55 [23065]: mask 0x80000000000000 set cpu_bind=MASK - nid00021, task 56 56 [23066]: mask 0x100000000000000 set cpu_bind=MASK - nid00021, task 57 57 [23067]: mask 0x200000000000000 set cpu_bind=MASK - nid00021, task 58 58 [23068]: mask 0x400000000000000 set cpu_bind=MASK - nid00021, task 59 59 [23069]: mask 0x800000000000000 set cpu_bind=MASK - nid00021, task 60 60 [23070]: mask 0x1000000000000000 set cpu_bind=MASK - nid00021, task 61 61 [23071]: mask 0x2000000000000000 set cpu_bind=MASK - nid00021, task 62 62 [23072]: mask 0x4000000000000000 set cpu_bind=MASK - nid00021, task 63 63 [23073]: mask 0x8000000000000000 set Hello from rank 1 thread 0 on nid00021 (core affinity = 1) Hello from rank 2 thread 0 on nid00021 (core affinity = 2) Hello from rank 3 thread 0 on nid00021 (core affinity = 3) Hello from rank 4 thread 0 on nid00021 (core affinity = 4) Hello from rank 5 thread 0 on nid00021 (core affinity = 5) Hello from rank 6 thread 0 on nid00021 (core affinity = 6) Hello from rank 7 thread 0 on nid00021 (core affinity = 7) Hello from rank 8 thread 0 on nid00021 (core affinity = 8) Hello from rank 9 thread 0 on nid00021 (core affinity = 9) Hello from rank 10 thread 0 on nid00021 (core affinity = 10) Hello from rank 11 thread 0 on nid00021 (core affinity = 11) Hello from rank 12 thread 0 on nid00021 (core affinity = 12) Hello from rank 13 thread 0 on nid00021 (core affinity = 13) Hello from rank 14 thread 0 on nid00021 (core affinity = 14) Hello from rank 15 thread 0 on nid00021 (core affinity = 15) Hello from rank 16 thread 0 on nid00021 (core affinity = 16) Hello from rank 17 thread 0 on nid00021 (core affinity = 17) Hello from rank 18 thread 0 on nid00021 (core affinity = 18) Hello from rank 19 thread 0 on nid00021 (core affinity = 19) Hello from rank 20 thread 0 on nid00021 (core affinity = 20) Hello from rank 21 thread 0 on nid00021 
(core affinity = 21) Hello from rank 22 thread 0 on nid00021 (core affinity = 22) Hello from rank 23 thread 0 on nid00021 (core affinity = 23) Hello from rank 24 thread 0 on nid00021 (core affinity = 24) Hello from rank 25 thread 0 on nid00021 (core affinity = 25) Hello from rank 26 thread 0 on nid00021 (core affinity = 26) Hello from rank 27 thread 0 on nid00021 (core affinity = 27) Hello from rank 28 thread 0 on nid00021 (core affinity = 28) Hello from rank 29 thread 0 on nid00021 (core affinity = 29) Hello from rank 30 thread 0 on nid00021 (core affinity = 30) Hello from rank 31 thread 0 on nid00021 (core affinity = 31) Hello from rank 32 thread 0 on nid00021 (core affinity = 32) Hello from rank 33 thread 0 on nid00021 (core affinity = 33) Hello from rank 34 thread 0 on nid00021 (core affinity = 34) Hello from rank 35 thread 0 on nid00021 (core affinity = 35) Hello from rank 36 thread 0 on nid00021 (core affinity = 36) Hello from rank 37 thread 0 on nid00021 (core affinity = 37) Hello from rank 38 thread 0 on nid00021 (core affinity = 38) Hello from rank 39 thread 0 on nid00021 (core affinity = 39) Hello from rank 40 thread 0 on nid00021 (core affinity = 40) Hello from rank 41 thread 0 on nid00021 (core affinity = 41) Hello from rank 42 thread 0 on nid00021 (core affinity = 42) Hello from rank 43 thread 0 on nid00021 (core affinity = 43) Hello from rank 44 thread 0 on nid00021 (core affinity = 44) Hello from rank 45 thread 0 on nid00021 (core affinity = 45) Hello from rank 46 thread 0 on nid00021 (core affinity = 46) Hello from rank 47 thread 0 on nid00021 (core affinity = 47) Hello from rank 48 thread 0 on nid00021 (core affinity = 48) Hello from rank 49 thread 0 on nid00021 (core affinity = 49) Hello from rank 50 thread 0 on nid00021 (core affinity = 50) Hello from rank 51 thread 0 on nid00021 (core affinity = 51) Hello from rank 52 thread 0 on nid00021 (core affinity = 52) Hello from rank 53 thread 0 on nid00021 (core affinity = 53) Hello from rank 54 thread 
0 on nid00021 (core affinity = 54) Hello from rank 55 thread 0 on nid00021 (core affinity = 55) Hello from rank 56 thread 0 on nid00021 (core affinity = 56) Hello from rank 57 thread 0 on nid00021 (core affinity = 57) Hello from rank 58 thread 0 on nid00021 (core affinity = 58) Hello from rank 59 thread 0 on nid00021 (core affinity = 59) Hello from rank 60 thread 0 on nid00021 (core affinity = 60) Hello from rank 61 thread 0 on nid00021 (core affinity = 61) Hello from rank 62 thread 0 on nid00021 (core affinity = 62) Hello from rank 63 thread 0 on nid00021 (core affinity = 63) export OMP_NUM_THREADS=8 srun --cpu_bind=verbose -n8 -c8 ./xthi.intel 2>&1 |sort -nk4,6 Hello from rank 0 thread 0 on nid00021 (core affinity = 0-7) Hello from rank 0 thread 1 on nid00021 (core affinity = 0-7) Hello from rank 0 thread 2 on nid00021 (core affinity = 0-7) Hello from rank 0 thread 3 on nid00021 (core affinity = 0-7) Hello from rank 0 thread 4 on nid00021 (core affinity = 0-7) Hello from rank 0 thread 5 on nid00021 (core affinity = 0-7) Hello from rank 0 thread 6 on nid00021 (core affinity = 0-7) Hello from rank 0 thread 7 on nid00021 (core affinity = 0-7) cpu_bind=MASK - nid00021, task 0 0 [25632]: mask 0xff set cpu_bind=MASK - nid00021, task 1 1 [25633]: mask 0xff00 set cpu_bind=MASK - nid00021, task 2 2 [25634]: mask 0xff0000 set cpu_bind=MASK - nid00021, task 3 3 [25635]: mask 0xff000000 set cpu_bind=MASK - nid00021, task 4 4 [25636]: mask 0xff00000000 set cpu_bind=MASK - nid00021, task 5 5 [25637]: mask 0xff0000000000 set cpu_bind=MASK - nid00021, task 6 6 [25638]: mask 0xff000000000000 set cpu_bind=MASK - nid00021, task 7 7 [25639]: mask 0xff00000000000000 set Hello from rank 1 thread 0 on nid00021 (core affinity = 8-15) Hello from rank 1 thread 1 on nid00021 (core affinity = 8-15) Hello from rank 1 thread 2 on nid00021 (core affinity = 8-15) Hello from rank 1 thread 3 on nid00021 (core affinity = 8-15) Hello from rank 1 thread 4 on nid00021 (core affinity = 8-15) Hello 
from rank 1 thread 5 on nid00021 (core affinity = 8-15)
Hello from rank 1 thread 6 on nid00021 (core affinity = 8-15)
Hello from rank 1 thread 7 on nid00021 (core affinity = 8-15)
Hello from rank 2 thread 0 on nid00021 (core affinity = 16-23)
Hello from rank 2 thread 1 on nid00021 (core affinity = 16-23)
Hello from rank 2 thread 2 on nid00021 (core affinity = 16-23)
Hello from rank 2 thread 3 on nid00021 (core affinity = 16-23)
Hello from rank 2 thread 4 on nid00021 (core affinity = 16-23)
Hello from rank 2 thread 5 on nid00021 (core affinity = 16-23)
Hello from rank 2 thread 6 on nid00021 (core affinity = 16-23)
Hello from rank 2 thread 7 on nid00021 (core affinity = 16-23)
Hello from rank 3 thread 0 on nid00021 (core affinity = 24-31)
Hello from rank 3 thread 1 on nid00021 (core affinity = 24-31)
Hello from rank 3 thread 2 on nid00021 (core affinity = 24-31)
Hello from rank 3 thread 3 on nid00021 (core affinity = 24-31)
Hello from rank 3 thread 4 on nid00021 (core affinity = 24-31)
Hello from rank 3 thread 5 on nid00021 (core affinity = 24-31)
Hello from rank 3 thread 6 on nid00021 (core affinity = 24-31)
Hello from rank 3 thread 7 on nid00021 (core affinity = 24-31)
Hello from rank 4 thread 0 on nid00021 (core affinity = 32-39)
Hello from rank 4 thread 1 on nid00021 (core affinity = 32-39)
Hello from rank 4 thread 2 on nid00021 (core affinity = 32-39)
Hello from rank 4 thread 3 on nid00021 (core affinity = 32-39)
Hello from rank 4 thread 4 on nid00021 (core affinity = 32-39)
Hello from rank 4 thread 5 on nid00021 (core affinity = 32-39)
Hello from rank 4 thread 6 on nid00021 (core affinity = 32-39)
Hello from rank 4 thread 7 on nid00021 (core affinity = 32-39)
Hello from rank 5 thread 0 on nid00021 (core affinity = 40-47)
Hello from rank 5 thread 1 on nid00021 (core affinity = 40-47)
Hello from rank 5 thread 2 on nid00021 (core affinity = 40-47)
Hello from rank 5 thread 3 on nid00021 (core affinity = 40-47)
Hello from rank 5 thread 4 on nid00021 (core affinity = 40-47)
Hello from rank 5 thread 5 on nid00021 (core affinity = 40-47)
Hello from rank 5 thread 6 on nid00021 (core affinity = 40-47)
Hello from rank 5 thread 7 on nid00021 (core affinity = 40-47)
Hello from rank 6 thread 0 on nid00021 (core affinity = 48-55)
Hello from rank 6 thread 1 on nid00021 (core affinity = 48-55)
Hello from rank 6 thread 2 on nid00021 (core affinity = 48-55)
Hello from rank 6 thread 3 on nid00021 (core affinity = 48-55)
Hello from rank 6 thread 4 on nid00021 (core affinity = 48-55)
Hello from rank 6 thread 5 on nid00021 (core affinity = 48-55)
Hello from rank 6 thread 6 on nid00021 (core affinity = 48-55)
Hello from rank 6 thread 7 on nid00021 (core affinity = 48-55)
Hello from rank 7 thread 0 on nid00021 (core affinity = 56-63)
Hello from rank 7 thread 1 on nid00021 (core affinity = 56-63)
Hello from rank 7 thread 2 on nid00021 (core affinity = 56-63)
Hello from rank 7 thread 3 on nid00021 (core affinity = 56-63)
Hello from rank 7 thread 4 on nid00021 (core affinity = 56-63)
Hello from rank 7 thread 5 on nid00021 (core affinity = 56-63)
Hello from rank 7 thread 6 on nid00021 (core affinity = 56-63)
Hello from rank 7 thread 7 on nid00021 (core affinity = 56-63)
(In reply to Zhengji Zhao from comment #34)
> 2) It seems now I can not run with hyperthreads anymore, the following srun
> commands all return error, "More processors requested than permitted",
>
> srun --cpu_bind=verbose -n64 --ntasks-per-core=2 ./xthi.intel
> srun: error: Unable to create job step: More processors requested than permitted
>
> srun --cpu_bind=verbose,threads -n64 --ntasks-per-core=2 ./xthi.intel
> srun: error: Unable to create job step: More processors requested than permitted
>
> srun --cpu_bind=verbose,threads -n64 --ntasks-per-core=2 --hint=multithread ./xthi.intel
> srun: error: Unable to create job step: More processors requested than permitted

Dear Zhengji,

I am not able to reproduce this second problem. An execute line of this sort:

> srun --cpu_bind=verbose,threads -n64 --ntasks-per-core=2 ./xthi.intel

binds one task to each thread for me. Are you running this srun command within an existing job allocation (under an salloc or sbatch shell)? If so, could you send me the salloc/sbatch execute line you use? I would guess that your salloc/sbatch command is setting some environment variables. srun merges that environment with the options on your execute line, which may generate a step request that cannot be satisfied.
I used the following job script to get that "More processors requested than permitted" error in my comment 34.

zz217@gert01:~/affinity/hsw> cat run.slurm
#!/bin/bash -l

#SBATCH -N 1
#SBATCH -p debug

set -x

#srun --cpu_bind=verbose --mem_bind=verbose,local -n32 ./xthi.intel 2>&1 |sort -nk4,6

srun --cpu_bind=verbose -n64 --ntasks-per-core=2 ./xthi.intel 2>&1 |sort -nk4,6
srun --cpu_bind=verbose,threads -n64 --ntasks-per-core=2 ./xthi.intel 2>&1 |sort -nk4,6
srun --cpu_bind=verbose,threads -n64 --ntasks-per-core=2 --hint=multithread ./xthi.intel 2>&1 |sort -nk4,6

zz217@gert01:~/affinity/hsw> sbatch run.slurm
Submitted batch job 210
zz217@gert01:~/affinity/hsw> cat slurm-210.out
+ srun --cpu_bind=verbose -n64 --ntasks-per-core=2 ./xthi.intel
+ sort -nk4,6
srun: error: Unable to create job step: More processors requested than permitted
+ sort -nk4,6
+ srun --cpu_bind=verbose,threads -n64 --ntasks-per-core=2 ./xthi.intel
srun: error: Unable to create job step: More processors requested than permitted
+ srun --cpu_bind=verbose,threads -n64 --ntasks-per-core=2 --hint=multithread ./xthi.intel
+ sort -nk4,6
srun: error: Unable to create job step: More processors requested than permitted

Please see my comment 37 (for more questions I have), where I tried #SBATCH --ntasks-per-core=2 and it seems to allow me to use hyperthreads.

Thanks,
Zhengji
The "-N1" option allocates the job resources on at least one node, but the minimum allocation size (based upon "CR_Socket_Memory" in slurm.conf) is one socket. Your job allocation only includes one socket, which is 16 cores or 32 threads, while your srun commands are trying to launch a job step using 64 threads.

What you presumably want is to modify the job request so that the allocation includes both sockets (64 threads) and the job step can run 64 tasks (one per thread). I suggest doing this with a line like the following in your script:

#SBATCH -N 1 -n64 --ntasks-per-core=1

I'll respond to your other questions in a separate comment.

(In reply to Zhengji Zhao from comment #39)
> I used the following job script to get that "More processors requested than
> permitted" error in my comment 34.
>
> zz217@gert01:~/affinity/hsw> cat run.slurm
> #!/bin/bash -l
>
> #SBATCH -N 1
> #SBATCH -p debug
>
> set -x
>
> #srun --cpu_bind=verbose --mem_bind=verbose,local -n32 ./xthi.intel 2>&1 |sort -nk4,6
>
> srun --cpu_bind=verbose -n64 --ntasks-per-core=2 ./xthi.intel 2>&1 |sort -nk4,6
> srun --cpu_bind=verbose,threads -n64 --ntasks-per-core=2 ./xthi.intel 2>&1 |sort -nk4,6
> srun --cpu_bind=verbose,threads -n64 --ntasks-per-core=2 --hint=multithread ./xthi.intel 2>&1 |sort -nk4,6
>
> zz217@gert01:~/affinity/hsw> sbatch run.slurm
> Submitted batch job 210
> zz217@gert01:~/affinity/hsw> cat slurm-210.out
> + srun --cpu_bind=verbose -n64 --ntasks-per-core=2 ./xthi.intel
> + sort -nk4,6
> srun: error: Unable to create job step: More processors requested than permitted
> + sort -nk4,6
> + srun --cpu_bind=verbose,threads -n64 --ntasks-per-core=2 ./xthi.intel
> srun: error: Unable to create job step: More processors requested than permitted
> + srun --cpu_bind=verbose,threads -n64 --ntasks-per-core=2 --hint=multithread ./xthi.intel
> + sort -nk4,6
> srun: error: Unable to create job step: More processors requested than permitted
>
> Please see my comment 37 (for more questions I have), where I tried
> #SBATCH --ntasks-per-core=2 and it seems to allow me to use hyperthreads.
>
> Thanks,
> Zhengji
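Putting the suggestion above into the script from comment 39 might look like the following sketch. It writes the revised script out and syntax-checks it here, since actual submission requires the Slurm system; the srun options are kept as used earlier in this ticket.

```shell
# Sketch of the revised job script per the suggestion above. The #SBATCH
# line requests 64 tasks so the allocation spans both sockets, rather than
# the single-socket minimum implied by CR_Socket_Memory with "-N 1" alone.
cat > run.slurm <<'EOF'
#!/bin/bash -l
#SBATCH -N 1 -n64 --ntasks-per-core=1
#SBATCH -p debug
set -x
srun --cpu_bind=verbose -n64 ./xthi.intel 2>&1 | sort -nk4,6
EOF

# Syntax-check only; on the Slurm system this would be: sbatch run.slurm
bash -n run.slurm && echo "run.slurm syntax OK"
```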
Dear Zhengji,

My responses are in-line below.

(In reply to Zhengji Zhao from comment #37)
> Dear Moe,
>
> I noticed from the srun man page (quoted below) that the --ntasks-per-core
> option is valid only for the job allocation (it does not apply to the job
> step allocation), so I tried #SBATCH --ntasks-per-core=2 and that seems to
> allow me to use the hyperthreads (see the output attached below)! I still
> need to do further testing to check if other use cases work as expected,
> though.
>
> I have two questions regarding this:
>
> 1) It is preferred that we request a number of nodes using
> #SBATCH -N <# of nodes> and then use an srun command line option to indicate
> whether we want to use hyperthreads, and if yes, how many logical cores
> (CPUs) to use per physical core. This means I would prefer an srun command
> line option, something like --ncpus-per-core, to indicate how many CPUs to
> use per core. Could you please take a look at this option?

Your system is currently configured to allocate resources to jobs at the socket level rather than the node level (Slurm can allocate at the level of nodes, sockets, cores, or threads, depending upon its configuration). The advantage of this is that more than one job can run at a time on a compute node, which works very well for smaller jobs. The "-N#" option only tells Slurm to allocate the job resources on the specified node count. If you want to ensure individual jobs are allocated more than a single socket on the node, say the entire node, the job request should specify this using something like a task count plus --ntasks-per-core. Note that setting options in sbatch results in environment variables being set for the job step creation; for example, "sbatch -n64 ..." eliminates the need for the "-n64" option in srun.

> 2) If we have to use --ntasks-per-core=2 as an SBATCH flag, is there a way
> we can set it as the default for all jobs, so that users do not have to set
> it in each of their job scripts? I tested that #SBATCH --ntasks-per-core=2
> and the Slurm configuration option CR_ONE_TASK_PER_CORE work together fine
> in my limited tests.

Perhaps you want to eliminate the "CR_ONE_TASK_PER_CORE" option in slurm.conf and require users who want to run one task per core to explicitly specify "--ntasks-per-core=1". Alternately, global environment variables or a job_submit plugin can be used to set various default options for jobs. See: http://slurm.schedmd.com/job_submit_plugins.html

> 3) When I read the srun/sbatch man pages, regarding the --ntasks-per-core
> option I saw the following note,
>
> "NOTE: This option is not supported unless SelectTypeParameters=CR_Core or
> SelectTypeParameters=CR_Core_Memory is configured. This option applies to
> job allocations."
>
> but we do not have CR_Core or CR_Core_Memory configured. Could you please
> let me know if this has been changed? What we have is

That documentation is no longer correct. I will update it shortly.

> SelectTypeParameters=CR_SOCKET_MEMORY,OTHER_CONS_RES,CR_CORE_DEFAULT_DIST_BLOCK,CR_ONE_TASK_PER_CORE
>
> as you suggested, and the option --ntasks-per-core seems to work at job
> allocation time.
>
> Thanks,
> Zhengji
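The global-environment-variable route mentioned above could be sketched as follows. This is a hypothetical site-wide profile fragment (the filename is an assumption, not something from this ticket); it uses SLURM_HINT, which was suggested earlier in this ticket as a way to give srun a default --hint value that users can still override on the command line.

```shell
# Hypothetical site-wide profile fragment, e.g. /etc/profile.d/slurm_defaults.sh
# (filename is illustrative). SLURM_HINT supplies a default for srun's --hint
# option; a user who wants hyperthreading can still pass --hint=multithread
# explicitly, which takes precedence over the environment.
export SLURM_HINT=nomultithread

# Sanity check: confirm the default is visible to shells that source this file.
echo "default hint: ${SLURM_HINT}"
```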
Dear Moe,

Thanks for getting back to me.

Regarding how many sockets (one or both) #SBATCH -N 1 allocates with CR_Socket_Memory configured, I still have a question.

We have several partitions configured on our systems, and not all of them are configured to share nodes between jobs. For example, the debug partition that I used has the following configuration:

zz217@gert01:~/affinity/hsw> scontrol show partition debug
PartitionName=debug
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=YES QoS=N/A
   DefaultTime=00:10:00 DisableRootJobs=YES ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=00:30:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=nid000[21-23,28-30,52-54,56-63]
   PriorityJobFactor=1000 PriorityTier=1000 RootOnly=NO ReqResv=NO
   OverSubscribe=EXCLUSIVE PreemptMode=REQUEUE
   State=UP TotalCPUs=1088 TotalNodes=17 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

Please note we have OverSubscribe=EXCLUSIVE set for this partition. I wonder if this should be sufficient to avoid the node being shared with other jobs, in which case I think the node should be fully (both sockets) allocated to the job. On our production systems (both the Cray XC30 called Edison and the XC40 called Cori Phase I), we have been observing this behavior: if I use #SBATCH -N 1, we get the full node (both sockets), and we did not have to use other SBATCH flags to allocate the full node to the job. On Edison and Cori we have the following configuration:

zz217@cori06:~> scontrol show config |grep -E "TaskPlugin|SelectTypeParameters"
SelectTypeParameters    = CR_SOCKET_MEMORY,OTHER_CONS_RES
TaskPlugin              = task/cgroup,task/cray
TaskPluginParam         = (null type)

while on the test system (Gerty), with the following configuration and the same debug partition configuration (OverSubscribe=EXCLUSIVE), I am unable to get the full node with #SBATCH -N 1 alone.

zz217@gert01:~> scontrol show config |grep -E "TaskPlugin|SelectTypeParameters"
SelectTypeParameters    = CR_SOCKET_MEMORY,OTHER_CONS_RES,CR_ONE_TASK_PER_CORE,CR_CORE_DEFAULT_DIST_BLOCK
TaskPlugin              = task/affinity,task/cgroup,task/cray
TaskPluginParam         = (null type)
zz217@gert01:~>

So I think there may still be room for improvement in Slurm: even with CR_Socket_Memory configured, if the partition does not allow node sharing, the full node could be allocated to a job with #SBATCH -N 1 alone. I understand this may or may not be a bug, and we can get around it with either #SBATCH -N 1 --ntasks-per-core=2 (with CR_ONE_TASK_PER_CORE set) or #SBATCH -N 1 --ntasks=64 --ntasks-per-core=1, as you suggested. So this is not a pressing issue for me now, but it would be great if someday this could be fixed. I will decide whether we settle on #SBATCH -N 1 --ntasks-per-core=2 for hyperthreading when the default is CR_ONE_TASK_PER_CORE. Please let me know whether you consider this something that needs to be fixed.

Regarding your following comment:

"Perhaps you want to eliminate the "CR_ONE_TASK_PER_CORE" option in slurm.conf then and require users who want to run one task per core to explicitly specify "--ntasks-per-core=1". Alternately global environment variables or a job_submit plugin can be used to set various default options for jobs. See: http://slurm.schedmd.com/job_submit_plugins.html"

I would like to let you know that we wanted to do the opposite: we wanted the default to be no hyperthreading, with anyone who wants hyperthreading indicating that explicitly with an extra flag (e.g., --ntasks-per-core=2). The reason is that most of our workloads do not benefit from hyperthreading on Edison and Cori Phase I. However, the situation may change on KNL (Cori Phase II), so it is possible that in the future we will want hyperthreading to be the default there and require users who do not want it to explicitly specify "--ntasks-per-core=1".

I will do further testing with Slurm 16.05.4 and will let you know if it meets our task/memory/thread affinity needs. I will open another bug shortly for the affinity issues on KNL, as you suggested in this bug.

Thanks very much for your timely help.

Zhengji
Dear Moe,

Is Slurm 16.05.4 available now? Could you please let me know where I can find this version? I see it has already been listed on the bugs.schedmd.com site, but I did not see it on your download site.

Thanks,
Zhengji
The release of version 16.05.4 was delayed until this morning due to a family emergency. You can download Slurm from here: http://www.schedmd.com/#repos
(In reply to Zhengji Zhao from comment #42)
> Dear Moe,
>
> Thanks for getting back to me.
>
> Regarding how many sockets (one socket or both sockets) the #SBATCH -N 1
> allocates with the CR_Socket_Memory configured, I still have a question.
>
> We have several partitions configured on our systems, and not all partitions
> are configured to share the nodes between jobs. For example, for the debug
> partition that I used, we have the following configuration,
>
> zz217@gert01:~/affinity/hsw> scontrol show partition debug
> PartitionName=debug
>    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
>    AllocNodes=ALL Default=YES QoS=N/A
>    DefaultTime=00:10:00 DisableRootJobs=YES ExclusiveUser=NO GraceTime=0 Hidden=NO
>    MaxNodes=UNLIMITED MaxTime=00:30:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
>    Nodes=nid000[21-23,28-30,52-54,56-63]
>    PriorityJobFactor=1000 PriorityTier=1000 RootOnly=NO ReqResv=NO
>    OverSubscribe=EXCLUSIVE PreemptMode=REQUEUE
>    State=UP TotalCPUs=1088 TotalNodes=17 SelectTypeParameters=NONE
>    DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>
> Please note we have the OverSubscribe=EXCLUSIVE set for this partition. I
> wonder if this should be sufficient to avoid the node being shared with
> other jobs, in which case, then I think the node should be fully (both
> sockets) allocated to the job.

I overlooked the "OverSubscribe=EXCLUSIVE" in the partition specification. That is sufficient to allocate your job all cores on all sockets of the allocated node with the "-N1" option.

> In our production systems (both Cray XC30 called Edison, and XC40 called
> Cori Phase I), we have been observing this behavior, i.e., if I use
> #SBATCH -N 1, we are getting the full node (both sockets), and we did not
> have to use other SBATCH flags to help allocate the full node to the job.
> On Edison and Cori we have the following configuration
>
> zz217@cori06:~> scontrol show config |grep -E "TaskPlugin|SelectTypeParameters"
> SelectTypeParameters    = CR_SOCKET_MEMORY,OTHER_CONS_RES
> TaskPlugin              = task/cgroup,task/cray
> TaskPluginParam         = (null type)
>
> while now on the test system (Gerty), with the following configuration with
> the same debug partition configuration (OverSubscribe=EXCLUSIVE) I am unable
> to get the full node with #SBATCH -N 1 alone.

I believe that you were being allocated the full node, but the bug binding tasks to the wrong CPUs was making it look like that was not the case.

> zz217@gert01:~> scontrol show config |grep -E "TaskPlugin|SelectTypeParameters"
> SelectTypeParameters    = CR_SOCKET_MEMORY,OTHER_CONS_RES,CR_ONE_TASK_PER_CORE,CR_CORE_DEFAULT_DIST_BLOCK
> TaskPlugin              = task/affinity,task/cgroup,task/cray
> TaskPluginParam         = (null type)
> zz217@gert01:~>
>
> So I think perhaps there is still some room to improve in Slurm, so that
> even with CR_Socket_Memory configured, if the partition does not allow
> node sharing, then the full node can be allocated to a job with
> #SBATCH -N 1 alone. I understand this may or may not be a bug, and we can
> get around this with either #SBATCH -N 1 --ntasks-per-core=2 with
> CR_ONE_TASK_PER_CORE being set or with #SBATCH -N 1 --ntasks=64
> --ntasks-per-core=1 as you suggested. So this is not a pressing issue for
> me now. However, it would be great if someday this could be fixed. I will
> decide if we will settle with the #SBATCH -N 1 --ntasks-per-core=2 to use
> hyperthreading when the default is CR_ONE_TASK_PER_CORE. Please let me know
> if you consider this something that needs to be fixed or not.
>
> Regarding your following comment,
>
> "Perhaps you want to eliminate the "CR_ONE_TASK_PER_CORE" option in
> slurm.conf then and require users who want to run one task per core to
> explicitly specify "--ntasks-per-core=1".
>
> Alternately global environment variables or a job_submit plugin can be used
> to set various default options for jobs. See:
> http://slurm.schedmd.com/job_submit_plugins.html"
>
> I would like to let you know that we wanted to do the opposite. We wanted
> the default to be not using hyperthreading, and whoever wants to use
> hyperthreading to indicate that explicitly with an extra flag (e.g.,
> --ntasks-per-core=2). The reason we wanted the default not to bother with
> hyperthreading was that most of our workloads do not get benefits from
> using hyperthreading on Edison and Cori Phase I. However, the situation may
> change on KNL (Cori Phase II), so it is possible that in the future we want
> hyperthreading to be the default on Cori Phase II, and require users who
> do not want to use hyperthreading to explicitly specify
> "--ntasks-per-core=1".
>
> I will do further testing with SLURM 16.05.4, and will let you know if it
> meets our task/memory/thread affinity need.
>
> I will open another bug shortly for the affinity issues on KNL as you
> suggested in this bug.
>
> Thanks very much for your timely help.
>
> Zhengji
Please open a new ticket or re-open this one if necessary.
Dear Moe,

I would like to confirm with you whether the last bug was fixed. In comment 45, you said:

"I overlooked the "OverSubscribe=EXCLUSIVE" in the partition specification. That is sufficient to allocate your job all cores on all sockets of the allocated node with the "-N1" option."

Could you please let us know? We are currently running 16.05.5, and we still cannot get all the CPUs on the node (the full node) with #SBATCH -N 1 alone with the current select parameters:

swowner@cori08:~> scontrol show config |grep -i select
SelectType              = select/cray
SelectTypeParameters    = CR_SOCKET_MEMORY,OTHER_CONS_RES,NHC_NO,CR_ONE_TASK_PER_CORE,CR_CORE_DEFAULT_DIST_BLOCK

Thanks,
Zhengji
Sorry, I should have added that we still cannot get all the CPUs on the node with #SBATCH -N 1 under a partition configured with OverSubscribe=EXCLUSIVE.

Thanks,
Zhengji
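One way to see exactly what the allocation contains is to pull the NumCPUs field out of `scontrol show job`; for a full Haswell node here that should report 64 CPUs. The following is only a sketch: the sample line stands in for real scontrol output, since the real command needs the Slurm system.

```shell
# Hypothetical check for how many CPUs a job was actually allocated.
# On the real system this would be run inside the job as:
#   scontrol show job "$SLURM_JOB_ID" | grep -Eo 'NumCPUs=[0-9]+'
# The sample line below is illustrative only, not output from this ticket.
sample='NumNodes=1 NumCPUs=64 NumTasks=64 CPUs/Task=1'
echo "$sample" | grep -Eo 'NumCPUs=[0-9]+'
```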
There is information about quite a few different hardware and software configurations in this ticket. There were definitely some task binding issues in Slurm version 16.05.4 with some KNL NUMA modes. For example, SNC2/flat in Slurm version 16.05.4 would produce the following task binding:

$ sbatch -N1 tmp
cpu_bind=MASK - knl, task 0 0 [91243]: mask 0xffffffffffffffffffffffffffffffffff000000003ffffffff000000003ffffffff set

That is corrected in version 16.05.5:

cpu_bind=MASK - knl, task 0 0 [49658]: mask 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff set

Can you provide more details about exactly what you are seeing in this ticket, or open a new ticket, which would probably be less confusing?

More complete logs below:

$ cat tmp
#!/bin/bash
./srun --cpu_bind=v sleep 100
exit 0

$ sbatch -N1 tmp
Submitted batch job 38
$ cat sl*out
cpu_bind=MASK - knl, task 0 0 [49658]: mask 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff set

SlurmctldLogFile (with "DebugFlags=CPU_Bind" configured):

slurmctld: _slurm_rpc_submit_batch_job JobId=38 usec=1301
slurmctld: ====================
slurmctld: job_id:38 nhosts:1 ncpus:1 node_req:64000 nodes=knl0
slurmctld: Node[0]:
slurmctld: Mem(MB):6800:0 Sockets:2 Cores:34 CPUs:68:0
slurmctld: Socket[0] Core[0] is allocated
slurmctld: Socket[0] Core[1] is allocated
slurmctld: Socket[0] Core[2] is allocated
slurmctld: Socket[0] Core[3] is allocated
slurmctld: Socket[0] Core[4] is allocated
slurmctld: Socket[0] Core[5] is allocated
slurmctld: Socket[0] Core[6] is allocated
slurmctld: Socket[0] Core[7] is allocated
slurmctld: Socket[0] Core[8] is allocated
slurmctld: Socket[0] Core[9] is allocated
slurmctld: Socket[0] Core[10] is allocated
slurmctld: Socket[0] Core[11] is allocated
slurmctld: Socket[0] Core[12] is allocated
slurmctld: Socket[0] Core[13] is allocated
slurmctld: Socket[0] Core[14] is allocated
slurmctld: Socket[0] Core[15] is allocated
slurmctld: Socket[0] Core[16] is allocated
slurmctld: Socket[0] Core[17] is allocated
slurmctld: Socket[0] Core[18] is allocated
slurmctld: Socket[0] Core[19] is allocated
slurmctld: Socket[0] Core[20] is allocated
slurmctld: Socket[0] Core[21] is allocated
slurmctld: Socket[0] Core[22] is allocated
slurmctld: Socket[0] Core[23] is allocated
slurmctld: Socket[0] Core[24] is allocated
slurmctld: Socket[0] Core[25] is allocated
slurmctld: Socket[0] Core[26] is allocated
slurmctld: Socket[0] Core[27] is allocated
slurmctld: Socket[0] Core[28] is allocated
slurmctld: Socket[0] Core[29] is allocated
slurmctld: Socket[0] Core[30] is allocated
slurmctld: Socket[0] Core[31] is allocated
slurmctld: Socket[0] Core[32] is allocated
slurmctld: Socket[0] Core[33] is allocated
slurmctld: Socket[1] Core[0] is allocated
slurmctld: Socket[1] Core[1] is allocated
slurmctld: Socket[1] Core[2] is allocated
slurmctld: Socket[1] Core[3] is allocated
slurmctld: Socket[1] Core[4] is allocated
slurmctld: Socket[1] Core[5] is allocated
slurmctld: Socket[1] Core[6] is allocated
slurmctld: Socket[1] Core[7] is allocated
slurmctld: Socket[1] Core[8] is allocated
slurmctld: Socket[1] Core[9] is allocated
slurmctld: Socket[1] Core[10] is allocated
slurmctld: Socket[1] Core[11] is allocated
slurmctld: Socket[1] Core[12] is allocated
slurmctld: Socket[1] Core[13] is allocated
slurmctld: Socket[1] Core[14] is allocated
slurmctld: Socket[1] Core[15] is allocated
slurmctld: Socket[1] Core[16] is allocated
slurmctld: Socket[1] Core[17] is allocated
slurmctld: Socket[1] Core[18] is allocated
slurmctld: Socket[1] Core[19] is allocated
slurmctld: Socket[1] Core[20] is allocated
slurmctld: Socket[1] Core[21] is allocated
slurmctld: Socket[1] Core[22] is allocated
slurmctld: Socket[1] Core[23] is allocated
slurmctld: Socket[1] Core[24] is allocated
slurmctld: Socket[1] Core[25] is allocated
slurmctld: Socket[1] Core[26] is allocated
slurmctld: Socket[1] Core[27] is allocated
slurmctld: Socket[1] Core[28] is allocated
slurmctld: Socket[1] Core[29] is allocated
slurmctld: Socket[1] Core[30] is allocated
slurmctld: Socket[1] Core[31] is allocated
slurmctld: Socket[1] Core[32] is allocated
slurmctld: Socket[1] Core[33] is allocated
slurmctld: --------------------
slurmctld: cpu_array_value[0]:68 reps:1
slurmctld: ====================
slurmctld: sched: Allocate JobID=38 NodeList=knl0 #CPUs=272 Partition=debug
slurmctld: _pick_step_nodes: Configuration for job 38 is complete
slurmctld: ====================
slurmctld: step_id:38.0
slurmctld: JobNode[0] Socket[0] Core[0] is allocated
slurmctld: JobNode[0] Socket[0] Core[1] is allocated
slurmctld: JobNode[0] Socket[0] Core[2] is allocated
slurmctld: JobNode[0] Socket[0] Core[3] is allocated
slurmctld: JobNode[0] Socket[0] Core[4] is allocated
slurmctld: JobNode[0] Socket[0] Core[5] is allocated
slurmctld: JobNode[0] Socket[0] Core[6] is allocated
slurmctld: JobNode[0] Socket[0] Core[7] is allocated
slurmctld: JobNode[0] Socket[0] Core[8] is allocated
slurmctld: JobNode[0] Socket[0] Core[9] is allocated
slurmctld: JobNode[0] Socket[0] Core[10] is allocated
slurmctld: JobNode[0] Socket[0] Core[11] is allocated
slurmctld: JobNode[0] Socket[0] Core[12] is allocated
slurmctld: JobNode[0] Socket[0] Core[13] is allocated
slurmctld: JobNode[0] Socket[0] Core[14] is allocated
slurmctld: JobNode[0] Socket[0] Core[15] is allocated
slurmctld: JobNode[0] Socket[0] Core[16] is allocated
slurmctld: JobNode[0] Socket[0] Core[17] is allocated
slurmctld: JobNode[0] Socket[0] Core[18] is allocated
slurmctld: JobNode[0] Socket[0] Core[19] is allocated
slurmctld: JobNode[0] Socket[0] Core[20] is allocated
slurmctld: JobNode[0] Socket[0] Core[21] is allocated
slurmctld: JobNode[0] Socket[0] Core[22] is allocated
slurmctld: JobNode[0] Socket[0] Core[23] is allocated
slurmctld: JobNode[0] Socket[0] Core[24] is allocated
slurmctld: JobNode[0] Socket[0] Core[25] is allocated
slurmctld: JobNode[0] Socket[0] Core[26] is allocated
slurmctld: JobNode[0] Socket[0] Core[27] is allocated
slurmctld: JobNode[0] Socket[0] Core[28] is allocated
slurmctld: JobNode[0] Socket[0] Core[29] is allocated
slurmctld: JobNode[0] Socket[0] Core[30] is allocated
slurmctld: JobNode[0] Socket[0] Core[31] is allocated
slurmctld: JobNode[0] Socket[0] Core[32] is allocated
slurmctld: JobNode[0] Socket[0] Core[33] is allocated
slurmctld: JobNode[0] Socket[1] Core[0] is allocated
slurmctld: JobNode[0] Socket[1] Core[1] is allocated
slurmctld: JobNode[0] Socket[1] Core[2] is allocated
slurmctld: JobNode[0] Socket[1] Core[3] is allocated
slurmctld: JobNode[0] Socket[1] Core[4] is allocated
slurmctld: JobNode[0] Socket[1] Core[5] is allocated
slurmctld: JobNode[0] Socket[1] Core[6] is allocated
slurmctld: JobNode[0] Socket[1] Core[7] is allocated
slurmctld: JobNode[0] Socket[1] Core[8] is allocated
slurmctld: JobNode[0] Socket[1] Core[9] is allocated
slurmctld: JobNode[0] Socket[1] Core[10] is allocated
slurmctld: JobNode[0] Socket[1] Core[11] is allocated
slurmctld: JobNode[0] Socket[1] Core[12] is allocated
slurmctld: JobNode[0] Socket[1] Core[13] is allocated
slurmctld: JobNode[0] Socket[1] Core[14] is allocated
slurmctld: JobNode[0] Socket[1] Core[15] is allocated
slurmctld: JobNode[0] Socket[1] Core[16] is allocated
slurmctld: JobNode[0] Socket[1] Core[17] is allocated
slurmctld: JobNode[0] Socket[1] Core[18] is allocated
slurmctld: JobNode[0] Socket[1] Core[19] is allocated
slurmctld: JobNode[0] Socket[1] Core[20] is allocated
slurmctld: JobNode[0] Socket[1] Core[21] is allocated
slurmctld: JobNode[0] Socket[1] Core[22] is allocated
slurmctld: JobNode[0] Socket[1] Core[23] is allocated
slurmctld: JobNode[0] Socket[1] Core[24] is allocated
slurmctld: JobNode[0] Socket[1] Core[25] is allocated
slurmctld: JobNode[0] Socket[1] Core[26] is allocated
slurmctld: JobNode[0] Socket[1] Core[27] is allocated
slurmctld: JobNode[0] Socket[1] Core[28] is allocated
slurmctld: JobNode[0] Socket[1] Core[29] is allocated
slurmctld: JobNode[0] Socket[1] Core[30] is allocated
slurmctld: JobNode[0] Socket[1] Core[31] is allocated
slurmctld: JobNode[0] Socket[1] Core[32] is allocated
slurmctld: JobNode[0] Socket[1] Core[33] is allocated
slurmctld: ====================
slurmctld: job_complete: invalid JobId=37

SlurmdLogFile (with "DebugFlags=CPU_Bind" configured):

slurmd: task_p_slurmd_batch_request: 38
slurmd: task/affinity: job 38 CPU input mask for node: 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
slurmd: task/affinity: job 38 CPU final HW mask for node: 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
slurmd: _run_prolog: run job script took usec=1402
slurmd: _run_prolog: prolog with lock for job 38 ran for 0 seconds
slurmd: ====================
slurmd: batch_job:38 job_mem:100MB_per_CPU
slurmd: JobNode[0] CPU[0] Job alloc
slurmd: JobNode[0] CPU[1] Job alloc
slurmd: JobNode[0] CPU[2] Job alloc
slurmd: JobNode[0] CPU[3] Job alloc
slurmd: JobNode[0] CPU[4] Job alloc
slurmd: JobNode[0] CPU[5] Job alloc
slurmd: JobNode[0] CPU[6] Job alloc
slurmd: JobNode[0] CPU[7] Job alloc
slurmd: JobNode[0] CPU[8] Job alloc
slurmd: JobNode[0] CPU[9] Job alloc
slurmd: JobNode[0] CPU[10] Job alloc
slurmd: JobNode[0] CPU[11] Job alloc
slurmd: JobNode[0] CPU[12] Job alloc
slurmd: JobNode[0] CPU[13] Job alloc
slurmd: JobNode[0] CPU[14] Job alloc
slurmd: JobNode[0] CPU[15] Job alloc
slurmd: JobNode[0] CPU[16] Job alloc
slurmd: JobNode[0] CPU[17] Job alloc
slurmd: JobNode[0] CPU[18] Job alloc
slurmd: JobNode[0] CPU[19] Job alloc
slurmd: JobNode[0] CPU[20] Job alloc
slurmd: JobNode[0] CPU[21] Job alloc
slurmd: JobNode[0] CPU[22] Job alloc
slurmd: JobNode[0] CPU[23] Job alloc
slurmd: JobNode[0] CPU[24] Job alloc
slurmd: JobNode[0] CPU[25] Job alloc
slurmd: JobNode[0] CPU[26] Job alloc
slurmd: JobNode[0] CPU[27] Job alloc
slurmd: JobNode[0] CPU[28] Job alloc
slurmd: JobNode[0] CPU[29] Job alloc
slurmd: JobNode[0] CPU[30] Job alloc
slurmd: JobNode[0] CPU[31] Job alloc
slurmd: JobNode[0] CPU[32] Job alloc
slurmd: JobNode[0] CPU[33] Job alloc
slurmd: JobNode[0] CPU[34] Job alloc
slurmd: JobNode[0] CPU[35] Job alloc
slurmd: JobNode[0] CPU[36] Job alloc
slurmd: JobNode[0] CPU[37] Job alloc
slurmd: JobNode[0] CPU[38] Job alloc
slurmd: JobNode[0] CPU[39] Job alloc
slurmd: JobNode[0] CPU[40] Job alloc
slurmd: JobNode[0] CPU[41] Job alloc
slurmd: JobNode[0] CPU[42] Job alloc
slurmd: JobNode[0] CPU[43] Job alloc
slurmd: JobNode[0] CPU[44] Job alloc
slurmd: JobNode[0] CPU[45] Job alloc
slurmd: JobNode[0] CPU[46] Job alloc
slurmd: JobNode[0] CPU[47] Job alloc
slurmd: JobNode[0] CPU[48] Job alloc
slurmd: JobNode[0] CPU[49] Job alloc
slurmd: JobNode[0] CPU[50] Job alloc
slurmd: JobNode[0] CPU[51] Job alloc
slurmd: JobNode[0] CPU[52] Job alloc
slurmd: JobNode[0] CPU[53] Job alloc
slurmd: JobNode[0] CPU[54] Job alloc
slurmd: JobNode[0] CPU[55] Job alloc
slurmd: JobNode[0] CPU[56] Job alloc
slurmd: JobNode[0] CPU[57] Job alloc
slurmd: JobNode[0] CPU[58] Job alloc
slurmd: JobNode[0] CPU[59] Job alloc
slurmd: JobNode[0] CPU[60] Job alloc
slurmd: JobNode[0] CPU[61] Job alloc
slurmd: JobNode[0] CPU[62] Job alloc
slurmd: JobNode[0] CPU[63] Job alloc
slurmd: JobNode[0] CPU[64] Job alloc
slurmd: JobNode[0] CPU[65] Job alloc
slurmd: JobNode[0] CPU[66] Job alloc
slurmd: JobNode[0] CPU[67] Job alloc
slurmd: ====================
slurmd: Launching batch job 38 for UID 1001
slurmd: launch task 38.0 request from 1001.1001@127.0.0.1 (port 4249)
slurmd: ====================
slurmd: step_id:38.0 job_mem:100MB_per_CPU step_mem:100MB_per_CPU
slurmd: JobNode[0] CPU[0] Step alloc
slurmd: JobNode[0] CPU[1] Step alloc
slurmd: JobNode[0] CPU[2] Step alloc
slurmd: JobNode[0] CPU[3] Step alloc
slurmd: JobNode[0] CPU[4] Step alloc
slurmd: JobNode[0] CPU[5] Step alloc
slurmd: JobNode[0] CPU[6] Step alloc
slurmd: JobNode[0] CPU[7] Step alloc
slurmd: JobNode[0] CPU[8] Step alloc
slurmd: JobNode[0] CPU[9] Step alloc
slurmd: JobNode[0] CPU[10] Step alloc
slurmd: JobNode[0] CPU[11] Step alloc
slurmd: JobNode[0] CPU[12] Step alloc
slurmd: JobNode[0] CPU[13] Step alloc
slurmd: JobNode[0] CPU[14] Step alloc
slurmd: JobNode[0] CPU[15] Step alloc
slurmd: JobNode[0] CPU[16] Step alloc
slurmd: JobNode[0] CPU[17] Step alloc
slurmd: JobNode[0] CPU[18] Step alloc
slurmd: JobNode[0] CPU[19] Step alloc
slurmd: JobNode[0] CPU[20] Step alloc
slurmd: JobNode[0] CPU[21] Step alloc
slurmd: JobNode[0] CPU[22] Step alloc
slurmd: JobNode[0] CPU[23] Step alloc
slurmd: JobNode[0] CPU[24] Step alloc
slurmd: JobNode[0] CPU[25] Step alloc
slurmd: JobNode[0] CPU[26] Step alloc
slurmd: JobNode[0] CPU[27] Step alloc
slurmd: JobNode[0] CPU[28] Step alloc
slurmd: JobNode[0] CPU[29] Step alloc
slurmd: JobNode[0] CPU[30] Step alloc
slurmd: JobNode[0] CPU[31] Step alloc
slurmd: JobNode[0] CPU[32] Step alloc
slurmd: JobNode[0] CPU[33] Step alloc
slurmd: JobNode[0] CPU[34] Step alloc
slurmd: JobNode[0] CPU[35] Step alloc
slurmd: JobNode[0] CPU[36] Step alloc
slurmd: JobNode[0] CPU[37] Step alloc
slurmd: JobNode[0] CPU[38] Step alloc
slurmd: JobNode[0] CPU[39] Step alloc
slurmd: JobNode[0] CPU[40] Step alloc
slurmd: JobNode[0] CPU[41] Step alloc
slurmd: JobNode[0] CPU[42] Step alloc
slurmd: JobNode[0] CPU[43] Step alloc
slurmd: JobNode[0] CPU[44] Step alloc
slurmd: JobNode[0] CPU[45] Step alloc
slurmd: JobNode[0] CPU[46] Step alloc
slurmd: JobNode[0] CPU[47] Step alloc
slurmd: JobNode[0] CPU[48] Step alloc
slurmd: JobNode[0] CPU[49] Step alloc
slurmd: JobNode[0] CPU[50] Step alloc
slurmd: JobNode[0] CPU[51] Step alloc
slurmd: JobNode[0] CPU[52] Step alloc
slurmd: JobNode[0] CPU[53] Step alloc
slurmd: JobNode[0] CPU[54] Step alloc
slurmd: JobNode[0] CPU[55] Step alloc
slurmd: JobNode[0] CPU[56] Step alloc
slurmd: JobNode[0] CPU[57] Step alloc
slurmd: JobNode[0] CPU[58] Step alloc
slurmd: JobNode[0] CPU[59] Step alloc
slurmd: JobNode[0] CPU[60] Step alloc
slurmd: JobNode[0] CPU[61] Step alloc
slurmd: JobNode[0] CPU[62] Step alloc
slurmd: JobNode[0] CPU[63] Step alloc
slurmd: JobNode[0] CPU[64] Step alloc
slurmd: JobNode[0] CPU[65] Step alloc
slurmd: JobNode[0] CPU[66] Step alloc
slurmd: JobNode[0] CPU[67] Step alloc
slurmd: ====================
slurmd: Scaling CPU count by factor of 4 (272/(68-0))
slurmd: lllp_distribution jobid [38] auto binding off: verbose,mask_cpu
^Cslurmd: got shutdown request
slurmd: all threads complete
slurmd: Consumable Resources (CR) Node Selection plugin shutting down ...
slurmd: Munge cryptographic signature plugin unloaded
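As a quick sanity check on the corrected 16.05.5 mask above (a sketch using plain shell arithmetic, nothing Slurm-specific): each hex digit of the affinity mask covers four logical CPUs, so an all-"f" mask of 68 digits corresponds to 4 x 68 = 272 CPUs, consistent with "#CPUs=272" and "Scaling CPU count by factor of 4 (272/(68-0))" in the logs.

```shell
# Build the 68-digit all-'f' mask from the 16.05.5 log line and count the
# logical CPUs it covers (4 bits, i.e. 4 CPUs, per hex digit).
mask=""
for i in $(seq 68); do mask="${mask}f"; done
echo "mask digits: ${#mask}"
echo "CPUs covered: $(( ${#mask} * 4 ))"
```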
Remaining problem moved to new bug 3168.