When I run a job in non-exclusive mode, everything is fine:

to86cola@hla0002:~$ /opt/slurm/current/bin/srun -n 1 --mem-per-cpu=25000 -t 30 -C mpi --pty bash
srun: job 3439844 queued and waiting for resources
srun: job 3439844 has been allocated resources
to86cola@hpa0001:~$ scontrol show node hpa0001|grep TRES
   CfgTRES=cpu=16,mem=28000M
   AllocTRES=cpu=1,mem=25000M
to86cola@hpa0001:~$

When I instead run the same job in exclusive mode, the memory gets overallocated:

to86cola@hla0002:~$ /opt/slurm/current/bin/srun -n 1 --exclusive --mem-per-cpu=25000 -t 30 -C mpi --pty bash
srun: job 3439868 queued and waiting for resources
srun: job 3439868 has been allocated resources
to86cola@hpa0001:~$ scontrol show node hpa0001|grep TRES
   CfgTRES=cpu=16,mem=28000M
   AllocTRES=cpu=16,mem=400000M
to86cola@hpa0001:~$

In my opinion, the memory to allocate per CPU should be calculated by something like

   (requested_mem_per_cpu_on_this_node * requested_cpus_on_this_node) / allocated_cpus_on_this_node

Please fix this, as it causes lots of messages in slurmctld.log, e.g.:

[2017-05-30T20:38:21.839] error: cons_res: node hpa0196 memory is overallocated (32000) for job 3439883
[2017-05-30T20:38:21.841] error: cons_res: node hpa0197 memory is overallocated (32000) for job 3439884
[2017-05-30T20:38:52.642] error: cons_res: node hpa0201 memory is overallocated (32000) for job 3439885
[2017-05-30T20:38:52.643] error: cons_res: node hpa0205 memory is overallocated (32000) for job 3439886
[2017-05-30T20:38:52.645] error: cons_res: node hpa0306 memory is overallocated (32000) for job 3439887
[2017-05-30T20:38:52.646] error: cons_res: node hpa0312 memory is overallocated (32000) for job 3439888

And also in slurmdbd.log, e.g.:

[2017-05-30T20:24:21.086] error: We have more allocated time than is possible (272381215680 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T06:00:00 - 2017-05-21T07:00:00 tres 2
[2017-05-30T20:24:21.086] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T06:00:00 - 2017-05-21T07:00:00 tres 2
[2017-05-30T20:24:31.373] error: We have more allocated time than is possible (270273934400 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T07:00:00 - 2017-05-21T08:00:00 tres 2
[2017-05-30T20:24:31.373] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T07:00:00 - 2017-05-21T08:00:00 tres 2
[2017-05-30T20:24:41.025] error: We have more allocated time than is possible (270435154400 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T08:00:00 - 2017-05-21T09:00:00 tres 2
[2017-05-30T20:24:41.025] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T08:00:00 - 2017-05-21T09:00:00 tres 2
[2017-05-30T20:24:50.262] error: We have more allocated time than is possible (269253026080 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T09:00:00 - 2017-05-21T10:00:00 tres 2
[2017-05-30T20:24:50.262] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T09:00:00 - 2017-05-21T10:00:00 tres 2
[2017-05-30T20:24:59.087] error: We have more allocated time than is possible (260691142920 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T10:00:00 - 2017-05-21T11:00:00 tres 2
[2017-05-30T20:24:59.087] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T10:00:00 - 2017-05-21T11:00:00 tres 2
[2017-05-30T20:25:09.605] error: We have more allocated time than is possible (255667014200 > 250059600000) for cluster lcluster(69461000) from 2017-05-21T11:00:00 - 2017-05-21T12:00:00 tres 2
[2017-05-30T20:25:09.605] error: We have more time than is possible (250059600000+1267200000+0)(251326800000) > 250059600000 for cluster lcluster(69461000) from 2017-05-21T11:00:00 - 2017-05-21T12:00:00 tres 2
This will be resolved when bug 3879 is resolved.

*** This ticket has been marked as a duplicate of ticket 3879 ***