Comment # 12
on bug 459
from Yiannis Georgiou
Hello David,
sorry for the long delay on this one. Here are the results of the tests you
asked for.
The tests were run on 20 nodes with 240 CPUs in total.
Version                  Average MPI_Init (sec)
Intel srun (libpmi)      2.82
Intel mpirun             0.28
OpenMPI srun (libpmi)    2.72
OpenMPI mpirun           1.64
So to answer your question: indeed, the tests show that the degradation with
libpmi happens with both Intel MPI and OpenMPI. BullxMPI is compiled with pmi2
by default, so it should not be used in the comparison.
By the way, a colleague at BULL has run tests measuring the time of Intel
srun with libpmi on a larger cluster, using different values of the PMI_TIME
variable, and it seems that lowering this variable improves the time
significantly:
Nodes   Ntasks   PMI_TIME=500 (sec)   PMI_TIME=10 (sec)
  10        40         4.485               3.696477
  20        80         5.320               3.396676
 100       400        11.966               2.217788
 400      1600       111.757               9.523558
 900      3600       551.599              30.90825
We have started using this variable to work around the delays.
The default value of PMI_TIME in slurm/src/api/slurm_pmi.c is 500. Do you
think we should lower this default, or should we instead work on optimizing
the logic in the _delay_rpc() function, which makes use of PMI_TIME?
Thanks
Yiannis