Created attachment 4139: Patch to add the syscfgtimeout parameter to knl_generic. Dear SLURM developers, At the Barcelona Supercomputing Center we are experiencing timeouts with the syscfg tool on a Knights Landing infrastructure. In some cases the tool takes too long and the hardcoded timeout is not enough. I added a new parameter, like the one in the knl_cray plugin, to let the user specify SyscfgTimeout in knl_generic.conf. The patch is attached; it has been tested and seems to work. I set a minimum of 1000 ms and a default of 5000 ms. The documentation should be updated accordingly if the parameter is accepted. Cheers, Felip M
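For readers without the attachment, the behaviour described above (a default of 5000 ms and a minimum of 1000 ms) might be sketched roughly as follows. This is not the code from the patch: the helper name `resolve_syscfg_timeout_ms`, the two macros, and the choice to raise too-small values to the minimum rather than reject them are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Values taken from the description of the proposed patch (assumed names). */
#define DEFAULT_SYSCFG_TIMEOUT_MS 5000
#define MIN_SYSCFG_TIMEOUT_MS     1000

/*
 * Hypothetical helper: convert the raw SyscfgTimeout string read from
 * the configuration file into a timeout in milliseconds.
 * - NULL, empty, or non-numeric input falls back to the default.
 * - Values below the minimum are raised to the minimum (an assumption;
 *   the real patch may handle this differently).
 */
static long resolve_syscfg_timeout_ms(const char *raw)
{
    char *end = NULL;
    long value;

    if (raw == NULL || *raw == '\0')
        return DEFAULT_SYSCFG_TIMEOUT_MS;

    errno = 0;
    value = strtol(raw, &end, 10);
    if (errno != 0 || end == raw || *end != '\0')
        return DEFAULT_SYSCFG_TIMEOUT_MS;

    if (value < MIN_SYSCFG_TIMEOUT_MS)
        return MIN_SYSCFG_TIMEOUT_MS;

    return value;
}
```

Under these assumptions, an unset parameter yields 5000 ms, while a configured value such as 250 would be raised to 1000 ms.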
Thank you for your contribution. I removed the minimum value and kept the old default timeout of 1 second. I also added documentation. The commit is here: https://github.com/SchedMD/slurm/commit/32ded0c3df76e04ae8eca5d9d14e7d7354c78257
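With the parameter in place, a site that needs a longer limit could presumably set it in knl_generic.conf along the following lines. The value 5000 is only an example, and milliseconds are assumed as the unit, following the patch description in the first comment.

```
# knl_generic.conf (illustrative fragment, value is an example)
SyscfgTimeout=5000
```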
Thank you very much, Moe. That is fine with me. Just one comment: I kept the minimum value only because the knl_cray plugin has one, and I left it that way for consistency. I think removing the minimum, as you did, is better, so maybe it would be good to modify knl_cray as well.
(In reply to Felip Moll from comment #3) The Cray command that performs this function will almost certainly fail if it is not given more than a second to run, so that check is there to prevent a configuration that would almost certainly fail. I hope the Intel syscfg command is faster.