| Summary: | Added SyscfgTimeout parameter to knl_generic-plugin | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Felip Moll <lipixx> |
| Component: | Contributions | Assignee: | Moe Jette <jette> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 5 - Enhancement | ||
| Priority: | --- | ||
| Version: | 17.11.x | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | -Other- | Alineos Sites: | --- |
| Version Fixed: | 17.02.1 | Target Release: | 17.11 |
| Attachments: | Patch to add the syscfgtimeout parameter to knl_generic | ||

Thank you for your contribution. I did remove the minimum value and kept the old default timeout of 1 second. I also added documentation. The commit is here:
https://github.com/SchedMD/slurm/commit/32ded0c3df76e04ae8eca5d9d14e7d7354c78257

Thank you very much Moe.

It's fine for me.

Just a comment, I kept the minimum value just because in the knl_cray plugin there's a minimum value, and for consistency I left it this way.

For me it is better removing the minimum, as you did, so maybe it would be good to modify knl_cray then.

(In reply to Felip Moll from comment #3)
> Thank you very much Moe.
>
> It's fine for me.
>
> Just a comment, I kept the minimum value just because in the knl_cray plugin
> there's a minimum value, and for consistency I left it this way.
>
> For me it is better removing the minimum, as you did, so maybe it would be
> good to modify knl_cray then.

The Cray command to perform this function will almost certainly fail if not given more than a second to run, so that check is to prevent a configuration that would almost certainly fail. I hope the Intel syscfg command is faster.
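
For readers unfamiliar with what such a timeout bounds, here is a minimal sketch of running an external command with a millisecond timeout. It is written in Python for brevity and is not the plugin's C implementation. The function name `run_with_timeout` and the constant name are hypothetical; the 1000 ms default mirrors the "old default timeout of 1 second" mentioned above.

```python
import subprocess
import sys

# Hypothetical constant: the 1 second default mentioned in the thread, in ms.
DEFAULT_SYSCFG_TIMEOUT_MS = 1000


def run_with_timeout(argv, timeout_ms=DEFAULT_SYSCFG_TIMEOUT_MS):
    """Run argv and return (exit_code, stdout).

    Returns (None, "") if the command does not finish within timeout_ms.
    subprocess.run kills the child process itself when the timeout expires.
    """
    try:
        done = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_ms / 1000.0,  # subprocess expects seconds
        )
    except subprocess.TimeoutExpired:
        return None, ""
    return done.returncode, done.stdout


if __name__ == "__main__":
    # Stand-in command; a real deployment would invoke the syscfg tool instead.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_ms=10000))
```

A command that routinely needs more than the configured time will always hit the timeout branch, which is the failure mode the knl_cray minimum is meant to rule out.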
Created attachment 4139: Patch to add the syscfgtimeout parameter to knl_generic

Dear SLURM developers,

At the Barcelona Supercomputing Center we are experiencing timeouts with the syscfg tool on a Knights Landing infrastructure. In some cases the tool takes too long and the hardcoded timeout is not enough. I added a new parameter, like the one in the knl_cray plugin, to let the user specify SyscfgTimeout in knl_generic.conf.

Attached you will find the patch; it is tested and seems to work. I set a minimum time of 1000 ms and a default of 5000 ms.

Documentation should be updated accordingly if the parameter is accepted.

Cheers,
Felip M
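
The behaviour described in this submission (a configurable value with a 5000 ms default and a 1000 ms minimum) can be sketched as follows. This is an illustration in Python, not the attached C patch. The function name `parse_syscfg_timeout` is hypothetical, and three details are assumptions: that the file uses simple `Key=Value` lines with `#` comments, that the key is matched case-insensitively, and that a value below the minimum is rejected with an error rather than clamped.

```python
# Values proposed in the submitted patch, in milliseconds.
PATCH_DEFAULT_MS = 5000
PATCH_MINIMUM_MS = 1000


def parse_syscfg_timeout(conf_text,
                         default_ms=PATCH_DEFAULT_MS,
                         minimum_ms=PATCH_MINIMUM_MS):
    """Return the SyscfgTimeout value (ms) found in conf_text.

    Falls back to default_ms when the key is absent. Raises ValueError when
    the value is not an integer or is below minimum_ms (assumed behaviour).
    """
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip().lower() == "syscfgtimeout":
            timeout_ms = int(value.strip())
            if timeout_ms < minimum_ms:
                raise ValueError(
                    f"SyscfgTimeout={timeout_ms} is below the minimum {minimum_ms}")
            return timeout_ms
    return default_ms


if __name__ == "__main__":
    print(parse_syscfg_timeout("SyscfgTimeout=8000  # milliseconds"))
```

Passing `default_ms=1000, minimum_ms=0` reproduces the variant without a minimum and with the 1 second default.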