Ticket 3525 - Added SyscfgTimeout parameter to knl_generic-plugin
Summary: Added SyscfgTimeout parameter to knl_generic-plugin
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Contributions
Version: 17.11.x
Hardware: Linux Linux
Severity: 5 - Enhancement
Assignee: Moe Jette
QA Contact:
URL:
Depends on:
Blocks:
 
Reported: 2017-03-02 10:27 MST by Felip Moll
Modified: 2017-03-03 09:45 MST

See Also:
Site: -Other-
Slinky Site: ---
Alineos Sites: ---
Atos/Eviden Sites: ---
Confidential Site: ---
Coreweave sites: ---
Cray Sites: ---
DS9 clusters: ---
Google sites: ---
HPCnow Sites: ---
HPE Sites: ---
IBM Sites: ---
NOAA SIte: ---
NoveTech Sites: ---
Nvidia HWinf-CS Sites: ---
OCF Sites: ---
Recursion Pharma Sites: ---
SFW Sites: ---
SNIC sites: ---
Tzag Elita Sites: ---
Linux Distro: ---
Machine Name:
CLE Version:
Version Fixed: 17.02.1
Target Release: 17.11
DevPrio: ---
Emory-Cloud Sites: ---


Attachments
Patch to add the syscfgtimeout parameter to knl_generic (3.23 KB, patch)
2017-03-02 10:27 MST, Felip Moll

Description Felip Moll 2017-03-02 10:27:52 MST
Created attachment 4139
Patch to add the syscfgtimeout parameter to knl_generic

Dear SLURM developers,

At Barcelona Supercomputing Center we are experiencing timeouts with the syscfg tool on our Knights Landing infrastructure.

In some cases the tool takes too long to respond, and the hardcoded timeout is not long enough.

I added a new parameter, modeled on the one in the knl_cray plugin, that lets the user specify SyscfgTimeout in knl_generic.conf.

The patch is attached; I have tested it and it seems to work.

I set a minimum of 1000 ms and a default of 5000 ms.
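The proposed semantics (a default of 5000 ms when SyscfgTimeout is absent, and a floor of 1000 ms) can be sketched in C as below. This is an illustration, not the code from attachment 4139: the helper name `syscfg_timeout_ms` is hypothetical, and the sketch assumes that a value below the minimum is raised to the minimum rather than rejected, which the patch may handle differently.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Values proposed in the original patch, in milliseconds. */
#define SYSCFG_TIMEOUT_DEFAULT_MS 5000
#define SYSCFG_TIMEOUT_MIN_MS     1000

/*
 * Hypothetical helper: convert the SyscfgTimeout string read from
 * knl_generic.conf into an effective timeout in milliseconds.
 *  - NULL (parameter not set)        -> the default
 *  - non-numeric or negative values  -> -1 (caller reports an error)
 *  - values below the minimum        -> raised to the minimum (assumption)
 */
static int syscfg_timeout_ms(const char *value)
{
	char *end;
	long ms;

	if (value == NULL)
		return SYSCFG_TIMEOUT_DEFAULT_MS;

	errno = 0;
	ms = strtol(value, &end, 10);
	if (errno != 0 || end == value || *end != '\0' ||
	    ms < 0 || ms > INT_MAX)
		return -1;

	if (ms < SYSCFG_TIMEOUT_MIN_MS)
		ms = SYSCFG_TIMEOUT_MIN_MS;
	return (int) ms;
}
```

For example, an unset parameter yields 5000, `"500"` yields 1000, and `"20000"` is kept as is. According to the maintainer's reply, the committed version drops the minimum and keeps the old default of 1 second; in this sketch that would mean setting the default to 1000 and removing the clamp.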

If the parameter is accepted, the documentation should be updated accordingly.

Cheers,
Felip M
Comment 2 Moe Jette 2017-03-02 13:04:25 MST
Thank you for your contribution. I did remove the minimum value and kept the old default timeout of 1 second. I also added documentation. The commit is here:

https://github.com/SchedMD/slurm/commit/32ded0c3df76e04ae8eca5d9d14e7d7354c78257
Comment 3 Felip Moll 2017-03-03 00:40:43 MST
Thank you very much, Moe.

It's fine for me.

Just a comment: I kept the minimum value only because the knl_cray plugin has one, and I wanted to stay consistent with it.

I agree it is better to remove the minimum, as you did, so it might be worth changing knl_cray in the same way.
Comment 4 Moe Jette 2017-03-03 09:45:29 MST
(In reply to Felip Moll from comment #3)

The Cray command that performs this function will almost certainly fail if it is not given more than a second to run, so that check exists to prevent a configuration that is almost certain to fail. I hope the Intel syscfg command is faster.
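The reasoning above is that a timeout shorter than the tool's normal run time turns every invocation into a failure. The minimal POSIX sketch below shows one way a caller could enforce a millisecond timeout on an external command such as syscfg. It is not how the knl_generic or knl_cray plugins actually run the command; `run_with_timeout` and its return codes are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds elapsed since *start on the monotonic clock. */
static long elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start->tv_sec) * 1000L +
	       (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Hypothetical helper: run argv[0] (searched in PATH) and wait at most
 * timeout_ms milliseconds, polling every 10 ms. Returns the command's
 * exit status (127 if exec failed), -1 on fork/wait errors or abnormal
 * termination, or -2 if the command was killed because it ran too long.
 */
static int run_with_timeout(char *const argv[], long timeout_ms)
{
	const struct timespec tick = { 0, 10L * 1000 * 1000 }; /* 10 ms */
	struct timespec start;
	int status;
	pid_t pid;

	clock_gettime(CLOCK_MONOTONIC, &start);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127); /* only reached if exec failed */
	}

	for (;;) {
		pid_t rc = waitpid(pid, &status, WNOHANG);

		if (rc == pid)
			return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
		if (rc < 0)
			return -1;
		if (elapsed_ms(&start) >= timeout_ms) {
			kill(pid, SIGKILL);
			waitpid(pid, &status, 0); /* reap the killed child */
			return -2;
		}
		nanosleep(&tick, NULL);
	}
}
```

With a 100 ms timeout, a command that needs two seconds is always killed and reported as a failure. This is the kind of misconfiguration a minimum value guards against.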