Ticket 3525

Summary: Added SyscfgTimeout parameter to knl_generic-plugin
Product: Slurm
Reporter: Felip Moll <lipixx>
Component: Contributions
Assignee: Moe Jette <jette>
Status: RESOLVED FIXED
QA Contact:
Severity: 5 - Enhancement    
Priority: ---    
Version: 17.11.x   
Hardware: Linux   
OS: Linux   
Site: -Other-
Version Fixed: 17.02.1
Target Release: 17.11
Attachments: Patch to add the syscfgtimeout parameter to knl_generic

Description Felip Moll 2017-03-02 10:27:52 MST
Created attachment 4139
Patch to add the syscfgtimeout parameter to knl_generic

Dear SLURM developers,

At Barcelona Supercomputing Center we are experiencing timeouts with the syscfg tool on a Knights Landing infrastructure.

In some cases the tool takes too long, and the hard-coded timeout is not enough.

I added a new parameter, like the one in the knl_cray plugin, to let the user specify SyscfgTimeout in knl_generic.conf.

The patch is attached; it has been tested and seems to work.

I set up a minimum time of 1000ms and a default of 5000ms.

The documentation should be updated accordingly if the parameter is accepted.

Cheers,
Felip M
Comment 2 Moe Jette 2017-03-02 13:04:25 MST
Thank you for your contribution. I removed the minimum value and kept the old default timeout of 1 second. I also added documentation. The commit is here:

https://github.com/SchedMD/slurm/commit/32ded0c3df76e04ae8eca5d9d14e7d7354c78257
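
For readers without the commit at hand, here is a minimal sketch of the behaviour described in this comment: a configured SyscfgTimeout is used as given, with no enforced minimum, and the default stays at 1 second. This is not the code from the commit. The helper name resolve_syscfg_timeout_ms, the millisecond unit of the default macro, and the fallback to the default for invalid or non-positive values are assumptions made for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Assumed default: the "old default timeout of 1 second", in milliseconds. */
#define DEFAULT_SYSCFG_TIMEOUT_MS 1000

/*
 * Hypothetical helper (not taken from the Slurm source): convert the text
 * of a SyscfgTimeout setting into a timeout in milliseconds.
 * A missing, empty, non-numeric, or non-positive value falls back to the
 * default. No lower bound is applied to valid values.
 */
int resolve_syscfg_timeout_ms(const char *value)
{
    char *end = NULL;
    long ms;

    if (value == NULL || *value == '\0')
        return DEFAULT_SYSCFG_TIMEOUT_MS;

    errno = 0;
    ms = strtol(value, &end, 10);
    if (errno != 0 || end == value || *end != '\0' || ms <= 0 || ms > INT_MAX)
        return DEFAULT_SYSCFG_TIMEOUT_MS;

    return (int)ms;
}
```

Under these assumptions, an unset value resolves to 1000, "5000" resolves to 5000, and "250" is accepted as 250 because no minimum is enforced.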
Comment 3 Felip Moll 2017-03-03 00:40:43 MST
Thank you very much Moe.

It's fine for me.

Just one comment: I kept the minimum value only because the knl_cray plugin has one, and I left it that way for consistency.

I think removing the minimum, as you did, is better, so it might be good to modify knl_cray as well.
Comment 4 Moe Jette 2017-03-03 09:45:29 MST
(In reply to Felip Moll from comment #3)

The Cray command that performs this function will almost certainly fail if it is not given more than a second to run, so that check is there to prevent a configuration that would almost certainly fail. I hope the Intel syscfg command is faster.