Created attachment 41270
Fix Slingshot plugin 401 error handling

A patch is attached to fix the HPE Slingshot plugin's handling of Fabric Manager token expiration. The patch's comment is below:

<start>
At init time, the Slurm Slingshot plugin uses a Fabric Manager (FM) login endpoint to get a token used for authn/authz for subsequent REST calls to the FM. When that token expires, the plugin is supposed to log in again and get a new token. However, when the token expired, the FM was returning an HTML error string (not JSON as expected), so the response-to-JSON routine failed without trying to re-acquire the token.

To fix, don't error out when we can't convert the response to JSON (or if we don't get any response to the REST call); just keep going and re-acquire the token on HTTP 401/403.

Also added a way to test token expiration without relying on the FM.

Tested on sawmill with MAX_CACHE_USED=3 (which corrupted the authorization header after every 3 calls) to simulate token expiration.
<end>

The patch has a hack that can be used to test the change without actually expiring tokens.
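For readers who want the control flow spelled out, here is a minimal Python sketch of the behavior the patch comment describes. It is not the plugin's code (the plugin is written in C), and every name in it (FakeFabricManager, Client, parse_json_or_none, max_uses) is hypothetical. The fake FM simply rejects a token after a fixed number of uses and answers with an HTML body, which is only an approximation of the MAX_CACHE_USED test hook mentioned above.

```python
import json


def parse_json_or_none(body):
    """Return the decoded JSON body, or None if it is missing or not JSON."""
    if not body:
        return None
    try:
        return json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


class FakeFabricManager:
    """Hypothetical stand-in for the FM: each token works for max_uses calls."""

    def __init__(self, max_uses=3):
        self.max_uses = max_uses
        self.issued = 0  # number of tokens handed out so far
        self.uses = {}   # token -> number of successful calls made with it

    def login(self):
        self.issued += 1
        token = f"token-{self.issued}"
        self.uses[token] = 0
        return token

    def get(self, path, token):
        if token not in self.uses or self.uses[token] >= self.max_uses:
            # Rejected token: an HTML error page rather than JSON.
            return 401, "<html><body>401 Authorization Required</body></html>"
        self.uses[token] += 1
        return 200, json.dumps({"path": path})


class Client:
    """Hypothetical REST client that re-acquires its token on HTTP 401/403."""

    def __init__(self, fm):
        self.fm = fm
        self.token = fm.login()

    def get(self, path):
        status, body = self.fm.get(path, self.token)
        # A non-JSON (or empty) body is not treated as fatal at this point.
        data = parse_json_or_none(body)
        if status in (401, 403):
            # Token rejected: log in again and retry the call once.
            self.token = self.fm.login()
            status, body = self.fm.get(path, self.token)
            data = parse_json_or_none(body)
        if status != 200 or data is None:
            raise RuntimeError(f"GET {path} failed with HTTP {status}")
        return data
```

The point of the sketch is the ordering: the HTTP status is checked even when the body cannot be parsed, so an HTML 401 page leads to a fresh login instead of an early failure.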
Can you comment on when/if this is getting fixed in the FM?
> Can you comment on when/if this is getting fixed in the FM?

I wouldn't plan on it getting fixed; I believe the HTML error is actually coming from an nginx proxy, and I haven't had good luck asking folks to change the look of the error.