Ticket 12190

Summary: OpenAPI REST Python bindings and type issues
Product: Slurm
Reporter: Simon Feltman <simon.feltman>
Component: slurmrestd
Assignee: Nate Rini <nate>
Status: RESOLVED FIXED
QA Contact: Ben Roberts <ben>
Severity: 4 - Minor Issue
Priority: ---
CC: albert.gil, ben, cinek
Version: 21.08.x   
Hardware: Linux   
OS: Linux   
See Also: https://bugs.schedmd.com/show_bug.cgi?id=9131
https://bugs.schedmd.com/show_bug.cgi?id=11963
Site: InterDigital
Version Fixed: 21.08.2
Attachments: Python test

Description Simon Feltman 2021-08-02 15:08:24 MDT
Created attachment 20641
Python test

$ slurmrestd -V
slurm 20.11.8

Right off the bat, you cannot generate Python bindings from the 20.11.7 or 20.11.8 versions (I haven't tried earlier versions):

  docker run --rm \
    --volume ${PWD}:/local \
    --workdir=/local \
    openapitools/openapi-generator-cli:v5.2.0 \
    generate \
    --input-spec /local/slurm-api.json \
    --output /local/py_api_client \
    --generator-name python \
    --package-name slurm_urllib3

   ...

  Exception: Unexpected End Of Expression. Didn't find closing ']'
'_num>[@<si'
      ^

The problem is the "pattern" of the signal property, which trips up the generator when it builds the Python API:

            "signal": {
             "pattern": "[B:]<sig_num>[@<sig_time>]",
              "type": "string",
              "description": "When a job is within sig_time seconds of its end time, send it the signal sig_num."
            },
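
A minimal sketch of scripting the spec edit, assuming the only change wanted is to drop "pattern" constraints whose value is exactly the string shown above. The function names and the output file name are hypothetical; only slurm-api.json comes from the generator command in this report.

```python
import json

# The exact pattern string quoted in this ticket.
BAD_PATTERN = "[B:]<sig_num>[@<sig_time>]"


def strip_pattern(node, bad_pattern=BAD_PATTERN):
    """Recursively delete "pattern" keys equal to bad_pattern.

    Modifies the parsed spec in place and returns how many were removed.
    """
    removed = 0
    if isinstance(node, dict):
        if node.get("pattern") == bad_pattern:
            del node["pattern"]
            removed += 1
        for value in node.values():
            removed += strip_pattern(value, bad_pattern)
    elif isinstance(node, list):
        for item in node:
            removed += strip_pattern(item, bad_pattern)
    return removed


def strip_pattern_in_file(src, dst):
    """Read the spec from src, strip the pattern, write the result to dst."""
    with open(src) as fh:
        spec = json.load(fh)
    count = strip_pattern(spec)
    with open(dst, "w") as fh:
        json.dump(spec, fh, indent=1)
    return count
```

For example, strip_pattern_in_file("slurm-api.json", "slurm-api.nopattern.json") would write a copy that can be passed to --input-spec. Other "pattern" values are left untouched.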

If I remove both instances where this pattern is used, I can generate Python bindings but they do not function correctly. A simple test of listing jobs returns type errors (see attached test.py):

    slurm_urllib3.exceptions.ApiTypeError: Invalid type for variable 'accrue_time'. Required value type is str and passed type was int at ['received_data']['jobs'][0]['accrue_time']

As I fix these manually in the .json and regenerate bindings (changing accrue_time of v0.0.36_job_response_properties from string to integer), more errors continue to come up:

    slurm_urllib3.exceptions.ApiTypeError: Invalid type for variable 'array_job_id'. Required value type is str and passed type was int at ['received_data']['jobs'][0]['array_job_id']
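
Rather than finding these one regeneration at a time, the declared types could be compared against one returned job record in a single pass. The following is only a sketch: find_type_mismatches is a hypothetical helper, it checks top-level properties only (no nested objects or array items), and the schema fragment in the usage note is made up to mirror the two errors above.

```python
# Map OpenAPI primitive type names to the Python types json.load() produces.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def find_type_mismatches(properties, record):
    """Return (name, declared_type, actual_type) for each mismatched field.

    properties: the "properties" mapping of one schema in the spec.
    record: one decoded object from the response, e.g. a single job.
    """
    mismatches = []
    for name, value in record.items():
        declared = properties.get(name, {}).get("type")
        expected = JSON_TYPES.get(declared)
        if expected is None or value is None:
            continue  # undeclared field, unknown type name, or null value
        if isinstance(value, bool) and declared != "boolean":
            # bool is a subclass of int in Python, so reject it explicitly
            matches = False
        else:
            matches = isinstance(value, expected)
        if not matches:
            mismatches.append((name, declared, type(value).__name__))
    return mismatches
```

With properties = {"accrue_time": {"type": "string"}, "array_job_id": {"type": "string"}} and record = {"accrue_time": 0, "array_job_id": 0}, both fields are reported as declared "string" but received as "int".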

I ran into a lot more of these issues until I finally hit a larger type problem with a list/object mismatch in the returned data vs. the OpenAPI json.
Comment 1 Nate Rini 2021-08-02 15:45:22 MDT
Which OpenAPI generator is in use and what version?
Comment 2 Simon Feltman 2021-08-02 15:49:17 MDT
I had attempted a few different versions, but all seem to have at least the first issue. The last version I tried was v5.2.0 (docker container openapitools/openapi-generator-cli:v5.2.0) as shown in my instructions.
Comment 3 Nate Rini 2021-08-02 15:58:53 MDT
OpenAPI generator v5 compatibility is being worked on in Bug#11962. Please try a v4.x version.
Comment 4 Simon Feltman 2021-08-02 16:18:24 MDT
I am not able to access the referenced bug (Bug#11962).

I was able to re-generate the bindings using v4.3.1 (docker tag openapitools/openapi-generator-cli:v4.3.1) and it created the bindings without the first error I reported (Unexpected End Of Expression .. '_num>[@<si' ..).

The bindings generated with this version now always give me an error "HTTP response body: Authentication failure" when attempting to use the attached script.

I've confirmed the REST API works with curl and my env vars/token, outside of the generated bindings:

   curl --silent --show-error \
       --header "X-SLURM-USER-NAME: ${USER}" \
       --header "X-SLURM-USER-TOKEN: ${SLURM_JWT}" \
       http://${SLURM_CONTROLLER}:9999/slurm/v0.0.36/jobs
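
The same check can be done from Python without the generated bindings, which helps separate binding problems from server or token problems. This is a sketch using only the standard library; build_jobs_request and list_jobs are hypothetical names, and the port, API version, and header names are copied from the curl command above.

```python
import json
import urllib.request


def build_jobs_request(controller, user, token, port=9999, version="v0.0.36"):
    """Build (without sending) the GET request that the curl command issues."""
    url = f"http://{controller}:{port}/slurm/{version}/jobs"
    # urllib stores header names capitalized (e.g. "X-slurm-user-name");
    # HTTP header names are case-insensitive, so the server sees the same thing.
    return urllib.request.Request(url, headers={
        "X-SLURM-USER-NAME": user,
        "X-SLURM-USER-TOKEN": token,
    })


def list_jobs(controller, user, token):
    """Send the request and return the decoded JSON response body."""
    request = build_jobs_request(controller, user, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Called as list_jobs(os.environ["SLURM_CONTROLLER"], os.environ["USER"], os.environ["SLURM_JWT"]) after importing os, it uses the same environment variables as the curl command.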
Comment 5 Nate Rini 2021-08-02 16:35:20 MDT
(In reply to Simon Feltman from comment #4)
> I am not able to access the referenced bug (Bug#11962).
I'll have to look into making that ticket public but those changes are all targeted for v0.0.37 (slurm-21.08) of the plugin due to our strict backward compatibility rules.

> The bindings generated with this version now always give me an error "HTTP
> response body: Authentication failure" when attempting to use the attached
> script.

How is the authentication being filled out in python?

We provide an example in our Docker training cluster:
> https://gitlab.com/SchedMD/training/docker-scale-out/-/blob/master/scaleout/lab_scripts/job.py#L9-15
Comment 6 Simon Feltman 2021-08-02 16:42:19 MDT
My test script was developed from the README generated with v5.2.0, which seems incompatible (setting the user and api after the configuration object is created). I updated it to pass these as you pointed out, and it worked.

Thanks for the help
Comment 7 Nate Rini 2021-08-02 16:45:31 MDT
Simon,

I would also like to note that we have pretty strict guidance for SEV (severity) levels in our support system as outlined here:
> https://www.schedmd.com/support.php
specifically:
> Severity 1 — Major Impact
> 
> A Severity 1 issue occurs when there is a continued system outage that affects a large set of end users. The system is down and non-functional due to Slurm problem(s) and no procedural workaround exists.
> Severity 2 — High Impact
> 
> A Severity 2 issue is a high-impact problem that is causing sporadic outages or is consistently encountered by end users with adverse impact to end user interaction with the system.
> Severity 3 — Medium Impact
> 
> A Severity 3 issue is a medium-to-low impact problem that includes partial non-critical loss of system access or which impairs some operations on the system but allows the end user to continue to function on the system with workarounds.
> Severity 4 — Minor Issues
> 
> A Severity 4 issue is a minor issue with limited or no loss in functionality within the customer environment. Severity 4 issues may also be used for recommendations for future product enhancements or modifications.

Based on the description so far, I'm reducing this ticket to SEV4, as this is an issue with developing a new tool or (Slurm) cluster installation. We generally reserve SEV1 for the worst issues where no jobs are running (or starting) and the site needs immediate help. In those cases, generally, any member of support may respond to help get the site back online ASAP.

Thanks,
--Nate
Comment 8 Nate Rini 2021-08-02 16:50:17 MDT
(In reply to Simon Feltman from comment #6)
> My test script was developed from the README generated with v5.2.0 which
> seems incompatible (setting the user and api after the configuration object
> is created). I updated it to pass these as you pointed out, and it worked.

Has your site had training yet? This should be covered in it.
 
> Thanks for the help

There are breaking compatibility issues with OpenAPI generator-generated code between v4 and v5. It is unclear at this time if they are intentional. I suggest opening a ticket with OpenAPI Generator developers directly:
> https://github.com/OpenAPITools/openapi-generator/issues

The OpenAPI standard itself, which Slurm attempts to follow strictly, is pretty stable at this time:
> https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.1.0.md

Are there any more questions?
Comment 9 Simon Feltman 2021-08-02 17:21:17 MDT
Nate,

> I would also like to note that we have pretty strict guidance for SEV

Thanks for the note. I'll make sure to reference this in the future.

> Has your site had training yet? This should be covered in it

Yes, although the issues for me were related to the tooling version. Unless I missed it, it might be worth a mention in this document: https://slurm.schedmd.com/rest.html
Comment 20 Nate Rini 2021-09-17 14:15:02 MDT
(In reply to Simon Feltman from comment #9)
> Yes, although the issues for me were related to the tooling version. Unless
> I missed it, it might be worth a mention in this document:
> https://slurm.schedmd.com/rest.html

A good number of updates to the above doc have been merged upstream and will be available with Slurm 21.08.2.

I'm going to close this ticket now. Please reply if you have any further questions.