| Summary: | slurmrestd association/qos grp/max | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Matt Ezell <ezellma> |
| Component: | slurmrestd | Assignee: | Nate Rini <nate> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | 3 - Medium Impact | ||
| Priority: | --- | CC: | barlowat |
| Version: | 20.11.8 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | ORNL-OLCF | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | 21.08.6,22.05pre1 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Ticket Depends on: | |||
| Ticket Blocks: | 12753 | ||
| Attachments: |
patch for 21.08
patch for 21.08 (v2) |
||
|
Description
Matt Ezell
2021-11-01 13:04:48 MDT
(In reply to Matt Ezell from comment #0) > ORNL is developing a tool to sync our account system with slurmdbd using > slurmrestd. There seems to be some inconsistency with "group" and "max" > limits on associations and qos. > > For associations, parse.c has: > _add_parse(UINT32, grp_jobs, "max/jobs/per/count") > _add_parse(UINT32, grp_jobs_accrue, "max/jobs/per/accruing") > _add_parse(UINT32, grp_submit_jobs, "max/jobs/per/submitted") > _add_parse(TRES_LIST, grp_tres, "max/tres/total") > _add_parse(UINT32, grp_wall, "max/per/account/wall_clock") > > This appears to set the "grp" limits into the "max" URL. I don't think the > actual max values are accessible. There is no grp url present, though I > believe there likely should be. > > For qos, parse.c has: > _add_parse(UINT32, grp_jobs_accrue, "limits/max/active_jobs/accruing") > _add_parse(UINT32, grp_jobs, "limits/max/active_jobs/count") > _add_parse(TRES_LIST, grp_tres, "limits/max/tres/total") > _add_parse(TRES_LIST, grp_tres_run_mins, "limits/max/tres/minutes/per/qos") > > Again, I would think these should be under "grp" to be consistent with > naming elsewhere. The hierarchy was set somewhat arbitrary based on what I thought logically break out the easiest for users. Can you give an example of your current output? Are you suggesting: > _add_parse(UINT32, grp_jobs, "max/jobs/per/count") to > > _add_parse(UINT32, grp_jobs, "limits/max/jobs/per/count") instead? > It appears that the association's endpoint docs in Slurm version 20.11.8 are > missing key pairs for “count”, “accruing”, and “submitted”, so if this is > the correct way to access these values I think a documentation update is > also in order. I suspect this is just a naming issue as slurmdb_assoc_usage_t is in the 'usage' branch of 'dbv0.0.38_association' object: > accruing -> accrue_job_count > submitted -> job_count > count -> active_jobs > Are you suggesting No, slurmdb_assoc_rec_t has the following fields: grp_jobs grp_jobs_accrue grp_submit_jobs grp_tres grp_wall max_jobs max_jobs_accrue max_submit_jobs max_wall_pj "grp" and "max" are different. I'm suggesting: > _add_parse(UINT32, grp_jobs, "max/jobs/per/count") to > _add_parse(UINT32, grp_jobs, "grp/jobs/per/count") > _add_parse(UINT32, max_jobs, "max/jobs/per/count") we need to be able to control both independently and correctly. (In reply to Matt Ezell from comment #2) > we need to be able to control both independently and correctly. I'm going to work on a patch set to get the struct and the parser back into sync. For whatever reason, looks like a few struct members didn't get added. Created attachment 22064 [details]
patch for 21.08
Please try this patch. It should have all the missing fields in slurmdb_assoc_rec_t.
(In reply to Nate Rini from comment #7) > Created attachment 22064 [details] > patch for 21.08 > > Please try this patch. It should have all the missing fields in > slurmdb_assoc_rec_t. Thanks. I will say that having GrpJobsAccrue = max/jobs/per/accruing MaxJobsAccrue = max/accrue/per/total is not straightforward at all. I will also note that qos has the same issue (paths with max that bind to the grp variable, no access to the max variable). Do you have a better naming scheme in mind? I'm open to suggestions. (In reply to Nate Rini from comment #9) > Do you have a better naming scheme in mind? I'm open to suggestions. One option is to avoid coming up with a scheme at all and just bind the name used in the Cli to the association instead of trying to split it into subpaths GrpJobsAccrue = GrpJobsAccrue MaxJobsAccrue = MaxJobsAccrue If you prefer to split it out into subpaths, I would just recommend something semi-consistent between the two similar limits. Created attachment 22147 [details] patch for 21.08 (v2) (In reply to Matt Ezell from comment #10) > One option is to avoid coming up with a scheme at all and just bind the name > used in the Cli to the association instead of trying to split it into > subpaths > If you prefer to split it out into subpaths, I would just recommend > something semi-consistent between the two similar limits. It's too late for 21.08 to change the naming schema, so I moved around the new fields something that I hope makes it easier to understand. While several fields got listed as moved in the rest release notes, a new field took the place of the original which I assume will work as originally intended by users given the field name. If the naming schema is troublesome or confusing, please open an RFE ticket and we can look at moving fields around for 22.05. Please give this patch set a try. It should sync both QOS and association to the internal structs to avoid missing fields. The NEWS entries in the patchset will likely need to be ignored for testing on <=21.08.3. Matt,
The patches for this issue are upstream for the upcoming 21.08.6 release:
> 2f29b40754 openapi/dbv0.0.37 - add missing fields to "dbv0.0.37_qos"
> f1b629d1f4 openapi/dbv0.0.37 - Rename incorrectly named fields in dbv0.0.37_qos
> 08a4f49ac9 openapi/dbv0.0.37 - add missing field to "dbv0.0.37_association"
Please feel free to test it.
Thanks,
--Nate
Closing this ticket. Please report if there are any more issues after 21.08.6 and we can continue debugging. |