| Summary: | Where does --gpus=X flag show up in job_descriptor? | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Mikael Öhman <mikael.ohman> |
| Component: | Scheduling | Assignee: | Marcin Stolarek <cinek> |
| Status: | RESOLVED INFOGIVEN | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | cinek |
| Version: | 18.08.9 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | SNIC | SNIC sites: | C3SE |
Description
Mikael Öhman
2020-02-25 05:59:07 MST

Marcin Stolarek

Mikael,

You're right. Although "gres" is still available in job_submit.lua for backward compatibility, it is now a synonym of tres_per_node. Depending on the option used, GPU devices should be visible under one of the following:

- tres_per_node
- tres_per_job
- tres_per_socket
- tres_per_task

cheers,
Marcin

Mikael Öhman

What happens if I need to modify these fields? Should I modify both versions?

Marcin Stolarek

Each of those is directly connected with the "job description" provided by the user, so setting a gpu gres in tres_per_job has the meaning of --gpus, tres_per_socket is --gpus-per-socket, etc. Different combinations trigger slightly different logic on the slurmctld side.

If your policies are complicated you may consider using cli_filter instead (starting from 20.02 those will also be available with a lua interface). The obvious limitation of those is that they are "soft": a user who tricks the client side can submit a job that doesn't fulfill them. The good point is that they are executed entirely on the user side, allowing higher throughput for slurmctld.

Let me know if you have more questions.

cheers,
Marcin

Mikael Öhman

Sorry, I should have clarified: should I modify both the legacy "job_desc.gres" and "job_desc.tres_per_node"? Since they are supposed to be synonyms, I'll just put the same value in both when I get around to updating our hooks.

Thank you for the help, you can close this ticket now (I'm not sure what I should select for resolving this ticket).

Best regards,
Mikael

Marcin Stolarek

Mikael,

They are synonyms for job_submit/lua only. It's basically these lines in job_submit_lua.c:

>321 } else if (!xstrcmp(name, "gres")) {
>322     /* "gres" replaced by "tres_per_node" in v18.08 */
>323     lua_pushstring (L, job_ptr->tres_per_node);
>
>813 } else if (!xstrcmp(name, "gres")) {
>814     /* "gres" replaced by "tres_per_node" in v18.08 */
>815     lua_pushstring (L, job_desc->tres_per_node);
>
>1083 } else if (!xstrcmp(name, "gres")) {
>1084     /* "gres" replaced by "tres_per_node" in v18.08 */
>1085     value_str = luaL_checkstring(L, 3);
>1086     xfree(job_desc->tres_per_node);
>1087     if (strlen(value_str))
>1088         job_desc->tres_per_node = xstrdup(value_str);

As you can see, "gres" is exported to lua from tres_per_node and is used to set tres_per_node in the job description, so I'd suggest that you use only tres_per_node instead of gres. If you were developing your own job_submit plugin in C you'd have to use tres_per_node, since gres is no longer a member of job_desc_msg_t.

Let me know if this clarified the situation.

cheers,
Marcin

PS. As for closing, you can just confirm that in a message and I can mark the ticket as closed. For this ticket, I'm going to call it "infogiven".

Marcin Stolarek

Mikael,

I'm closing this case now with "information given" status. Should you have any additional questions, please don't hesitate to reopen.

cheers,
Marcin
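
The aliasing described in the thread can be illustrated with a small stand-alone mock. This is only a sketch and not the Slurm source: `mock_job_desc`, `mock_set_field` and `mock_get_field` are hypothetical names, the struct keeps only the one member under discussion, and the value `"gpu:2"` in the note below is just an illustrative string. The sketch also assumes that writes through either name are handled the same way, whereas the excerpt quoted in the thread shows only the "gres" branch.

```c
/* Mock of the "gres" -> tres_per_node alias discussed in this ticket.
 * NOT the Slurm source: all names here are hypothetical stand-ins. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct mock_job_desc {
    char *tres_per_node;  /* the only storage; there is no separate gres member */
};

/* Both field names resolve to the same member (assumed for both names). */
static int is_tres_per_node_alias(const char *name)
{
    return strcmp(name, "gres") == 0 || strcmp(name, "tres_per_node") == 0;
}

/* Set a field by name. An empty value clears the field, mirroring the
 * strlen() check in the quoted excerpt.
 * Returns 0 on success, -1 for an unknown name or allocation failure. */
int mock_set_field(struct mock_job_desc *desc, const char *name,
                   const char *value)
{
    assert(desc != NULL && name != NULL && value != NULL);
    if (!is_tres_per_node_alias(name))
        return -1;

    free(desc->tres_per_node);
    desc->tres_per_node = NULL;

    if (value[0] != '\0') {
        size_t len = strlen(value) + 1;  /* include the terminating NUL */
        desc->tres_per_node = malloc(len);
        if (desc->tres_per_node == NULL)
            return -1;
        memcpy(desc->tres_per_node, value, len);
    }
    return 0;
}

/* Read a field by name; returns NULL if the field is unset or unknown. */
const char *mock_get_field(const struct mock_job_desc *desc, const char *name)
{
    assert(desc != NULL && name != NULL);
    return is_tres_per_node_alias(name) ? desc->tres_per_node : NULL;
}
```

In this mock, writing `"gpu:2"` through `"gres"` and then reading `"tres_per_node"` returns the same string, and a later write through either name replaces it. Under that model, setting both names in a hook is redundant, which matches the advice to use only tres_per_node.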