| Summary: | RFC: SlurmUser check in client commands | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Kilian Cavalotti <kilian> |
| Component: | User Commands | Assignee: | Carlos Tripiana Montes <tripiana> |
| Status: | OPEN --- | QA Contact: | |
| Severity: | 4 - Minor Issue | ||
| Priority: | --- | CC: | nate, tim |
| Version: | 22.05.3 | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| Site: | Stanford | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | Target Release: | --- | |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
|
Description
Kilian Cavalotti
2022-09-22 09:56:34 MDT
Hi Kilian, That's old code. It used for all Slurm components and for some of them it's needed such check whereas in others don't. It's a simple check, done as name to UID resolution against the system accounts. Yes, no check between commands and slurmctld. This code is also used by daemons so such check wouldn't make much sense. I know, this check also doesn't make much sense in the client side. Let me think what it could be done. Cheers, Carlos. Hi Carlos, (In reply to Carlos Tripiana Montes from comment #1) > That's old code. It used for all Slurm components and for some of them it's > needed such check whereas in others don't. > > It's a simple check, done as name to UID resolution against the system > accounts. > > Yes, no check between commands and slurmctld. This code is also used by > daemons so such check wouldn't make much sense. > > I know, this check also doesn't make much sense in the client side. That was my impression as well, thanks for confirming! > Let me think what it could be done. I guess the first step would be to have a specific list of cases where that check is needed. I don't know if it's feasible to identify the command that's calling init_slurm_conf(), and then ignore the SlurmUser uid existence check for client commands that don't require it? But maybe a better solution would be to move that check out of the _validate_and_set_defaults() function and place it in functions where it's really required: there are already similar gid-side checks in _become_slurm_user() in src/slurmdbd/slurmdbd.c and src/slurmctld/controller.c. Maybe the user existence should be checked there as well. And where setuid/setgid calls to SlurmUser are done. Actually, those setuid/setgid calls will already return an error if the SlurmUser doesn't exist, so checking the existence of that user while parsing the configuration file seems a bit redundant. Anyway, this is one of the most common causes of error and confusion when trying to execute Slurm commands from a container, and one can find plenty of reports besides bug 9282: https://github.com/apptainer/singularity/issues/5345 https://computing.ee.ethz.ch/Services/SLURM#Invalid_user_for_SlurmUser_slurm https://lists.mpich.org/pipermail/discuss/2018-September/002399.html Even a Debian packaging issue: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=783663 So it's probably worth considering dropping that check when parsing the configuration file, and moving it to the relevant functions. Thanks! -- Kilian Hi Killian, I've followed all the implications that this name to UID resolution have. It's mostly what we already talked about, but there's another important shard to take into account. Some client commands, those which open an active communication with slurmctld, like salloc, sattach, srun, ... Those *really* need to know that the sender's is who claims to be. slurmctld is ran by SlurmUser, so we need to be able to check that replies to RPCs are sent by it, from the client command perspective. Other than that, we can mostly ignore if the name to UID resolution fails for SlurmUser on a particular node, if we just fallback to slurm_conf.slurm_user_id=0. We are on our way to address such limitation, and we're developing some POC. I take this opportunity to apologise for not having replied to you earlier. Cheers, Carlos. Hi Carlos, Thanks for the follow up! (In reply to Carlos Tripiana Montes from comment #12) > It's mostly what we already talked about, but there's another important > shard to take into account. Some client commands, those which open an active > communication with slurmctld, like salloc, sattach, srun, ... Those *really* > need to know that the sender's is who claims to be. slurmctld is ran by > SlurmUser, so we need to be able to check that replies to RPCs are sent by > it, from the client command perspective. I see, that makes sense. > Other than that, we can mostly ignore if the name to UID resolution fails > for SlurmUser on a particular node, if we just fallback to > slurm_conf.slurm_user_id=0. > > We are on our way to address such limitation, and we're developing some POC. Excellent, thanks for letting me know! > I take this opportunity to apologise for not having replied to you earlier. No worries at all! Cheers, -- Kilian |