| Summary: | Allow send_fd_over_pipe() to work with SOCK_STREAM sockets instead of just SOCK_DGRAM | ||
|---|---|---|---|
| Product: | Slurm | Reporter: | Aditi Gaur <agaur> |
| Component: | Other | Assignee: | Danny Auble <da> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | C - Contributions | ||
| Priority: | --- | CC: | csamuel |
| Version: | 20.11.x | ||
| Hardware: | Linux | ||
| OS: | Linux | ||
| See Also: | https://bugs.schedmd.com/show_bug.cgi?id=8993 | ||
| Site: | NERSC | Alineos Sites: | --- |
| Atos/Eviden Sites: | --- | Confidential Site: | --- |
| Coreweave sites: | --- | Cray Sites: | --- |
| DS9 clusters: | --- | HPCnow Sites: | --- |
| HPE Sites: | --- | IBM Sites: | --- |
| NOAA SIte: | --- | OCF Sites: | --- |
| Recursion Pharma Sites: | --- | SFW Sites: | --- |
| SNIC sites: | --- | Linux Distro: | --- |
| Machine Name: | CLE Version: | ||
| Version Fixed: | 20.11.5 21.08.0pre1 | Target Release: | --- |
| DevPrio: | --- | Emory-Cloud Sites: | --- |
| Attachments: |
reproducer
fix for the fd issue. |
||
Created attachment 16421 [details]
fix for the fd issue.
I also modified the comment above the functions, since SCM_RIGHTS is used, the fd's can be passed between any processes, not just parent child.
What system are you seeing the problem on? I can see that note in the man page now, but haven't seen issues with this in practice on most systems, and am curious as to what can actually expose this. I'll look into reviewing this on 20.02 instead of 20.11, it looks important enough to pull in as a bug fix. - Tim I am seeing this on my dev system which is ubuntu latest. I encountered this problem during the testing of my patch here: https://bugs.schedmd.com/show_bug.cgi?id=8993 I have not seen this issue on our production systems however. So I looked through the code to see if the bug could have been exposed somewhere else. I didnt find any evidence of that. I do see the `send_fd_over_pipe` function being used in src/slurmd/slurmd/req.c in function `_open_as_other`, but in that case the socketpair created a few lines above is created as SOCK_DGRAM. This particular issue only applies with SOCK_STREAM flag. So I dont think in existing code this is an issue as long as SOCK_DGRAM is used. Adding an extra byte however would cover both the cases. Updating the title. Aditi, I can easily reproduce the error here. Thanks for the fix. One question I have is if there a reason you have the msg.msg_iovlen different in the 2 places? msg.msg_iovlen = sizeof(c); vs msg.msg_iovlen = sizeof(iov)/sizeof(iov[0]); The code looks very similar, but figured I would ask. The values appear to be the same as well. Thanks! Outside of this I think this is the correct way to go. Hi Danny, They both do look the same. I think its probably a mistake. Either can be used. Comment on attachment 16421 [details]
fix for the fd issue.
This has been added in 20.11.5 commit 3b7660a2f49f
Please reopen if anything further is needed here. |
Created attachment 16420 [details] reproducer Hello, During the upstreaming process of a patch- I encountered a bug in src/common/fd.c in send_fd_over_pipe function as well as receieve_fd_over_pipe. The bug is in how the message is constructed. According to the unix 7 man page: "To pass file descriptors or credentials over a SOCK_STREAM, you need to send or receive at least one byte of nonancillary data in the same sendmsg(2) or recvmsg(2) call." Since both the functions do not add the nonancillary data, They are unable to pass the fd correctly. I have added a reproducer to that effect. You will have to run it manually in order to see that the fd is not passed correctly. I have also provided a patch that fixes the issue. After the patch, the fd is passed correctly. Please let me know if this is something you can reproduce on your end, hopefully its not just unique to my OS.