Created attachment 42636 [details] Patch to set correct sa_len before calling bind() Hello SchedMD team, While packaging Slurm 23.11.7 for FreeBSD, I encountered an issue where `bind()` fails with EINVAL on startup. On FreeBSD, the sa_len field in `struct sockaddr_in` / `sockaddr_in6` must be set correctly before calling `bind(2)`. If left unset, `bind()` may fail with EINVAL. For me this resulted in slurmctld not starting. This patch: - Sets sa_len appropriately based on the address family - Tracks the correct structure length for use in the `bind()` call The change is wrapped in `#if defined(__FreeBSD__)` guards so it has no effect on other platforms. **Testing:** - Built Slurm 23.11.7 from source on FreeBSD 14.3-RELEASE - Verified that `slurmctld` now binds successfully Patch is attached in `git format-patch` format. Thank you for considering this fix, and for your continued work on Slurm.
Just a gentle follow-up - please let me know if there’s anything I can do to help move this patch forward, such as additional testing on FreeBSD, rebasing against current master, or providing more details about the build environment. I’m happy to adjust as needed. Thanks for your time and review!
Thank you for supplying this patch and the additional ping, I'm really sorry its taken this long to get back to you! I'm taking a harder look at the patch today but I think it looks good, I'm currently just investigating how broadly we should apply this. It looks like OpenBSD has similar requirements so I'm thinking we may want to do this across the all the BSD Operating Systems. Thanks again! --Tim
Thanks for taking a look! I agree that generalizing this across the BSDs makes sense. I can confirm testing on FreeBSD 14.3-RELEASE, and I’d be happy to re-test or adapt if you decide to update the patch to cover other BSD variants. Please let me know if I can provide further details or assist with OpenBSD/NetBSD testing.
Since *BSDs do not provide cgroups, one possible approach I’m exploring is to replace cgroups.c with stub functions for *BSD builds. This would at least make the codebase compile and run without cgroups, while allowing us to add replacement plugins like pgid later. Would that kind of interim solution be of interest to you, or would you prefer I continue with explicit removal patches?
Created attachment 43102 [details] 25.11 v2 (In reply to Rikka Göring from comment #6) > Thanks for taking a look! > I agree that generalizing this across the BSDs makes sense. I can confirm > testing on FreeBSD 14.3-RELEASE, and I’d be happy to re-test or adapt if you > decide to update the patch to cover other BSD variants. > > Please let me know if I can provide further details or assist with > OpenBSD/NetBSD testing. This is what I came up with as a modification for the patch. Everything you had there is still there in spirit, I just found that we could manage it in one #ifdef and applied it to FreeBSD, NetBSD, and OpenBSD. Please let me know what you think of the changes, including to the commit message! If you are able to confirm this fix on any number of the BSDs that would be greatly appreciated! (In reply to Rikka Göring from comment #7) > Since *BSDs do not provide cgroups, one possible approach I’m exploring is > to replace cgroups.c with stub functions for *BSD builds. This would at > least make the codebase compile and run without cgroups, while allowing us > to add replacement plugins like pgid later. Would that kind of interim > solution be of interest to you, or would you prefer I continue with explicit > removal patches? Stubs would be of interest to us. I was looking at what we had done with the cgroup code in more recent releases and it looks to me like having the cgroup interface just no-op is the direction we were heading there anyway. Thank you for the contributions and the help with testing, I really appreciate it! --Tim
Thanks, I really like the improvements in this version, especially the unified bind_len handling and applying it consistently across the BSDs. That’s definitely cleaner than my original patch. I’ll be testing this on FreeBSD, OpenBSD, and NetBSD over the next 24–48 hours and will report back with results.
(In reply to Rikka Göring from comment #9) > Thanks, I really like the improvements in this version, especially the > unified bind_len handling and applying it consistently across the BSDs. > That’s > definitely cleaner than my original patch. > > I’ll be testing this on FreeBSD, OpenBSD, and NetBSD over the next 24–48 > hours > and will report back with results. Thank you for all of this! My testing on FreeBSD this morning appears to have worked, so it *should* also work for you! I also wanted to update you on the cgroups question further - I did some more digging on why a cgroup plugin was being loaded and found that starting with slurm 24.05 you can populate cgroup.conf with "CgroupPlugin=disabled" and all the cgroup functionality will be turned off. I hope this helps! --Tim
I was able to build and run Slurm with your patch on FreeBSD, and can confirm the socket-length issue there is resolved. I’ve also started testing on OpenBSD and NetBSD - progress is being made, but I’m still working through some minor build and environment quirks before I can give full confirmation on those platforms. I’ll report back once those tests are complete.
(In reply to Rikka Göring from comment #11) > I was able to build and run Slurm with your patch on FreeBSD, and can > confirm the socket-length issue there is resolved. Fantastic, thank you! > I’ve also started testing on OpenBSD and NetBSD - progress is being made, > but I’m still working through some minor build and environment quirks before > I can give full confirmation on those platforms. I’ll report back once those > tests are complete. Thank you! I spent some time working on an OpenBSD setup today and found a couple other things to address there too. I have an internal ticket for fixing at least one of those now too. Thanks again for your help! --Tim
I was able to test the v2 patch on both OpenBSD and NetBSD VMs as well, and can confirm that it resolves the socket-length issue there too. Sorry for the delay in getting back on this - I had to spin up some fresh test environments in between other FreeBSD HPC work. From my side this looks good across all three BSDs. Please let me know if you’d like me to re-test against a newer branch or provide anything else.
(In reply to Rikka Göring from comment #13) > I was able to test the v2 patch on both OpenBSD and NetBSD VMs as well, and > can confirm that it resolves the socket-length issue there too. > > Sorry for the delay in getting back on this - I had to spin up some fresh > test environments in between other FreeBSD HPC work. Thank you so much for the additional testing! > From my side this looks good across all three BSDs. Please let me know if > you’d like me to re-test against a newer branch or provide anything else. Mine looks good to, so I've merged this commit ahead of 25.11.0rc1, the commit can be found here - https://github.com/SchedMD/slurm/commit/d9f8f18f03 I'll close this as resolved now, but let us know if you have other questions or see other issues (I do see the other two from you and I'll check on their status). Thank you again for the report, the contribution, and the additional testing! --Tim
Thanks a lot for merging this so quickly, and for the thoughtful review process! I’m glad I could help get BSD support a bit smoother. I’ll keep testing on my side and will continue reporting or contributing fixes where I can. Much appreciated! -Rikka