Ticket 23388

Summary: FreeBSD: Fix sa_len handling in slurm_protocol_socket and add missing statfs include in cgroup.c
Product: Slurm Reporter: Rikka Göring <rikka.goering>
Component: slurmctldAssignee: Tim McMullan <mcmullan>
Status: RESOLVED FIXED QA Contact:
Severity: C - Contributions    
Priority: ---    
Version: 23.11.7   
Hardware: Other   
OS: Other   
Site: -Other- Slinky Site: ---
Alineos Sites: --- Atos/Eviden Sites: ---
Confidential Site: --- Coreweave sites: ---
Cray Sites: --- DS9 clusters: ---
Google sites: --- HPCnow Sites: ---
HPE Sites: --- IBM Sites: ---
NOAA SIte: --- NoveTech Sites: ---
Nvidia HWinf-CS Sites: --- OCF Sites: ---
Recursion Pharma Sites: --- SFW Sites: ---
SNIC sites: --- Tzag Elita Sites: ---
Linux Distro: --- Machine Name:
CLE Version: Version Fixed: 25.11.0rc1
Target Release: --- DevPrio: ---
Emory-Cloud Sites: ---
Attachments: Patch to set correct sa_len before calling bind()
25.11 v2

Description Rikka Göring 2025-08-04 14:05:28 MDT
Created attachment 42636 [details]
Patch to set correct sa_len before calling bind()

Hello SchedMD team,

While packaging Slurm 23.11.7 for FreeBSD, I encountered an issue where
`bind()` fails with EINVAL on startup.

On FreeBSD, the sa_len field in `struct sockaddr_in` /
`sockaddr_in6` must be set correctly before calling `bind(2)`.
If left unset, `bind()` may fail with EINVAL.
For me this resulted in slurmctld not starting.

This patch:
- Sets sa_len appropriately based on the address family
- Tracks the correct structure length for use in the `bind()` call

The change is wrapped in `#if defined(__FreeBSD__)` guards so it has
no effect on other platforms.

**Testing:**
- Built Slurm 23.11.7 from source on FreeBSD 14.3-RELEASE
- Verified that `slurmctld` now binds successfully

Patch is attached in `git format-patch` format.

Thank you for considering this fix, and for your continued work on Slurm.
Comment 2 Rikka Göring 2025-09-13 18:28:50 MDT
Just a gentle follow-up - please let me know if there’s anything I can do to help move this patch forward, such as additional testing on FreeBSD, rebasing against current master, or providing more details about the build environment.

I’m happy to adjust as needed. Thanks for your time and review!
Comment 3 Tim McMullan 2025-09-16 10:06:59 MDT
Thank you for supplying this patch and the additional ping, I'm really sorry its taken this long to get back to you!

I'm taking a harder look at the patch today but I think it looks good, I'm currently just investigating how broadly we should apply this.  It looks like OpenBSD has similar requirements so I'm thinking we may want to do this across the all the BSD Operating Systems.

Thanks again!
--Tim
Comment 6 Rikka Göring 2025-09-17 09:06:35 MDT
Thanks for taking a look!  
I agree that generalizing this across the BSDs makes sense. I can confirm testing on FreeBSD 14.3-RELEASE, and I’d be happy to re-test or adapt if you decide to update the patch to cover other BSD variants.  

Please let me know if I can provide further details or assist with OpenBSD/NetBSD testing.
Comment 7 Rikka Göring 2025-09-17 14:58:59 MDT
Since *BSDs do not provide cgroups, one possible approach I’m exploring is to replace cgroups.c with stub functions for *BSD builds. This would at least make the codebase compile and run without cgroups, while allowing us to add replacement plugins like pgid later. Would that kind of interim solution be of interest to you, or would you prefer I continue with explicit removal patches?
Comment 8 Tim McMullan 2025-09-17 16:02:45 MDT
Created attachment 43102 [details]
25.11 v2

(In reply to Rikka Göring from comment #6)
> Thanks for taking a look!  
> I agree that generalizing this across the BSDs makes sense. I can confirm
> testing on FreeBSD 14.3-RELEASE, and I’d be happy to re-test or adapt if you
> decide to update the patch to cover other BSD variants.  
> 
> Please let me know if I can provide further details or assist with
> OpenBSD/NetBSD testing.

This is what I came up with as a modification for the patch.  Everything you had there is still there in spirit, I just found that we could manage it in one #ifdef and applied it to FreeBSD, NetBSD, and OpenBSD.  Please let me know what you think of the changes, including to the commit message!

If you are able to confirm this fix on any number of the BSDs that would be greatly appreciated!

(In reply to Rikka Göring from comment #7)
> Since *BSDs do not provide cgroups, one possible approach I’m exploring is
> to replace cgroups.c with stub functions for *BSD builds. This would at
> least make the codebase compile and run without cgroups, while allowing us
> to add replacement plugins like pgid later. Would that kind of interim
> solution be of interest to you, or would you prefer I continue with explicit
> removal patches?

Stubs would be of interest to us.  I was looking at what we had done with the cgroup code in more recent releases and it looks to me like having the cgroup interface just no-op is the direction we were heading there anyway.

Thank you for the contributions and the help with testing, I really appreciate it!
--Tim
Comment 9 Rikka Göring 2025-09-17 17:28:32 MDT
Thanks, I really like the improvements in this version, especially the
unified bind_len handling and applying it consistently across the BSDs. That’s
definitely cleaner than my original patch.

I’ll be testing this on FreeBSD, OpenBSD, and NetBSD over the next 24–48 hours
and will report back with results.
Comment 10 Tim McMullan 2025-09-18 06:44:51 MDT
(In reply to Rikka Göring from comment #9)
> Thanks, I really like the improvements in this version, especially the
> unified bind_len handling and applying it consistently across the BSDs.
> That’s
> definitely cleaner than my original patch.
>
> I’ll be testing this on FreeBSD, OpenBSD, and NetBSD over the next 24–48
> hours
> and will report back with results.

Thank you for all of this!  My testing on FreeBSD this morning appears to have worked, so it *should* also work for you!

I also wanted to update you on the cgroups question further - I did some more digging on why a cgroup plugin was being loaded and found that starting with slurm 24.05 you can populate cgroup.conf with "CgroupPlugin=disabled" and all the cgroup functionality will be turned off.

I hope this helps!
--Tim
Comment 11 Rikka Göring 2025-09-18 18:08:19 MDT
I was able to build and run Slurm with your patch on FreeBSD, and can confirm the socket-length issue there is resolved.

I’ve also started testing on OpenBSD and NetBSD - progress is being made, but I’m still working through some minor build and environment quirks before I can give full confirmation on those platforms. I’ll report back once those tests are complete.
Comment 12 Tim McMullan 2025-09-18 18:17:41 MDT
(In reply to Rikka Göring from comment #11)
> I was able to build and run Slurm with your patch on FreeBSD, and can
> confirm the socket-length issue there is resolved.

Fantastic, thank you!

> I’ve also started testing on OpenBSD and NetBSD - progress is being made,
> but I’m still working through some minor build and environment quirks before
> I can give full confirmation on those platforms. I’ll report back once those
> tests are complete.

Thank you!  I spent some time working on an OpenBSD setup today and found a couple other things to address there too.  I have an internal ticket for fixing at least one of those now too.

Thanks again for your help!
--Tim
Comment 13 Rikka Göring 2025-09-25 02:42:39 MDT
I was able to test the v2 patch on both OpenBSD and NetBSD VMs as well, and can confirm that it resolves the socket-length issue there too.

Sorry for the delay in getting back on this - I had to spin up some fresh test environments in between other FreeBSD HPC work.

From my side this looks good across all three BSDs. Please let me know if you’d like me to re-test against a newer branch or provide anything else.
Comment 14 Tim McMullan 2025-09-25 07:51:34 MDT
(In reply to Rikka Göring from comment #13)
> I was able to test the v2 patch on both OpenBSD and NetBSD VMs as well, and
> can confirm that it resolves the socket-length issue there too.
>
> Sorry for the delay in getting back on this - I had to spin up some fresh
> test environments in between other FreeBSD HPC work.

Thank you so much for the additional testing!

> From my side this looks good across all three BSDs. Please let me know if
> you’d like me to re-test against a newer branch or provide anything else.

Mine looks good to, so I've merged this commit ahead of 25.11.0rc1, the commit can be found here - https://github.com/SchedMD/slurm/commit/d9f8f18f03

I'll close this as resolved now, but let us know if you have other questions or see other issues (I do see the other two from you and I'll check on their status).

Thank you again for the report, the contribution, and the additional testing!
--Tim
Comment 15 Rikka Göring 2025-09-25 10:03:55 MDT
Thanks a lot for merging this so quickly, and for the thoughtful review process! I’m glad I could help get BSD support a bit smoother. I’ll keep testing on my side and will continue reporting or contributing fixes where I can.

Much appreciated!
-Rikka