Ticket 21683 - How to get started with auth/slurm?
Summary: How to get started with auth/slurm?
Status: RESOLVED INFOGIVEN
Alias: None
Product: Slurm
Classification: Unclassified
Component: Configuration
Version: 24.05.5
Hardware: Linux Linux
Severity: 4 - Minor Issue
Assignee: Ethan Simmons
Reported: 2024-12-18 09:33 MST by Ole.H.Nielsen@fysik.dtu.dk
Modified: 2025-01-08 13:27 MST
Site: DTU Physics


Description Ole.H.Nielsen@fysik.dtu.dk 2024-12-18 09:33:12 MST
I would like to get started with the auth/slurm authentication plugin, and I've read the documentation in https://slurm.schedmd.com/authentication.html. However, some questions come up:

1. The item "Distribute the key file to the machines on the cluster" doesn't make sense in a Configless Slurm cluster such as ours.  According to https://slurm.schedmd.com/configless_slurm.html the slurm.key and slurm.jwks files are *not* distributed by the slurmctld.

2. If we would like to try the new 24.05 slurm.jwks file, we need to replace the "k" value (decoded as "The key above me always lies" :-) by something sensible.  Can you confirm that it's secure to use a very long string generated with one of these examples:

  $ cat /etc/slurm/slurm.key | base64 -w 0
  $ dd if=/dev/urandom count=1 bs=1024 | base64 -w 0

3. For testing auth/slurm in a cluster using auth/munge by default, I contemplate copying the /etc/slurm folder from the slurmctld server to a testing slurmd node (suitably drained), and then make local auth/slurm modifications in /etc/slurm/slurm.conf.  Would this in any way impact the operation of the rest of the cluster?  Will slurmd simply pick up the local node's auth/slurm authentication?

I don't think that these questions are covered by the latest Field Notes:
https://slurm.schedmd.com/SLUG24/Field-Notes-8.pdf

Thanks for sharing your insights,
Ole
Comment 1 Ole.H.Nielsen@fysik.dtu.dk 2024-12-19 01:56:43 MST
4. I was trying to understand these statements in https://slurm.schedmd.com/authentication.html:

"Beginning with version 24.05, you may alternatively create a slurm.jwks file with multiple keys defined.
...
If the slurm.jwks is not present or cannot be read, the cluster defaults to the slurm.key."

I don't understand the word "alternatively" here, so can you please explain the logic?  The second sentence seems to imply (or even require?) that slurm.key has already been configured, and must be available as a fallback mechanism.  But the slurm.jwks permits key rotation whereas slurm.key doesn't, so I'm confused about whether or not to use slurm.jwks together with slurm.key.

5. IMHO, the use of "dd if=/dev/random ..." to generate keys may result in too short strings, depending on the entropy in the system, especially if you run the command repeatedly.  The /dev/urandom doesn't seem to suffer from this issue.
I've found a page that discusses /dev/random vs. /dev/urandom: https://www.thomas-huehn.com/myths-about-urandom/

Question: Do you recommend the use of /dev/urandom instead of /dev/random for generating keys?

Thanks,
Ole
Comment 2 Ethan Simmons 2024-12-20 12:22:07 MST
Those are really good questions! I'll do my best to cover all of them:

1. Is there a specific reason you've decided to go with auth/slurm? It is true that the slurm.jwks won't be shared; however, it shouldn't change regularly, while your config files will likely change several times over the lifetime of the cluster. Additionally, the file is the equivalent of a password, and some sites have security policies around sending such sensitive information.

2. The "k" value should be a long random value. Using a secure source of randomness and selecting at least 16 bytes will get you what you're looking for. Depending on your security needs, you may need more bits.

3. For testing changes, I would recommend having a completely separate test cluster. The docs state that there will be interference with existing jobs if you change the auth type (link 1), which is another good reason to try this in a test cluster first. You'll need scheduled downtime to make this change. The AuthType parameter also can't have more than one value, so you can't mix auth types in a cluster.

4. There are two different files that can be used for auth/slurm: slurm.key or slurm.jwks. Your cluster only needs one of them; they aren't meant to be redundant with each other. That line in the docs is meant to give insight into how Slurm searches for the file to use: if a slurm.jwks file is present, use it; otherwise use the slurm.key file.

5. We don't have official recommendations for using /dev/random vs /dev/urandom, as the behavior differs between operating systems and OS versions. Depending on your threat model, this could be a very important decision or just come down to theoretical differences. Most sites don't need to do anything extreme like using lava lamps (link 2) as the random source for their Slurm key, and they can pick freely between random and urandom. Consult your security team if you are worried about the decision here. The article you linked was an interesting read, though; I can see if our docs need to change.

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_AuthType
[2] https://blog.cloudflare.com/randomness-101-lavarand-in-production/
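
The lookup order described in item 4 can be sketched in a few lines of shell. This is only an illustration of the documented behavior ("if the slurm.jwks is not present or cannot be read, the cluster defaults to the slurm.key"), not Slurm's actual code; pick_key_file and the scratch directory are hypothetical names.

```shell
#!/bin/sh
# Hypothetical illustration of the lookup order; not Slurm's actual code.
pick_key_file() {
    dir=$1
    if [ -r "$dir/slurm.jwks" ]; then
        echo "$dir/slurm.jwks"
    elif [ -r "$dir/slurm.key" ]; then
        echo "$dir/slurm.key"
    else
        return 1
    fi
}

demo=$(mktemp -d)                  # scratch stand-in for /etc/slurm
touch "$demo/slurm.key"
first=$(pick_key_file "$demo")     # only slurm.key exists
touch "$demo/slurm.jwks"
second=$(pick_key_file "$demo")    # slurm.jwks now takes precedence
echo "$first"
echo "$second"
```

Since every node has to end up with the same key material, the fallback is a search order rather than a redundancy mechanism.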
Comment 3 Ethan Simmons 2025-01-02 14:08:04 MST
Just following up, do you have any more questions?
Comment 4 Ole.H.Nielsen@fysik.dtu.dk 2025-01-03 06:26:20 MST
Hi Ethan,

I'm sorry for my delayed response which was due to the Christmas vacation.  Thanks for your enlightening answers:

(In reply to Ethan Simmons from comment #2)
> Those are really good questions! I'll do my best to cover all of them:
> 
> 1. Is there a specific reason you've decided to go with auth/slurm? It is
> true that the slurm.jwks won't be shared, however this shouldn't be changing
> regularly while your config files will likely change several 
> times over the lifetime of the cluster. Additionally, the file is the
> equivalent of a password. Some sites have security policies around sending
> such sensitive information. 

The auth/slurm sounds interesting because we won't have to distribute the password file from the central server as we currently do.  Also, Danny asked me if I had tried auth/slurm, so I thought I should take a look at it and document in my Slurm Wiki how to deploy it.

Your answer made me realize that auth/slurm is mutually exclusive with auth/munge, so there is no way to test this in a running cluster, and a test cluster will be needed (we don't have a test cluster at this time).

I also realize that /etc/slurm/slurm.key has similar functionality as /etc/munge/munge.key, so it should be distributed using ClusterShell or some other mechanism.

There seems to be a snag, however, for Configless Slurm clusters because the /etc/slurm folder doesn't exist on slurmd nodes.  But perhaps the solution is to create /etc/slurm/ and install only slurm.key into that folder while using Configless for slurm.conf and other *.conf files?

The new slurm.jwks file sounds attractive since it can contain multiple keys that may be updated over time.  I wonder if updating slurm.jwks can break running slurmds?  The updating process isn't obvious.

> 2. K should be a random long value. Using a secure source of random and
> selecting at least 16 bytes will get you what you're looking for. Depending
> on your security needs, you may need more bits.

IMHO we need the slurm.key to have >>16 bytes of randomness, probably 1024 bytes or more, which gives a base64-encoded key of 1368 characters.

> 3. For testing changes, I would recommend having a completely separate test
> cluster. The docs state that there will be interference with existing jobs
> if you change the auth type (link 1), which is another good 
> reason to have this in a test cluster first. You'll have to have scheduled
> down time to make this change. The parameter AuthType also can't have more
> than 1 value, so you can't mix auth types in a cluster.

Thanks a lot for clarifying this crucial information, point well taken.

> 4. There are 2 different files that can be used for auth/slurm, either
> slurm.key or slurm.jwks. Your cluster only needs one of them, they aren't
> meant as redundant of each other. That line in the docs is meant 
> to give insight in how slurm searches for the file to use: if there is a
> slurm.jwks file present use it, otherwise use the slurm.key file. 

I don't understand why this fallback is useful, because all nodes in the cluster are required to agree on one and only one key file.  So if slurm.jwks happens to be missing on some nodes, there is no point in using a slurm.key file, correct?

> 5. We don't have official recommendations for using /dev/random vs
> /dev/urandom, as this has differences between operating systems and OS
> versions. Depending on your threat model, this could be a very important 
> decision or just come down to theoretical differences. For most sites, there
> isn't a need to do anything extreme like using lava lamps (link 2) for their
> random source for their slurm key and they can blindly 
> pick between random and urandom. Consult your security team if you are
> worried about the decision here. That article you linked was an interesting
> read though, I can see if our docs need to change.

OK, I was only thinking about Linux (and here we use Rocky 8).  Let's see if you can come up with more detailed advice later on.

Best regards,
Ole
Comment 5 Ethan Simmons 2025-01-06 18:05:02 MST
> "There seems to be a snag, however, for Configless Slurm clusters because the /etc/slurm folder doesn't exist on slurmd nodes.  But perhaps the solution is to create /etc/slurm/ and install only slurm.key into that folder while using Configless for slurm.conf and other *.conf files?"
Yes, create the /etc/slurm directory and make sure the key file is on all nodes. You can also set an alternative location via AuthAltParameters if you don't want to have the key file in /etc/slurm/ (link 1).
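
On a Configless node this amounts to creating the directory and placing the key there with restrictive permissions. The sketch below stages everything under a scratch directory so it can be tried without touching a real node. ROOT, the file names under it, mode 0600, and the slurm:slurm ownership are assumptions for illustration; adjust them to your site.

```shell
#!/bin/sh
# ROOT is a hypothetical staging prefix; on a real node the paths would start at /.
ROOT=$(mktemp -d)
SRC_KEY="$ROOT/slurm.key.src"      # stand-in for the key copied from the controller
head -c 32 /dev/urandom > "$SRC_KEY"

# Create /etc/slurm (absent on Configless nodes) and install only the key file.
install -d -m 0755 "$ROOT/etc/slurm"
install -m 0600 "$SRC_KEY" "$ROOT/etc/slurm/slurm.key"
# On a real node, also set ownership (assuming SlurmUser is "slurm"):
#   chown slurm:slurm /etc/slurm/slurm.key

key_mode=$(stat -c %a "$ROOT/etc/slurm/slurm.key")
echo "installed with mode $key_mode"
```

The same steps can be pushed to all nodes with ClusterShell or whatever mechanism already distributes munge.key.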


> "The new slurm.jwks file sounds attractive since it can contain multiple keys that may be updated over time.  I wonder if updating slurm.jwks can break running slurmds?  The updating process isn't obvious."
If you need to do a key rotation, change the active key in the slurm.jwks file and then do a reconfigure. See the 'use' parameter for setting the active key (link 2). As per our docs:
> "The slurm.jwks file aids with key rotation, as the cluster does not need to be restarted at once when a key is rotated. Instead, an scontrol reconfigure is sufficient."
You would just need to run scontrol reconfigure (link 3).
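
A rotation along these lines could look roughly like the following. The field names (keys, alg, kty, kid, k, use) and the "use": "default" marker are assumed to match the multiple-key section of the authentication docs (link 2) and should be checked against your Slurm version. The kid values and the scratch path are made up for illustration, and encoding "k" with plain base64 simply follows the commands used elsewhere in this ticket.

```shell
#!/bin/sh
# Sketch only: writes to a scratch directory, not /etc/slurm.
JWKS_DIR=$(mktemp -d)
JWKS="$JWKS_DIR/slurm.jwks"

# Two independent 32-byte keys, base64-encoded on a single line.
old_k=$(head -c 32 /dev/urandom | base64 -w 0)
new_k=$(head -c 32 /dev/urandom | base64 -w 0)

# The new key is marked as the default (active) one; the old key stays in
# the file so it is still known while the change propagates (assumed layout).
cat > "$JWKS" <<EOF
{
  "keys": [
    { "alg": "HS256", "kty": "oct", "kid": "key-2025", "k": "$new_k", "use": "default" },
    { "alg": "HS256", "kty": "oct", "kid": "key-2024", "k": "$old_k" }
  ]
}
EOF
chmod 600 "$JWKS"

# After copying the file to every node, the advice in this thread is to run:
#   scontrol reconfigure
```

Once the old key is no longer needed, it can be dropped from the file in a later edit followed by another reconfigure.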


16 bytes gives you a 128 bit key, which is plenty good for most systems. If you want to have a 256 bit key (32 bytes) or 1024 bit key (128 bytes), then you would just need to increase the byte value in the dd command. Feel free to set this to match your security requirements.
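
As a rough illustration of those byte counts, the sketch below reads a fixed number of bytes and base64-encodes them. gen_key, KEY_BYTES and RANDOM_SRC are hypothetical names; /dev/urandom is used here only so the example never blocks, and iflag=fullblock is a GNU dd option.

```shell
#!/bin/sh
# Hypothetical helper: emit N random bytes, base64-encoded on one line.
# 16 bytes -> 128 bit key, 32 bytes -> 256 bit key, 128 bytes -> 1024 bit key.
RANDOM_SRC=/dev/urandom            # assumption; swap for /dev/random if preferred
gen_key() {
    dd if="$RANDOM_SRC" bs="$1" count=1 iflag=fullblock 2>/dev/null | base64 -w 0
}

KEY_BYTES=32
key_b64=$(gen_key "$KEY_BYTES")

# Decode again to confirm how many raw bytes the key carries.
raw_bytes=$(printf '%s' "$key_b64" | base64 -d | wc -c)
echo "raw bytes: $raw_bytes, bits: $((raw_bytes * 8))"
```

Changing KEY_BYTES is the only adjustment needed to move between key sizes.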

Correct, there is no need to have a slurm.key file if you intend to use slurm.jwks only.

For your system, use /dev/random.

[1] https://slurm.schedmd.com/slurm.conf.html#OPT_jwks%3D
[2] https://slurm.schedmd.com/authentication.html#multiple_key_setup
[3] https://slurm.schedmd.com/authentication.html#slurm
Comment 6 Ole.H.Nielsen@fysik.dtu.dk 2025-01-08 08:23:48 MST
(In reply to Ethan Simmons from comment #5)
> > "There seems to be a snag, however, for Configless Slurm clusters because the /etc/slurm folder doesn't exist on slurmd nodes.  But 
> > perhaps the solution is to create /etc/slurm/ and install only slurm.key into that folder while using Configless for slurm.conf and other 
> > *.conf files?"
> Yes, create the /etc/slurm directory and make sure the key file is on all
> nodes. You can also set an alternative location via 
> AuthAltParameters, if you didn't want to have the key file in /etc/slurm/
> (link 1).

Thanks for confirming the use of /etc/slurm/slurm.jwks also on Configless nodes.

The AuthAltParameters is very interesting since it's similar to the traditional use of /etc/munge/munge.key in a separate directory.

> > "The new slurm.jwks file sounds attractive since it can contain multiple keys that may be updated over time.  I wonder if updating 
> > slurm.jwks can break running slurmds?  The updating process isn't obvious."
> In the case you would need to do a key rotation, change the active key in
> the slurm.jwks file then do a reconfigure. See the 'use' parameter 
> for setting the active key (link 2). As per our docs:
> > "The slurm.jwks file aids with key rotation, as the cluster does not need to be restarted at once when a key is rotated. Instead, an 
> > scontrol reconfigure is sufficient."
> You would just need to run scontrol reconfigure (link 3).

Thanks, sounds good.

Now we only need to figure out how to populate the slurm.jwks file.

> 16 bytes gives you a 128 bit key, which is plenty good for most systems. If
> you want to have a 256 bit key (32 bytes) or 1024 bit key (128 
> bytes), then you would just need to increase the byte value in the dd
> command. Feel free to set this to match your security requirements.
> 
> Correct, there is no need to have a slurm.key file if you intend to use
> slurm.jwks only.
> 
> For your system, use /dev/random.

No, I disagree about using /dev/random because its entropy gets exhausted really quickly.  See how I eventually get as little as 8 bytes of data with repetitive use:

$ dd if=/dev/random count=1 bs=1024 2>/dev/null  | base64 -w 0 | wc -c
156
$ dd if=/dev/random count=1 bs=1024 2>/dev/null  | base64 -w 0 | wc -c
108
$ dd if=/dev/random count=1 bs=1024 2>/dev/null  | base64 -w 0 | wc -c
104
$ dd if=/dev/random count=1 bs=1024 2>/dev/null  | base64 -w 0 | wc -c
44
$ dd if=/dev/random count=1 bs=1024 2>/dev/null  | base64 -w 0 | wc -c
8

For comparison the /dev/urandom consistently gives a large random key with 1368 bytes:

$ dd if=/dev/urandom count=1 bs=1024 2>/dev/null  | base64 -w 0 | wc -c
1368
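
The 1368 figure is the base64-encoded length of 1024 raw bytes, since base64 emits 4 characters for every 3 input bytes (rounded up). The shorter outputs above are consistent with dd count=1 issuing a single read that /dev/random may satisfy only partially on kernels where it runs low on entropy; the 8-character result corresponds to at most 6 raw bytes. A small sketch of the arithmetic follows; b64_len is a hypothetical helper name, and iflag=fullblock is a GNU dd option.

```shell
#!/bin/sh
# Hypothetical helper: base64 length (including padding) for N raw bytes.
b64_len() {
    echo $(( ( ($1 + 2) / 3 ) * 4 ))
}

b64_len 1024   # prints 1368
b64_len 32     # prints 44

# iflag=fullblock makes dd keep reading until the block is full, so a short
# read cannot silently truncate the key (on /dev/random this may block).
actual=$(dd if=/dev/urandom bs=1024 count=1 iflag=fullblock 2>/dev/null | base64 -w 0 | wc -c)
echo "$actual"
```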

So I think we should use /dev/urandom to give us a 1368 byte key.  Would there be any reason to use a shorter key than our Munge key generated similarly by:

$ dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key

Thanks,
Ole
Comment 7 Ethan Simmons 2025-01-08 10:46:46 MST
> "No, I disagree about using /dev/random because its entropy gets exhausted really quickly."
I recommended /dev/random for your specific use because you would only need to generate one key, so repeated use wouldn't come up. It would also be closer to a "truer" random, instead of a computed random via the pseudorandom number generator that urandom provides. That may not be something you need or value. Other use cases and requirements will change which random source to use.

> "Would there be any reason to use a shorter key than our Munge key"
That again depends on your security requirements for symmetric keys. If you are worried about a brute force attack against keys, a 128 bit key is currently impractical to brute force. If you are worried about a quantum computer brute force, a 256 bit key will be impractical to brute force. If you are worried about the key itself being leaked, length doesn't matter and key rotation matters instead. If you want extra tolerance for more sophisticated attacks, doubling the key size can help, and so on. Larger keys lead to more overhead when authenticating messages, so keeping keys smaller has performance benefits.

My recommendations are not requirements, and you can change how your key is set up to meet your own security requirements. Again, the difference between /dev/random and /dev/urandom isn't very significant for 99% of users, and your security team will tell you if there is a meaningful difference for your use case and/or organization. Key size depends on what attacks you are expecting and the resources behind those attacks. Your security team will be able to provide more authoritative numbers, as they are more intimately aware of your security needs. If you would like further clarification or have any more questions, please let me know.
Comment 8 Ole.H.Nielsen@fysik.dtu.dk 2025-01-08 12:03:15 MST
Hi Ethan,

Thanks for your very detailed answers.  I think the auth/slurm configuration has been clarified a lot.  Please close this ticket now.

Best regards,
Ole
Comment 9 Ethan Simmons 2025-01-08 13:27:33 MST
I'm glad this was helpful! Closing