Ticket 10457

Summary: How to make slurm cgroups tag job network packets with a class identifier?
Product: Slurm
Reporter: Sophie Créno <sophie.creno>
Component: Other
Assignee: Felip Moll <felip.moll>
Status: RESOLVED INFOGIVEN
Severity: 4 - Minor Issue    
Priority: ---    
Version: 20.02.6   
Hardware: Linux   
OS: Linux   
Site: Institut Pasteur

Description Sophie Créno 2020-12-16 04:34:40 MST
Hello,

  We would like to allow Internet traffic on our cluster nodes, but only for
some specific jobs (using a special gres, a partition/qos combination,
or whatever). For that, we imagined that we could use the interface of
the Network classifier cgroup to tag the network packets of these jobs with
a classid that would be used to allow traffic toward the Internet through
iptables rules
(cf. https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/net_cls.html).
This way, only tagged packets would reach the Internet. We would combine
that with something like MaxTRES to limit the number of cores accessing
the Internet to avoid any harm.

  But we don't know how to do that. I don't think that any of the existing
cgroup plugins allows such a thing (I don't even know if it would be 
the right place to write that feature). Am I wrong? Is such a feature in
your road map? If not, could you give us some hints on how we could achieve
our goal?

  Thanks in advance,
Comment 1 Felip Moll 2020-12-17 09:37:56 MST
(In reply to Sophie Créno from comment #0)

Your idea of using cgroups could make sense. We would need to extend the current task/cgroup plugin to also add tasks to a net_cls cgroup. You would then need to configure the nodes with tc (Traffic Control), creating qdiscs, classes, and filters as needed. Finally, an iptables rule would allow access for processes in the cgroup with a specific classid. The hierarchy of qdiscs and classes is something to think about; a flat one might be enough.
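
For illustration, the node-side part of this idea might look like the sketch below on a cgroup v1 node. It is not something Slurm does today. It only prints commands, following the example in the kernel net_cls documentation linked in the description: write a classid into a net_cls cgroup, add a process to that cgroup, then match the classid with the iptables cgroup match. The cgroup name slurm_inet, the handle 10:1, the PID 1234, and the function names are all hypothetical. In the kernel documentation's basic example the iptables match reads the classid directly, so the tc setup may only be needed if traffic shaping is also wanted.

```shell
#!/bin/sh
# Sketch only. Prints the commands rather than running them; review the
# output, then pipe it to "sh" as root on a cgroup v1 node.
# Hypothetical names: cgroup "slurm_inet", handle 10:1, PID 1234.

NET_CLS_ROOT=${NET_CLS_ROOT:-/sys/fs/cgroup/net_cls}

# Convert a tc handle (hex major, hex minor) into the 0xAAAABBBB value
# that is written to net_cls.classid.
classid_hex() {
    printf '0x%04x%04x\n' "$((0x$1))" "$((0x$2))"
}

# Emit commands that create a tagged net_cls cgroup and move one PID into it.
# usage: tag_commands <cgroup-name> <major> <minor> <pid>
tag_commands() {
    dir="$NET_CLS_ROOT/$1"
    id=$(classid_hex "$2" "$3")
    printf 'mkdir -p %s\n' "$dir"
    printf 'echo %s > %s/net_cls.classid\n' "$id" "$dir"
    printf 'echo %s > %s/cgroup.procs\n' "$4" "$dir"
}

# Emit the basic rule from the kernel documentation: drop outgoing packets
# that do not carry the given classid.
# usage: filter_commands <major> <minor>
filter_commands() {
    printf 'iptables -A OUTPUT -m cgroup ! --cgroup %s -j DROP\n' \
        "$(classid_hex "$1" "$2")"
}

tag_commands slurm_inet 10 1 1234
filter_commands 10 1
```

As written, the rule would drop all untagged outgoing traffic, including traffic inside the cluster. A real deployment would have to scope it to Internet-bound destinations or to the external interface.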

However, this is not on our roadmap for the moment, and with cgroups v2 being introduced in the near future, it is probably not going to be added for v1. Nevertheless, I will take note to study this for v2.

Another idea I can offer:
Use a Prolog that adds a rule on the node for the user, and an Epilog that removes the rule after the job ends. For example:

iptables -A OUTPUT -p tcp -m tcp --dport 80 -m owner --uid-owner username -j ACCEPT

The problem is you are restricting by user, not by job, but I think it is the closest approach you have at the moment.
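
A minimal sketch of how the Prolog/Epilog pair could share one rule definition, so that the Epilog deletes exactly what the Prolog appended. It only prints the iptables commands. It assumes that slurmd exports SLURM_JOB_USER to both scripts, which should be checked against the Prolog and Epilog documentation for the installed version. The function names are hypothetical, and port 80 is taken from the rule above.

```shell
#!/bin/sh
# Sketch only: prints the iptables commands instead of running them.
# Assumption: slurmd exports SLURM_JOB_USER to both Prolog and Epilog.

job_user=${SLURM_JOB_USER:-username}

# Build the per-user rule.
# $1 is the iptables action: -A to append (Prolog), -D to delete (Epilog).
# $2 is the user name to match with the owner module.
owner_rule() {
    printf 'iptables %s OUTPUT -p tcp -m tcp --dport 80 -m owner --uid-owner %s -j ACCEPT\n' \
        "$1" "$2"
}

prolog_rule() { owner_rule -A "$1"; }   # run from the Prolog
epilog_rule() { owner_rule -D "$1"; }   # run from the Epilog

prolog_rule "$job_user"
epilog_rule "$job_user"
```

Because each Prolog appends one copy of the rule and each Epilog deletes one matching copy, two jobs of the same user on one node should stay balanced. The ACCEPT rule only has an effect if a later rule in the chain, or the chain policy, blocks Internet-bound traffic for everyone else.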

What do you think?
Comment 2 Sophie Créno 2020-12-17 13:04:47 MST
Hello,

  Thanks for your answer. 


> Nevertheless I take note for studying this for v2.

  Unfortunately, if I understood correctly, in cgroups v2 this kind of
thing is supposed to be done with eBPF instead.


> Another idea I can give is:
> Use a Prolog which adds a rule in the node for the user, and an epilog
> to remove the rule after job ends, for example:
> 
> iptables -A OUTPUT -p tcp -m tcp --dport 80 -m owner --uid-owner username -j ACCEPT
> The problem is you are restricting by user, not by job, but I think
> it is the closest approach you have at the moment.

  Indeed, it's not exactly what we need and it could be harmful if
the Epilog encounters a problem but thanks for trying to provide a
workaround.


> What do you think?

  I think you have answered my question the best you could so
we can close the ticket.

  Thanks again and have a nice day,
Comment 3 Felip Moll 2020-12-18 07:46:43 MST
(In reply to Sophie Créno from comment #2)
> Hello,
> 
>   Thanks for your answer. 
> 
> 
> > Nevertheless I take note for studying this for v2.
> 
>   Unfortunately, if I understood correctly, it's rather with eBPF that
> we are supposed to do this kind of things in cgroups v2.

From what I have read, you are right. There is no direct equivalent of net_cls and net_prio in cgroups v2.

In theory, support was added around kernel 4.5 for iptables rules that match on cgroup v2 pathnames, which allows control of network traffic. I don't know the details.
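
As a rough illustration of the pathname-based matching mentioned here: the iptables-extensions manual documents a --path option for the cgroup match, with the path given relative to the root of the cgroup2 hierarchy. The sketch below only prints such a rule. The path slurm/job_1234 and the function name are hypothetical; the real path would depend on how job cgroups are laid out on the node.

```shell
#!/bin/sh
# Sketch only: prints the rule instead of installing it.
# Hypothetical: the cgroup path "slurm/job_1234"; the real path depends on
# how the job's cgroup v2 hierarchy is laid out on the node.

# Drop outgoing packets from sockets outside the given cgroup v2 path.
# usage: path_rule <path relative to the cgroup2 root>
path_rule() {
    printf 'iptables -A OUTPUT -m cgroup ! --path %s -j DROP\n' "$1"
}

path_rule slurm/job_1234
```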

>   Indeed, it's not exactly what we need and it could be harmful if
> the Epilog encounters a problem but thanks for trying to provide a
> workaround.

Yep, that's a risk. Nevertheless, exposing your compute nodes to the Internet for users is also a risk.

> > What do you think?
> 
>   I think you have answered my question the best you could so
> we can close the ticket.
> 
>   Thanks again and have a nice day,

Thanks Sophie.