Enable eBPF on an existing cluster
Big picture
Enable the eBPF dataplane on an existing cluster.
Value
The eBPF dataplane mode has several advantages over the standard Linux networking pipeline mode:

- It scales to higher throughput.
- It uses less CPU per GBit.
- It has native support for Kubernetes services (without needing `kube-proxy`) that:
  - Reduces first packet latency for packets to services.
  - Preserves external client source IP addresses all the way to the pod.
  - Supports DSR (Direct Server Return) for more efficient service routing.
  - Uses less CPU than `kube-proxy` to keep the dataplane in sync.
To learn more and see performance metrics from our test environment, see the blog, Introducing the Calico eBPF dataplane.
Concepts
eBPF
eBPF (or "extended Berkeley Packet Filter") is a technology that allows safe mini programs to be attached to various low-level hooks in the Linux kernel. eBPF has a wide variety of uses, including networking, security, and tracing. You'll see a lot of non-networking projects leveraging eBPF, but for Calico Enterprise our focus is on networking, and in particular, pushing the networking capabilities of the latest Linux kernels to the limit.
Before you begin
Required
How to
- Verify that your cluster is ready for eBPF mode
- Configure Calico Enterprise to talk directly to the API server
- Configure kube-proxy
- Verify node interface naming pattern
- Enable eBPF mode
- Try out DSR mode
- Reversing the process
Verify that your cluster is ready for eBPF mode
This section explains how to make sure your cluster is suitable for eBPF mode.
To check that the kernel on a node is suitable, you can run:

```bash
uname -rv
```

The output should look like this:

```
5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
```

In this case, the kernel version is v5.4, which is suitable.

On Red Hat-derived distributions, you may see something like this:

```
4.18.0-193.el8.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com)
```

Since the Red Hat kernel is v4.18 with at least build number 193, this kernel is suitable.
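To check every node in one step, you can read the kernel version from each node's status via the Kubernetes API; this is a quick sketch using kubectl custom columns (the `nodeInfo.kernelVersion` field is part of the standard node status):

```bash
kubectl get nodes -o custom-columns=NODE:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion
```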
Configure Calico Enterprise to talk directly to the API server
In eBPF mode, Calico Enterprise implements Kubernetes service networking directly (rather than relying on `kube-proxy`). Of course, this makes it highly desirable to disable `kube-proxy` when running in eBPF mode to save resources and avoid confusion over which component is handling services.

To be able to disable `kube-proxy`, Calico Enterprise needs to communicate with the API server directly rather than going through `kube-proxy`. To make that possible, we need to find a persistent, static way to reach the API server.
The best way to do that varies by Kubernetes distribution:
- If you created a cluster manually (for example by using `kubeadm`) then the right address to use depends on whether you opted for a high-availability cluster with multiple API servers or a simple one-node API server.
  - If you opted to set up a high-availability cluster then you should use the address of the load balancer that you used in front of your API servers. As noted in the Kubernetes documentation, a load balancer is required for an HA set-up but the precise type of load balancer is not specified.
  - If you opted for a single control plane node then you can use the address of the control plane node itself. However, it's important that you use a stable address for that node, such as a dedicated DNS record or a static IP address. If you use a dynamic IP address (such as an EC2 private IP) then the address may change when the node is restarted, causing Calico Enterprise to lose connectivity to the API server.
- `kops` typically sets up a load balancer of some sort in front of the API server. You should use the FQDN and port of the API load balancer: for example, `api.internal.<clustername>` as the `KUBERNETES_SERVICE_HOST` below and 443 as the `KUBERNETES_SERVICE_PORT`.
- OpenShift requires various DNS records to be created for the cluster; one of these is exactly what we need: `api-int.<cluster_name>.<base_domain>` should point to the API server or to the load balancer in front of the API server. Use that (filling in the `<cluster_name>` and `<base_domain>` as appropriate for your cluster) for the `KUBERNETES_SERVICE_HOST` below. OpenShift uses 6443 for the `KUBERNETES_SERVICE_PORT`.
- MKE runs a reverse proxy on each node that can be used to reach the API server. You should use `proxy.local` as the `KUBERNETES_SERVICE_HOST` and `6444` as the `KUBERNETES_SERVICE_PORT`.
- For AKS and EKS clusters, you should use the FQDN of the API server's load balancer. This can be found with `kubectl cluster-info`, which gives output like the following:

  ```
  Kubernetes master is running at https://60F939227672BC3D5A1B3EC9744B2B21.gr7.us-west-2.eks.amazonaws.com
  ...
  ```

  In this example, you would use `60F939227672BC3D5A1B3EC9744B2B21.gr7.us-west-2.eks.amazonaws.com` for `KUBERNETES_SERVICE_HOST` and `443` for `KUBERNETES_SERVICE_PORT` when creating the config map.
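Before creating the config map, it can be worth sanity-checking the address from a cluster node. A minimal check with curl (assuming curl is available on the node; any HTTP response, even an authentication error, confirms that the endpoint is reachable over TLS):

```bash
curl -k https://<API server host>:<API server port>/version
```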
Once you've found the correct address for your API server, create the following config map in the `tigera-operator` namespace using the host and port that you found above:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: kubernetes-services-endpoint
  namespace: tigera-operator
data:
  KUBERNETES_SERVICE_HOST: '<API server host>'
  KUBERNETES_SERVICE_PORT: '<API server port>'
```
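For example, if you save the manifest above as `kubernetes-services-endpoint.yaml` (the file name here is just for illustration), you can create it with:

```bash
kubectl apply -f kubernetes-services-endpoint.yaml
```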
The operator will pick up the change to the config map automatically and do a rolling update of Calico Enterprise to pass on the change. Confirm that the pods restart and then reach the `Running` state with the following command:

```bash
watch kubectl get pods -n calico-system
```
If you do not see the pods restart then it's possible that the `ConfigMap` wasn't picked up; sometimes Kubernetes is slow to propagate `ConfigMap`s (see Kubernetes issue #30189). You can try restarting the operator, as shown below.
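One way to restart the operator is to trigger a rollout of its deployment (this assumes the default deployment name, `tigera-operator`, in the `tigera-operator` namespace):

```bash
kubectl rollout restart deployment tigera-operator -n tigera-operator
```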
Configure kube-proxy
In eBPF mode, Calico Enterprise replaces `kube-proxy`, so it wastes resources (and reduces performance) to run both. This section explains how to disable `kube-proxy` in some common environments.
Clusters that run kube-proxy with a DaemonSet (such as kubeadm)

For a cluster that runs `kube-proxy` in a `DaemonSet` (such as a `kubeadm`-created cluster), you can disable `kube-proxy` reversibly by adding a node selector to `kube-proxy`'s `DaemonSet` that matches no nodes, for example:

```bash
kubectl patch ds -n kube-system kube-proxy -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": "true"}}}}}'
```

Then, should you want to start `kube-proxy` again, you can simply remove the node selector.
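To confirm that the `kube-proxy` pods have been removed, you can list them by label (kubeadm-created clusters typically label them `k8s-app=kube-proxy`); the command should report no resources:

```bash
kubectl get pods -n kube-system -l k8s-app=kube-proxy
```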
This approach is not suitable for AKS with Azure CNI since that platform makes use of the Kubernetes add-on manager, which will revert the change. For AKS, you should follow Avoiding conflicts with kube-proxy below.
OpenShift
If you are running OpenShift, you can disable `kube-proxy` as follows:

```bash
kubectl patch networks.operator.openshift.io cluster --type merge -p '{"spec":{"deployKubeProxy": false}}'
```

To re-enable it:

```bash
kubectl patch networks.operator.openshift.io cluster --type merge -p '{"spec":{"deployKubeProxy": true}}'
```

If you are running kube-proxy in IPVS mode, switch to iptables mode before disabling it.
MKE
If you are running MKE, you can disable `kube-proxy` as follows: follow the steps in Modify an existing MKE configuration to download, edit, and upload your MKE configuration. During the editing step, add the settings `kube_proxy_mode=disabled` and `kube_default_drop_masq_bits=true`.
Avoiding conflicts with kube-proxy
If you cannot disable `kube-proxy` (for example, because it is managed by your Kubernetes distribution), then you must change the Felix configuration parameter `BPFKubeProxyIptablesCleanupEnabled` to `false`. This can be done with `kubectl` as follows:

```bash
kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyIptablesCleanupEnabled": false}}'
```

If both `kube-proxy` and `BPFKubeProxyIptablesCleanupEnabled` are enabled, then `kube-proxy` will write its iptables rules and Felix will try to clean them up, resulting in iptables flapping between the two.
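To confirm the setting took effect, you can read it back; the jsonpath query below is just one way to do that:

```bash
kubectl get felixconfiguration default -o jsonpath='{.spec.bpfKubeProxyIptablesCleanupEnabled}'
```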
Verify node interface naming pattern
When the Calico dataplane is configured in BPF mode, Calico attaches eBPF programs to the host interfaces that match the regex pattern defined by the `bpfDataIfacePattern` setting in FelixConfiguration. The default regex value tries to match commonly used interface names, but interface names can vary depending on the virtualization solution, the flavor of the operating system, company-specific configuration standards such as VLAN device naming patterns, and so on. The regex should at least match the interfaces that participate in intra-cluster and external (for example, NodePort) communications. In scenarios where a node has additional interfaces, you may want to leverage Calico policies to secure some or all of them, or to speed up forwarding to/from pods that use them. In such cases, the regex should match all interfaces that you want to be managed by Calico.

A common example is when a cluster is configured in an on-prem environment and the control plane nodes are virtualized with only one network interface, but the worker nodes are bare-metal nodes with additional interfaces that could be VLAN devices with sub-interfaces and specific naming patterns. In such cases, the `bpfDataIfacePattern` setting may need to be adjusted to include the interface from the control plane nodes as well as the necessary interfaces from the worker nodes.
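For example, you can adjust the pattern by patching the FelixConfiguration; the regex below is purely illustrative and should be adapted to your environment's interface names:

```bash
kubectl patch felixconfiguration default --type merge --patch '{"spec":{"bpfDataIfacePattern":"^(en.*|eth.*|bond.*|vlan.*)"}}'
```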
Enable eBPF mode
To enable eBPF mode, change the `spec.calicoNetwork.linuxDataplane` parameter in the operator's `Installation` resource to `"BPF"`:

```bash
kubectl patch installation.operator.tigera.io default --type merge -p '{"spec":{"calicoNetwork":{"linuxDataplane":"BPF"}}}'
```

When enabling eBPF mode, preexisting connections continue to use the non-BPF datapath; such connections should not be disrupted, but they do not benefit from eBPF mode's advantages.

The operator rolls out the change with a rolling update (non-disruptive) and then swiftly transitions all nodes to eBPF mode. However, it's inevitable that some nodes will enter eBPF mode before others. This can disrupt the flow of traffic through node ports.
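To watch the rollout complete, you can monitor the operator's status resources and confirm the dataplane setting; this sketch assumes the `tigerastatus` resource that ships with the operator:

```bash
kubectl get tigerastatus
kubectl get installation default -o jsonpath='{.spec.calicoNetwork.linuxDataplane}'
```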
Try out DSR mode
Direct return mode skips a hop through the network for traffic to services (such as node ports) from outside the cluster. This reduces latency and CPU overhead, but it requires the underlying network to allow nodes to send traffic with each other's IPs. In AWS, this requires all your nodes to be in the same subnet and the source/dest check to be disabled, as sketched below.
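As one way to disable the source/dest check on AWS with the AWS CLI (the instance ID is a placeholder; you would run this for each node, or set the equivalent attribute through your infrastructure tooling):

```bash
aws ec2 modify-instance-attribute --instance-id <instance-id> --no-source-dest-check
```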
DSR mode is disabled by default; to enable it, set the `BPFExternalServiceMode` Felix configuration parameter to `"DSR"`. This can be done with `kubectl`:

```bash
kubectl patch felixconfiguration default --patch='{"spec": {"bpfExternalServiceMode": "DSR"}}'
```

To switch back to tunneled mode, set the configuration parameter to `"Tunnel"`:

```bash
kubectl patch felixconfiguration default --patch='{"spec": {"bpfExternalServiceMode": "Tunnel"}}'
```

Switching the external traffic mode can disrupt in-progress connections.
Reversing the process
To revert to standard Linux networking:

- Reverse the changes to the operator's `Installation` resource:

  ```bash
  kubectl patch installation.operator.tigera.io default --type merge -p '{"spec":{"calicoNetwork":{"linuxDataplane":"Iptables"}}}'
  ```

- If you disabled `kube-proxy`, re-enable it (for example, by removing the node selector added above):

  ```bash
  kubectl patch ds -n kube-system kube-proxy --type merge -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": null}}}}}'
  ```

- If you are running MKE, follow the steps in Modify an existing MKE configuration to download, edit, and upload your MKE configuration. During the editing step, set `kube_proxy_mode` to `iptables`.

- Since disabling eBPF mode is disruptive to existing connections, monitor existing workloads to make sure they re-establish any connections that were disrupted by the switch.