Enabling the eBPF data plane
This page shows you how to enable the eBPF data plane on an existing cluster.
For self-managed clusters that use kubeadm or similar tools to create the cluster, you can configure the Tigera Operator to enable eBPF automatically.
For all other clusters, there is a manual configuration process to enable eBPF.
Enable the eBPF data plane automatically for self-managed clusters
You can quickly enable the eBPF data plane by configuring Tigera Operator.
This is the recommended approach for most self-managed clusters that use kubeadm or similar tools to create the cluster.
Prerequisites
- Your cluster was created using kubeadm or kubeadm-based tools.
- Calico Open Source was installed on your cluster using the Tigera Operator.
- Your cluster has kube-proxy running in the kube-system namespace.
- kube-proxy is not managed by an automated tool, such as Helm or ArgoCD.
- The Tigera Operator can access the kubernetes service and endpoints.
Procedure
- To enable eBPF mode with the automatic configuration, set the spec.calicoNetwork.bpfNetworkBootstrap and spec.calicoNetwork.kubeProxyManagement parameters in the operator's Installation resource to "Enabled", and the spec.calicoNetwork.linuxDataplane parameter to "BPF".

  kubectl patch installation.operator.tigera.io default --type merge -p '{"spec":{"calicoNetwork":{"linuxDataplane":"BPF", "bpfNetworkBootstrap":"Enabled", "kubeProxyManagement":"Enabled"}}}'

  note
  The operator rolls out the change with a rolling update (non-disruptive) and then swiftly transitions all nodes to eBPF mode. However, it's inevitable that some nodes will enter eBPF mode before others. This can disrupt the flow of traffic through node ports.
After this change, the operator will configure the API server addresses and disable kube-proxy.
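As a quick sanity check (a sketch that assumes the default calico-system namespace and a kubeadm-style kube-proxy DaemonSet), you can watch the Calico pods roll and confirm that kube-proxy is no longer scheduled:

watch kubectl get pods -n calico-system
kubectl get ds kube-proxy -n kube-system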
Next steps
- For best performance, configure your cluster to use direct server return mode.
Enable the eBPF data plane manually on any cluster
This section explains how to enable the eBPF data plane on all compatible clusters.
Before you begin
Supported
- x86-64
- arm64 (little endian)
- Kubernetes datastore driver.
- Distributions:
- Generic or kubeadm
- kOps
- OpenShift
- EKS
- AKS with limitations:
  - AKS with Azure CNI and Calico network policy works, but it is not possible to disable kube-proxy, resulting in wasted resources and suboptimal performance.
  - AKS with Calico networking is being tested with the eBPF data plane. This should be a better solution overall but, at the time of writing, that testing was not complete.
- RKE (RKE2 recommended because it supports disabling kube-proxy)
- MKE
- Linux distribution/kernel:
- Ubuntu 20.04.
- Red Hat v8.2 with Linux kernel v4.18.0-193 or above (Red Hat have backported the required features to that build).
- Another supported distribution with Linux kernel v5.3 or above. Kernel v5.8 or above with CO-RE enabled is recommended for better performance.
- An underlying network fabric that allows VXLAN traffic between hosts. In eBPF mode, VXLAN is used to forward Kubernetes NodePort traffic.
- IPv6
  Limitations:
  - IPIP is not supported (Calico iptables does not support it either). VXLAN is the recommended overlay for eBPF mode.
  To enable IPv6 in eBPF mode, see Configure dual stack or IPv6 only. You may be able to run with non-Calico IPAM; eks-cni is known to work.
Not supported
- Other processor architectures.
- etcd datastore driver. The etcd datastore driver doesn't support watching Kubernetes services, which is required for some features in eBPF mode.
- Distributions:
- GKE. This is because of an incompatibility with the GKE CNI plugin.
- Clusters with some eBPF nodes and some standard data plane and/or Windows nodes.
- Floating IPs.
- SCTP (either for policy or services). This is due to lack of kernel support for the SCTP checksum in BPF.
- VLAN-based traffic without hardware offloading.
tip
You can use a VLAN device to connect a node to the cluster by using the bpfDataIfacePattern Felix configuration variable if the underlying/physical device supports VLAN offloading. For more information, see Debugging Connectivity in Calico eBPF.
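For example, a minimal sketch of setting that variable with calicoctl (the vlan100 device name is an assumption; adjust the regular expression so that it still matches your other data interfaces):

calicoctl patch felixconfiguration default --patch='{"spec": {"bpfDataIfacePattern": "^(ens.*|eth.*|vlan100)"}}'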
Verify that your cluster is ready for eBPF mode
This section explains how to make sure your cluster is suitable for eBPF mode.
To check that the kernel on a node is suitable, you can run
uname -rv
The output should look like this:
5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
In this case the kernel version is v5.4, which is suitable.
On Red Hat-derived distributions, you may see something like this:
4.18.0-193.el8.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com)
Since the Red Hat kernel is v4.18 with at least build number 193, this kernel is suitable.
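The CO-RE optimisations mentioned above rely on the kernel exposing BTF type information; one common way to check for it is to see whether this file exists on the node:

ls -l /sys/kernel/btf/vmlinux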
Performance
For best pod-to-pod performance, we recommend using an underlying network that doesn't require Calico to use an overlay. For example:
- A cluster within a single AWS subnet.
- A cluster using a compatible cloud provider's CNI (such as the AWS VPC CNI plugin).
- An on-prem cluster with BGP peering configured.
If you must use an overlay, we recommend that you use VXLAN, not IPIP. VXLAN has much better performance than IPIP in eBPF mode due to various kernel optimisations.
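If your cluster currently uses IPIP, a minimal sketch of switching the default IP pool to VXLAN with calicoctl (the pool name default-ipv4-ippool is the usual default but is an assumption for your cluster; CrossSubnet mode is often preferable if you only need an overlay between subnets):

calicoctl patch ippool default-ipv4-ippool --patch='{"spec": {"ipipMode": "Never", "vxlanMode": "Always"}}'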
Configure Calico to talk directly to the API server
In eBPF mode, Calico implements Kubernetes service networking directly (rather than relying on kube-proxy).
Of course, this makes it highly desirable to disable kube-proxy when running in eBPF mode to save resources
and avoid confusion over which component is handling services.
To be able to disable kube-proxy, Calico needs to communicate to the API server directly rather than
going through kube-proxy. To make that possible, we need to find a persistent, static way to reach the API server.
The best way to do that varies by Kubernetes distribution:
- If you created a cluster manually (for example by using kubeadm) then the right address to use depends on whether you opted for a high-availability cluster with multiple API servers or a simple one-node API server. (One way to check the endpoint your kubeconfig currently uses is shown after this list.)
  - If you opted to set up a high availability cluster then you should use the address of the load balancer that you used in front of your API servers. As noted in the Kubernetes documentation, a load balancer is required for a HA set-up but the precise type of load balancer is not specified.
  - If you opted for a single control plane node then you can use the address of the control plane node itself. However, it's important that you use a stable address for that node, such as a dedicated DNS record or a static IP address. If you use a dynamic IP address (such as an EC2 private IP) then the address may change when the node is restarted, causing Calico to lose connectivity to the API server.
- kops typically sets up a load balancer of some sort in front of the API server. You should use the FQDN and port of the API load balancer, for example api.internal.<clustername> as the KUBERNETES_SERVICE_HOST below and 443 as the KUBERNETES_SERVICE_PORT.
- OpenShift requires various DNS records to be created for the cluster; one of these is exactly what we need: api-int.<cluster_name>.<base_domain> should point to the API server or to the load balancer in front of the API server. Use that (filling in the <cluster_name> and <base_domain> as appropriate for your cluster) for the KUBERNETES_SERVICE_HOST below. OpenShift uses 6443 for the KUBERNETES_SERVICE_PORT.
- MKE runs a reverse proxy on each node that can be used to reach the API server. You should use proxy.local as the KUBERNETES_SERVICE_HOST and 6444 as the KUBERNETES_SERVICE_PORT.
- For AKS and EKS clusters you should use the FQDN of the API server's load balancer. This can be found with kubectl cluster-info, which gives output like the following:

  Kubernetes master is running at https://60F939227672BC3D5A1B3EC9744B2B21.gr7.us-west-2.eks.amazonaws.com
  ...

  In this example, you would use 60F939227672BC3D5A1B3EC9744B2B21.gr7.us-west-2.eks.amazonaws.com for KUBERNETES_SERVICE_HOST and 443 for KUBERNETES_SERVICE_PORT when creating the config map.
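For the kubeadm case above, one way to see which API server endpoint your kubeconfig currently points at (a starting point only; make sure the address you choose is stable, as discussed above) is:

kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'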
The next step depends on whether you installed Calico using the operator, or a manifest:
- Operator
- Manifest
If you installed Calico using the operator, create the following config map in the tigera-operator namespace using the host and port determined above:
kind: ConfigMap
apiVersion: v1
metadata:
  name: kubernetes-services-endpoint
  namespace: tigera-operator
data:
  KUBERNETES_SERVICE_HOST: '<API server host>'
  KUBERNETES_SERVICE_PORT: '<API server port>'
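For example, if you save the manifest above to a file (the file name here is arbitrary), you can create it with:

kubectl apply -f kubernetes-services-endpoint.yaml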
The operator will pick up the change to the config map automatically and do a rolling update of Calico to pass on the change. Confirm that pods restart and then reach the Running state with the following command:
watch kubectl get pods -n calico-system
If you do not see the pods restart, it's possible that the ConfigMap wasn't picked up; sometimes Kubernetes is slow to propagate ConfigMaps (see Kubernetes issue #30189). You can try restarting the operator.
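A sketch of restarting the operator, assuming the default tigera-operator deployment name and namespace:

kubectl rollout restart deployment tigera-operator -n tigera-operator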
If you installed Calico using a manifest, create the following config map in the kube-system namespace using the host and port determined above:
kind: ConfigMap
apiVersion: v1
metadata:
  name: kubernetes-services-endpoint
  namespace: kube-system
data:
  KUBERNETES_SERVICE_HOST: '<API server host>'
  KUBERNETES_SERVICE_PORT: '<API server port>'
Wait 60s for kubelet to pick up the ConfigMap (see Kubernetes issue #30189); then, restart the Calico pods to pick up the change:
kubectl delete pod -n kube-system -l k8s-app=calico-node
kubectl delete pod -n kube-system -l k8s-app=calico-kube-controllers
And, if using Typha:
kubectl delete pod -n kube-system -l k8s-app=calico-typha
Confirm that pods restart and then reach the Running state with the following command:
watch "kubectl get pods -n kube-system | grep calico"
You can verify that the change was picked up by checking the logs of one of the calico/node pods.
kubectl get po -n kube-system -l k8s-app=calico-node
This should show one or more pods:
NAME READY STATUS RESTARTS AGE
calico-node-d6znw 1/1 Running 0 48m
...
Then, to search the logs, choose a pod and run:
kubectl logs -n kube-system <pod name> | grep KUBERNETES_SERVICE_HOST
You should see the following log, with the correct KUBERNETES_SERVICE_... values.
2020-08-26 12:26:29.025 [INFO][7] daemon.go 182: Kubernetes server override env vars. KUBERNETES_SERVICE_HOST="172.16.101.157" KUBERNETES_SERVICE_PORT="6443"
Configure kube-proxy
In eBPF mode Calico replaces kube-proxy so it wastes resources (and reduces performance) to run both.
This section explains how to disable kube-proxy in some common environments.
Clusters that run kube-proxy with a DaemonSet (such as kubeadm)
For a cluster that runs kube-proxy in a DaemonSet (such as a kubeadm-created cluster), you can disable kube-proxy reversibly by adding a node selector to kube-proxy's DaemonSet that matches no nodes, for example:
kubectl patch ds -n kube-system kube-proxy -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": "true"}}}}}'
Then, should you want to start kube-proxy again, you can simply remove the node selector.
This approach is not suitable for AKS with Azure CNI since that platform makes use of the Kubernetes add-on manager; the change will be reverted by the system. For AKS, you should follow Avoiding conflicts with kube-proxy below.
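If you used the node selector approach above, you can confirm that kube-proxy pods have been removed (the k8s-app=kube-proxy label is what kubeadm uses; it may differ on other distributions):

kubectl get ds kube-proxy -n kube-system
kubectl get pods -n kube-system -l k8s-app=kube-proxy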
OpenShift
If you are running OpenShift, you can disable kube-proxy as follows:
kubectl patch networks.operator.openshift.io cluster --type merge -p '{"spec":{"deployKubeProxy": false}}'
To re-enable it:
kubectl patch networks.operator.openshift.io cluster --type merge -p '{"spec":{"deployKubeProxy": true}}'
MKE
If you are running MKE, you can disable kube-proxy as follows:
Follow the steps in Modify an existing MKE configuration to download, edit, and upload your MKE configuration. During the editing step, add the following configuration: set kube_proxy_mode to disabled and kube_default_drop_masq_bits to true.
If you are running kube-proxy in IPVS mode, switch to iptables mode before disabling.
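As a rough sketch only (the exact section name and quoting depend on your MKE version's TOML configuration format), the added settings might look like:

[cluster_config]
  kube_proxy_mode = "disabled"
  kube_default_drop_masq_bits = true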
Avoiding conflicts with kube-proxy
If you cannot disable kube-proxy (for example, because it is managed by your Kubernetes distribution), then you must change Felix configuration parameter BPFKubeProxyIptablesCleanupEnabled to false. This can be done with kubectl as follows:
kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyIptablesCleanupEnabled": false}}'
If both kube-proxy and BPFKubeProxyIptablesCleanupEnabled are enabled, then kube-proxy will write its iptables rules and Felix will try to clean them up, resulting in iptables flapping between the two.
Enable eBPF mode
The next step depends on whether you installed Calico using the operator, or a manifest:
- Operator
- Manifest
To enable eBPF mode, change the spec.calicoNetwork.linuxDataplane parameter in the operator's Installation
resource to "BPF".
kubectl patch installation.operator.tigera.io default --type merge -p '{"spec":{"calicoNetwork":{"linuxDataplane":"BPF"}}}'
The operator rolls out the change with a rolling update (non-disruptive) and then swiftly transitions all nodes to eBPF mode. However, it's inevitable that some nodes will enter eBPF mode before others. This can disrupt the flow of traffic through node ports.
If you installed Calico using a manifest, change Felix configuration parameter BPFEnabled to true. This can be done with calicoctl, as follows:
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfEnabled": true}}'
When enabling eBPF mode, preexisting connections continue to use the non-BPF datapath; such connections should not be disrupted, but they do not benefit from eBPF modeβs advantages.
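If the Calico API server is installed so that Calico resources are available through kubectl (an assumption; otherwise use calicoctl as shown above), an equivalent patch is:

kubectl patch felixconfiguration default --type merge --patch='{"spec": {"bpfEnabled": true}}'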
Next steps
- For best performance, configure your cluster to use direct server return mode.
Enable direct server return mode
Direct server return (DSR) mode skips a hop through the network for traffic to services (such as node ports) from outside the cluster. This reduces latency and CPU overhead but it requires the underlying network to allow nodes to send traffic with each other's IPs. In AWS, this requires all your nodes to be in the same subnet and for the source/dest check to be disabled.
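For the AWS case, one way to disable the source/dest check on a single instance with the AWS CLI (a per-instance sketch; <instance-id> is a placeholder, and many provisioning tools can automate this across the cluster):

aws ec2 modify-instance-attribute --instance-id <instance-id> --no-source-dest-check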
DSR mode is disabled by default; to enable it, set the BPFExternalServiceMode Felix configuration parameter to "DSR". This can be done with calicoctl:
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfExternalServiceMode": "DSR"}}'
To switch back to tunneled mode, set the configuration parameter to "Tunnel":
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfExternalServiceMode": "Tunnel"}}'
Switching external traffic mode can disrupt in-progress connections.
Reversing the process
To revert to standard Linux networking:
- (Depending on whether you installed Calico with the operator or with a manifest) reverse the changes to the operator's Installation or the FelixConfiguration resource:
  - Operator

    kubectl patch installation.operator.tigera.io default --type merge -p '{"spec":{"calicoNetwork":{"linuxDataplane":"Iptables"}}}'

  - Manifest

    calicoctl patch felixconfiguration default --patch='{"spec": {"bpfEnabled": false}}'
- If you disabled kube-proxy manually, re-enable it (for example, by removing the node selector added above). If you chose the automatic configuration, the operator will revert the changes and re-deploy kube-proxy.

  kubectl patch ds -n kube-system kube-proxy --type merge -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": null}}}}}'
- If you are running MKE, follow the steps in Modify an existing MKE configuration to download, edit, and upload your MKE configuration. During the editing step, set kube_proxy_mode to iptables.
- Since disabling eBPF mode is disruptive to existing connections, monitor existing workloads to make sure they re-establish any connections that were disrupted by the switch.
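While reverting, one way to watch the rollout (assuming an operator-managed install in the calico-system namespace) is:

watch kubectl get pods -n calico-system
kubectl get tigerastatus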