Calico Cloud documentation

Troubleshoot eBPF mode

This document gives general troubleshooting guidance for the eBPF data plane.

To understand the basic concepts, we recommend the following video by Tigera engineers: Opening the Black Box: Understanding and troubleshooting Calico's eBPF Data Plane.

Troubleshoot access to services

Verify that eBPF mode is correctly enabled

Examine the log of a calico-node container. In the extremely rare case that eBPF mode is not supported, it logs an ERROR message that says:
```
BPF data plane mode enabled but not supported by the kernel. Disabling BPF mode.
```
If BPF mode is correctly enabled, you should see an INFO message that says:
```
BPF enabled, starting BPF endpoint manager and map manager.
```
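You can script the check for these two messages. The following bash sketch shows one way to do it. The function name `bpf_mode_status` is hypothetical. The sketch assumes the relevant message is still present in the log you pipe in. After a container restart or log rotation it may not be, and the function then reports `unknown`.

```bash
# Hypothetical helper: classify eBPF mode from calico-node log text on stdin.
# Prints "unsupported", "enabled" or "unknown".
bpf_mode_status() {
  local log
  log=$(cat)
  if grep -qF 'BPF data plane mode enabled but not supported by the kernel' <<<"$log"; then
    echo unsupported
  elif grep -qF 'BPF enabled, starting BPF endpoint manager and map manager' <<<"$log"; then
    echo enabled
  else
    echo unknown
  fi
}

# Example (pod name is a placeholder; the container is assumed to be named calico-node):
#   kubectl logs -n calico-system <calico-node-name> -c calico-node | bpf_mode_status
```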
In eBPF mode, forwarding external client access to services (typically NodePorts) from node to node is implemented using VXLAN encapsulation. If NodePorts time out when the backing pod is on another node, check that your underlying network fabric allows VXLAN traffic between the nodes. VXLAN uses UDP, by default on port 4789.

Note that this VXLAN traffic is separate from any overlay network that you may be using for pod-to-pod traffic.

In DSR (direct server return) mode, Calico Cloud requires that the underlying network fabric allows one node to respond on behalf of another:
- In AWS, the Source/Dest check must be disabled on the node's NIC to allow this. However, DSR works only for traffic within AWS. It is not compatible with external traffic arriving through a load balancer, because the load balancer expects the return traffic to come from the same host.
- In GCP, the "Allow forwarding" option must be enabled. As with AWS, traffic through a load balancer does not work correctly with DSR because the load balancer is not consulted on the return path from the backing node.

The `calico-node -bpf` tool

To inspect Calico Cloud's internal data structures, you can use the `calico-node -bpf` tool. The tool is embedded in the cnx-node container image and works only from within a calico-node pod, where it displays information about the eBPF data plane. Use `kubectl get pod -o wide -n calico-system` to find the name of a calico-node pod, and substitute that name for `<calico-node-name>` in the following commands.

To run the tool, use:

```
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf <args>
```
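If you repeatedly debug the same node, a small wrapper can look up the calico-node pod for you. This is a minimal bash sketch and is not part of Calico Cloud. The function name `calico_bpf_on_node` is hypothetical. It assumes the calico-node pods carry the `k8s-app=calico-node` label, and it lets you override the `kubectl` binary through a `KUBECTL` variable.

```bash
# Hypothetical wrapper: run `calico-node -bpf <args>` in the calico-node pod on a node.
# Assumes calico-node pods are labelled k8s-app=calico-node.
calico_bpf_on_node() {
  local node=$1
  shift
  local kubectl=${KUBECTL:-kubectl}
  local pod
  pod=$("$kubectl" get pod -n calico-system -l k8s-app=calico-node \
    --field-selector "spec.nodeName=${node}" -o name) || return 1
  pod=${pod%%$'\n'*}   # keep the first result, e.g. "pod/calico-node-abcde"
  if [ -z "$pod" ]; then
    echo "no calico-node pod found on node ${node}" >&2
    return 1
  fi
  "$kubectl" exec -n calico-system "${pod#pod/}" -- calico-node -bpf "$@"
}

# Example:
#   calico_bpf_on_node worker-1 conntrack dump
```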

For example, to show the tool's help:

```
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf help

  Available Commands:
    arp          Manipulates arp
    cleanup      Removes all calico-bpf programs and maps
    completion   Generate the autocompletion script for the specified shell
    connect-time Manipulates connect-time load balancing programs
    conntrack    Manipulates connection tracking
    counters     Show and reset counters
    help         Help about any command
    ifstate      Manipulates ifstate
    ipsets       Manipulates ipsets
    nat          Manipulates network address translation (nat)
    policy       Dump policy attached to interface
    profiling    Show and reset profiling data
    routes       Manipulates routes
    version      Prints the version and exits
```

(Because the tool is embedded in the main `calico-node` binary, the `--help` option is not available, but `calico-node -bpf help` works.)

For example, to dump the BPF conntrack table, use:

```
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf conntrack dump
...
```

Debug access to services

Inspect the BPF NAT table to verify that the service is correctly programmed. To dump the BPF NAT table:

```
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf nat dump

10.96.0.10 port 53 proto 6 id 4 count 2 local 0
0	 192.168.129.66:53
1	 192.168.129.68:53
10.96.0.10 port 53 proto 17 id 6 count 2 local 0
0	 192.168.129.66:53
1	 192.168.129.68:53
10.96.0.10 port 9153 proto 6 id 5 count 2 local 0
0	 192.168.129.66:9153
1	 192.168.129.68:9153
10.105.77.92 port 5473 proto 6 id 0 count 2 local 0
0	 10.128.1.192:5473
1	 10.128.1.195:5473
10.105.187.231 port 8081 proto 6 id 2 count 1 local 0
0	 192.168.105.131:8081
10.109.136.88 port 7443 proto 6 id 1 count 1 local 0
0	 192.168.129.72:7443
10.109.139.39 port 443 proto 6 id 7 count 2 local 0
0	 192.168.129.67:5443
1	 192.168.129.69:5443
10.96.0.1 port 443 proto 6 id 3 count 1 local 0
0	 10.128.0.255:6443
```
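When there are many services, it can help to filter the dump down to one frontend. The following bash sketch does this. The function name `nat_backends` is hypothetical. The sketch assumes the output layout shown above, where each frontend line is followed by one indexed line per backend. It takes the IP protocol number as printed in the dump (6 for TCP, 17 for UDP).

```bash
# Hypothetical helper: print the backends programmed for one service frontend.
# Reads `calico-node -bpf nat dump` output on stdin.
# Arguments: service IP, service port, IP protocol number (6 = TCP, 17 = UDP).
# Assumed layout:
#   <service-ip> port <port> proto <proto> id <id> count <n> local <n>
#   <index> <backend-ip>:<backend-port>
nat_backends() {
  awk -v vip="$1" -v port="$2" -v proto="$3" '
    $2 == "port" { inside = ($1 == vip && $3 == port && $5 == proto); next }
    inside && NF == 2 { print $2 }
  '
}

# Example:
#   kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf nat dump \
#     | nat_backends 10.96.0.10 53 17
```

An empty result for a service that should have endpoints suggests that the service is not programmed correctly on that node.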

Inspect the BPF conntrack table to verify that connections are being tracked. To dump the BPF conntrack table:

```
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf conntrack dump

TCP 192.168.58.7:49178 -> 10.111.57.87:80 -> 192.168.105.136:80  Active ago 4.486606371s CLOSED              <--- example of connection to service with per-packet NAT
TCP 10.128.1.194:41513 -> 10.128.1.192:179  Active ago 26.442759238s ESTABLISHED                             <--- example of connection without NAT or with connect-time NAT
TCP 192.168.58.7:42818 -> 10.111.57.87:80 -> 192.168.105.136:80  Active ago 1m15.987585857s CLOSED
TCP 10.128.1.192:58208 -> 10.109.136.88:7443 -> 192.168.58.5:7443  Active ago 4.935017508s ESTABLISHED
UDP 162.142.125.240:30603 -> 10.128.1.192:18989  Active ago 1m0.816678617s
UDP 127.0.0.1:48611 -> 127.0.0.53:53  Active ago 17.789851961s
```

Note that traffic originating within the cluster uses connect-time load balancing, which by default is enabled only for TCP. For such connections, the conntrack table does not show the NAT resolution, because the service address is resolved when the application calls `connect()`.
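Per-packet NAT entries show three addresses (client, service, backend). Connections without NAT or with connect-time NAT show two. You can therefore separate them by counting the `->` arrows. The following is a minimal bash sketch that assumes the line format in the sample above. `ct_nat_entries` is a hypothetical name.

```bash
# Hypothetical filter: keep only conntrack entries that record per-packet NAT,
# i.e. lines with two "->" arrows (client -> service -> backend).
# Entries without NAT, or handled by connect-time load balancing, have one arrow.
ct_nat_entries() {
  awk '{ n = 0; for (i = 1; i <= NF; i++) if ($i == "->") n++ } n == 2'
}

# Example:
#   kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf conntrack dump \
#     | ct_nat_entries
```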

Check if Calico Cloud is dropping packets

If you suspect that Calico Cloud is dropping packets, you can use the `calico-node -bpf` tool to check the BPF counters. Because the eBPF data plane is split into programs that are attached to interfaces, you must check the counters on the relevant interface. You can either dump counters for all interfaces or use `--iface=<interface name>` to dump counters for a specific interface.

An increasing "Dropped by policy" counter indicates that Calico Cloud is dropping packets because of policy. In that case, check your policy configuration. For example:

```
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf counters dump --iface=eth0
+----------+--------------------------------+---------+--------+-----+
| CATEGORY |              TYPE              | INGRESS | EGRESS | XDP |
+----------+--------------------------------+---------+--------+-----+
| Accepted | by another program             |       0 |      0 | N/A |
|          | by failsafe                    |       0 |      0 | N/A |
|          | by policy                      |       0 |      4 | N/A |
| Dropped  | NAT source collision           |       0 |      0 | N/A |
|          | resolution failed              |         |        |     |
|          | QoS control limit              |       0 |      0 | N/A |
|          | by policy                      |       0 |     11 | N/A |
|          | failed decapsulation           |       0 |      0 | N/A |
|          | failed encapsulation           |       0 |      0 | N/A |
|          | failed to create conntrack     |       0 |      0 | N/A |
|          | fragment of yet incomplete     |       0 |      0 | N/A |
|          | packet                         |         |        |     |
|          | fragment out of order within   |       0 |      0 | N/A |
|          | host                           |         |        |     |
|          | fragments not supported        |       0 |      0 | N/A |
|          | incorrect checksum             |       0 |      0 | N/A |
|          | malformed IP packets           |       0 |      0 | N/A |
|          | packets hitting blackhole      |       0 |      0 | N/A |
|          | route                          |         |        |     |
|          | packets with unknown route     |       0 |      0 | N/A |
|          | packets with unknown source    |       0 |      0 | N/A |
|          | packets with unsupported IP    |       0 |      0 | N/A |
|          | options                        |         |        |     |
|          | too short packets              |       0 |      0 | N/A |
| Other    | packets hitting NAT source     |       0 |      0 | N/A |
|          | collision                      |         |        |     |
| Redirect | neigh                          |       0 |      0 | N/A |
|          | peer                           |       0 |      0 | N/A |
|          | plain                          |      20 |      0 | N/A |
| Total    | packets                        |      34 |     22 | N/A |
+----------+--------------------------------+---------+--------+-----+
```
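To watch only the policy drops, you can extract that row from the table. The following bash sketch prints the ingress and egress counts for that row. The function name `dropped_by_policy` is hypothetical. The sketch assumes the table layout shown above, in which the CATEGORY column is filled in only on the first row of each category.

```bash
# Hypothetical helper: print "<ingress> <egress>" for the "Dropped / by policy"
# row of a `calico-node -bpf counters dump` table read on stdin.
dropped_by_policy() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NF < 6 { next }                          # skip +----+ border lines
    { c = trim($2); if (c != "") cat = c }   # category appears only on its first row
    cat == "Dropped" && trim($3) == "by policy" { print trim($4), trim($5) }
  '
}

# Example: run twice, some seconds apart, to see whether drops are increasing.
#   kubectl exec -n calico-system <calico-node-name> -- \
#     calico-node -bpf counters dump --iface=eth0 | dropped_by_policy
```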

eBPF program debug logs

Sometimes it is necessary to examine the logs that are emitted by the eBPF programs themselves. The logs can be very verbose, because the programs log every packet, but they can be invaluable for diagnosing eBPF program issues. To enable the logs, set the `bpfLogLevel` Felix configuration setting to `Debug`.

Caution: Enabling logs in this way has a significant impact on eBPF program performance.

To reduce the performance impact in production clusters, you can limit logging to specific traffic, specific interfaces, or both, using the `bpfLogFilters` Felix configuration setting. Filters are pcap expressions.

Note that the filters are applied to the original packet, before any NAT or encapsulation. Therefore, to log a packet that is sent to a service as it passes through several devices, your filter must match both the service IP and port and the backend pod IP and port.
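As a sketch of how the two settings fit together, the following bash helpers build a pcap filter for a service and one of its backends, then wrap it in a merge patch for the `default` FelixConfiguration. Both helper names are hypothetical. The patch makes three assumptions:

- `bpfLogFilters` maps an interface name to a filter.
- The key `all` selects every interface.
- `Off` turns the logs back off.

Check the Felix configuration reference for your version before relying on these. The `kubectl patch` lines also assume the Calico API server is available, as it is for the `kubectl get felixconfiguration` command used later in this document.

```bash
# Hypothetical helper: pcap filter matching a service (VIP and port) and one
# backend pod (IP and port). Filters see the original packet, so both halves
# are needed to follow the packet before and after NAT.
svc_log_filter() {
  local svc_ip=$1 svc_port=$2 backend_ip=$3 backend_port=$4
  printf '(host %s and port %s) or (host %s and port %s)\n' \
    "$svc_ip" "$svc_port" "$backend_ip" "$backend_port"
}

# Hypothetical helper: merge patch that sets bpfLogLevel to Debug and applies
# the filter to all interfaces (the "all" key is an assumption, see above).
# The filter must not contain double quotes, or the JSON breaks.
bpf_debug_patch() {
  printf '{"spec":{"bpfLogLevel":"Debug","bpfLogFilters":{"all":"%s"}}}\n' "$1"
}

# Example (requires cluster access):
#   f=$(svc_log_filter 10.111.57.87 80 192.168.105.136 80)
#   kubectl patch felixconfiguration default --type merge -p "$(bpf_debug_patch "$f")"
# When finished, turn logging off and remove the filters:
#   kubectl patch felixconfiguration default --type merge \
#     -p '{"spec":{"bpfLogLevel":"Off","bpfLogFilters":null}}'
```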

The logs are emitted to the kernel trace buffer, and they can be examined using the following command:

```
kubectl exec -n calico-system <calico-node-name> -- bpftool prog tracelog
```

Logs have the following format:

```
     <...>-84582 [000] .Ns1  6851.690474: 0: ens192---E: Final result=ALLOW (-1). Program execution time: 7366ns
```

The parts of the log are explained below:

- `<...>-84582` indicates which program (or kernel process) was handling the packet. For packets that are being sent, this is usually the name and PID of the program that is actually sending the packet. For packets that are received, it is typically a kernel process, or an unrelated program that happens to trigger the processing.
- `6851.690474` is the log timestamp.
- `ens192---E` is the Calico Cloud log tag. For programs attached to interfaces, the first part contains the first few characters of the interface name. The suffix is either `-I` or `-E`, indicating "Ingress" or "Egress". "Ingress" and "Egress" have the same meaning as for policy:
  - A workload ingress program is executed on the path from the host network namespace to the workload.
  - A workload egress program is executed on the path from the workload to the host.
  - A host endpoint ingress program is executed on the path from an external host to the host.
  - A host endpoint egress program is executed on the path from the host to an external host.

  You may also see `ens192---X`, which indicates an XDP program. Calico Cloud uses XDP programs to implement doNotTrack policies on host devices only.
- `Final result=ALLOW (-1). Program execution time: 7366ns` is the message. In this case, it logs the final result of the program. Note that the reported program execution time is massively distorted by the time spent logging.
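When reading a large trace log, it can help to reduce each line to the tag, the direction, and the message. The following bash sketch does this. The function name `bpf_log_summary` is hypothetical. The sketch assumes the line format shown above, with the tag directly after the `0:` field. It maps the `-I`, `-E` and `-X` suffixes to ingress, egress and XDP, and drops lines that do not match.

```bash
# Hypothetical helper: turn eBPF trace log lines on stdin into
# "<tag> <ingress|egress|xdp> <message>". Non-matching lines are dropped.
bpf_log_summary() {
  sed -nE '
    s/^.*: [0-9]+: ([^ :]+-I): (.*)$/\1 ingress \2/p
    t
    s/^.*: [0-9]+: ([^ :]+-E): (.*)$/\1 egress \2/p
    t
    s/^.*: [0-9]+: ([^ :]+-X): (.*)$/\1 xdp \2/p
  '
}

# Example (bpftool keeps streaming until interrupted):
#   kubectl exec -n calico-system <calico-node-name> -- bpftool prog tracelog \
#     | bpf_log_summary
```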

Debugging policy issues

If you suspect that Calico Cloud is dropping packets because of policy, you can use the `calico-node -bpf` tool to dump the policy that is attached to a specific interface:

```
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf policy dump <interface> <type> [--asm]
```

Where:

- `<interface>` is the name of the interface, for example `eth0` or `caliXXXXXX`.
- `<type>` is the location of the policy: `ingress`, `egress`, `xdp` or `all`.

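The following is a dump of an ingress policy. Ingress policy for a pod is attached to the tc/tcx egress hook of the host side of the caliX veth pair. Ingress policy for host endpoints is attached to the tc/tcx ingress hook of the host interface. Egress policy is attached the other way round: to the ingress hook of the caliX interface for pods, and to the egress hook of the host interface for host endpoints.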

```
IfaceName: calic31b4f7fc58
Hook: tc egress
Error:
Policy Info:
// Start of tier default
// Start of policy default/knp.default.allow-nginx-from-ubuntu
// Start of rule action:"allow"  protocol:{name:"tcp"}  dst_ports:{first:80  last:80}  src_ip_set_ids:"s:nzE8vwTu69FSscx2FDKjb20D9dZxEyVxsWFqwA"  original_src_selector:"projectcalico.org/orchestrator == 'k8s' && app == 'ubuntu-client'"  rule_id:"29vMYcPWr7reSxxN"
// IPSets src_ip_set_ids:<0x303904ae5eae5418>
// count = 9
// End of rule 29vMYcPWr7reSxxN
// End of policy default/knp.default.allow-nginx-from-ubuntu
// End of tier default: deny
// Start of rule action:"allow"  rule_id:"aBMQCbsUMESPKGRp"
// count = 0
// End of rule aBMQCbsUMESPKGRp
```
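You can pull the per-rule packet counts out of the dump and compare them over time. The following is a minimal bash sketch. It assumes the comment layout above: a `// Start of rule` line carrying `rule_id:"<id>"`, followed by a `// count = N` line. `policy_rule_counts` is a hypothetical name.

```bash
# Hypothetical helper: print "<rule_id> <count>" for each rule in a policy dump on stdin.
policy_rule_counts() {
  awk '
    /^\/\/ Start of rule/ {
      id = ""
      # rule_id:"<id>": skip the 9-character prefix and drop the closing quote
      if (match($0, /rule_id:"[^"]*"/)) id = substr($0, RSTART + 9, RLENGTH - 10)
    }
    /^\/\/ count = / && id != "" { print id, $4; id = "" }
  '
}

# Example:
#   kubectl exec -n calico-system <calico-node-name> -- \
#     calico-node -bpf policy dump calic31b4f7fc58 ingress | policy_rule_counts
```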

You can see how many packets have matched each rule. In this example, 9 packets have matched rule 29vMYcPWr7reSxxN and 0 packets have matched rule aBMQCbsUMESPKGRp.

Adding `--asm` also shows the eBPF assembly code for the program.

Rules that use selectors refer to IP sets. You can dump the contents of the IP sets using the `ipsets` command and check whether an IP set contains the expected members:

```
kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf ipsets dump

IP set 0x303904ae5eae5418
   192.168.58.5/32

IP set 0xffef9f925a8a4ca4
   192.168.129.66/32
   192.168.129.68/32
```
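To check a single IP set, you can print just its members. In the example above, the ID `0x303904ae5eae5418` referenced by the rule in the policy dump also appears as an IP set header in the ipsets dump, so the two can be matched up. The following bash sketch assumes the layout shown: a header line followed by one indented member per line. `ipset_members` is a hypothetical name.

```bash
# Hypothetical helper: print the members of one IP set from an ipsets dump on stdin.
ipset_members() {
  awk -v id="$1" '
    /^IP set / { inside = (($3 "") == (id "")); next }   # compare IDs as strings
    inside && NF == 1 { print $1 }
  '
}

# Example:
#   kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf ipsets dump \
#     | ipset_members 0x303904ae5eae5418
```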

Debugging calico-node not ready

If you notice that a calico-node pod is not ready, check its logs for errors. The most likely reason for calico-node not being ready in eBPF mode is that Calico Cloud is not able to update a program attached to an interface. Look for the following type of warning:

```
2025-09-22 22:39:59.801 [WARNING][10374] felix/bpf_ep_mgr.go 2107: Failed to apply policy to endpoint, leaving it dirty
```

One reason for this type of error is that the eBPF programs provided in the cnx-node image are not compatible with the verifier in your kernel. The kernel's verifier checks that every eBPF program it loads is safe to run, but the verifier's capabilities differ between kernel versions. We test the eBPF programs with a range of kernels, but it is impossible to test them all. If the verifier rejects a program, you may see errors such as:

```
265: (79) r1 = *(u64 *)(r10 -72)      ; R1_w=ctx(off=0,imm=0) R10=fp0
266: (79) r2 = *(u64 *)(r10 -128)     ; R2_w=scalar(umin=14,umax=74,var_off=(0x2; 0x7c)) R10=fp0
267: (b7) r5 = 0                      ; R5_w=0
268: (85) call bpf_skb_store_bytes#9
invalid access to map value, value_size=1512 off=8 size=0
R3 min value is outside of the allowed memory range
processed 1102 insns (limit 1000000) max_states_per_insn 2 total_states 38 peak_states 38 mark_read 27
-- END PROG LOAD LOG --
libbpf: prog 'calico_tc_skb_ipv4_frag': failed to load: -13
libbpf: failed to load object '/usr/lib/calico/bpf/to_wep_no_log.o'
2025-07-31 17:36:15.708 [WARNING][45] felix/bpf_ep_mgr.go 2124: Failed to apply policy to endpoint, leaving it dirty error=attaching program to wep: loading generic v4 tc hook program: error loading program: error loading object permission denied
attaching program to wep: loading generic v4 tc hook program: error loading program: error loading object permission denied name="enif327a56b833" wepID=&types.WorkloadEndpointID{OrchestratorId:"k8s", WorkloadId:"calico-system/csi-node-driver-tggmr", EndpointId:"eth0"}
2025-07-31 17:36:15.708 [WARNING][45] felix/bpf_ep_mgr.go 2124: Failed to apply policy to endpoint, leaving it dirty error=attaching program to wep: loading generic v4 tc hook program: error loading program: error loading object permission denied
```

If you see errors of this type, please open an issue on the Calico Cloud GitHub repository, including details of your kernel version and distribution.
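When collecting details for an issue, it helps to extract the names of the failing program and object from the log and to record the kernel version of each node. The following bash sketch matches the two `libbpf:` lines in the sample above, assuming they start at the beginning of a log line. The function name `bpf_load_failures` is hypothetical. In that sample, error `-13` is `-EACCES`, which is why Felix reports "permission denied".

```bash
# Hypothetical helper: summarise libbpf load failures from calico-node logs on stdin.
bpf_load_failures() {
  sed -nE \
    -e "s/^libbpf: prog '([^']+)': failed to load: (-?[0-9]+).*/program \1 failed with error \2/p" \
    -e "s/^libbpf: failed to load object '([^']+)'.*/object \1/p"
}

# Example:
#   kubectl logs -n calico-system <calico-node-name> -c calico-node | bpf_load_failures
# Kernel version and OS image of every node, to include in the issue:
#   kubectl get nodes -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion,OS:.status.nodeInfo.osImage
```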

Poor performance

A number of problems can reduce the performance of the eBPF data plane.

Verify that you are using the best networking mode for your cluster. If possible, avoid using an overlay network; a routed network with no overlay is considerably faster. If you must use one of Calico Cloud's overlay modes, use VXLAN, not IPIP. IPIP performs poorly in eBPF mode due to kernel limitations.

If you are not using an overlay, verify that the Felix configuration parameters `ipipEnabled` and `vxlanEnabled` are set to `false`. These parameters control whether Felix configures itself to allow IPIP or VXLAN, even if you have no IP pools that use an overlay. When enabled, they also disable certain eBPF mode optimisations for compatibility with IPIP and VXLAN.

To examine the configuration:

```
kubectl get felixconfiguration -o yaml

apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: FelixConfiguration
  metadata:
    creationTimestamp: "2020-10-05T13:41:20Z"
    name: default
    resourceVersion: "767873"
    uid: 8df8d751-7449-4b19-a4f9-e33a3d6ccbc0
  spec:
    ...
    ipipEnabled: false
    ...
    vxlanEnabled: false
kind: FelixConfigurationList
metadata:
  resourceVersion: "803999"
```
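You can script this check against the YAML output. The following bash sketch prints each of the two fields that is not explicitly `false`, including fields that are not set at all. The function name `check_overlay_flags` is hypothetical. The sketch assumes a single FelixConfiguration, such as the output of `kubectl get felixconfiguration default -o yaml`.

```bash
# Hypothetical helper: report overlay-related FelixConfiguration fields that are
# not explicitly false. Reads YAML for a single FelixConfiguration on stdin.
check_overlay_flags() {
  awk '
    $1 == "ipipEnabled:"  { ipip = $2 }
    $1 == "vxlanEnabled:" { vxlan = $2 }
    END {
      if (ipip != "false")  print "ipipEnabled is " (ipip == "" ? "unset" : ipip)
      if (vxlan != "false") print "vxlanEnabled is " (vxlan == "" ? "unset" : vxlan)
    }
  '
}

# Example:
#   kubectl get felixconfiguration default -o yaml | check_overlay_flags
# Only if no IP pool uses IPIP or VXLAN, set both fields explicitly:
#   kubectl patch felixconfiguration default --type merge \
#     -p '{"spec":{"ipipEnabled":false,"vxlanEnabled":false}}'
```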

If you are running your cluster in a cloud such as AWS, your cloud provider may limit the bandwidth between the nodes in your cluster. For example, on most AWS instance types a single connection is limited to 5 Gbit/s.

Runtime profiling

Setting `bpfProfiling` to `Enabled` enables collection of runtime profiling data for eBPF programs. It collects the average execution time and number of executions for each eBPF program attached to each interface. You can examine the profiling data with the `calico-node -bpf profiling e2e` command, which resets the profiling data after dumping it. Example output:

```
+----------------+-------------+-----+-------------+-------+-------------+-------+-------------+-------+
|     IFACE      | INGRESS NEW |  #  | INGRESS EST |   #   | EGRESS NEW  |   #   | EGRESS EST  |   #   |
+----------------+-------------+-----+-------------+-------+-------------+-------+-------------+-------+
| lo             | ---         | --- | ---         | ---   | 142.263 ns  | 10272 | ---         | ---   |
| eth0           | 2492.344 ns |  32 | 1535.443 ns | 16114 | 6296.421 ns |   749 | 1503.339 ns | 10982 |
| eni76136be4c77 | 5031.436 ns | 149 | 1194.923 ns |  1421 | 4950.196 ns |   138 | 1437.015 ns |  1432 |
| eni80d5c04bc95 | 7773.459 ns |  74 | 1508.973 ns |   641 | 4907.333 ns |    69 | 1715.848 ns |   646 |
| eth1           | 136.250 ns  |  24 | ---         | ---   | 75.320 ns   |    25 | ---         | ---   |
| eni5f8ab1cfc29 | 107.250 ns  |  36 | 1068.596 ns |  1514 | 189.528 ns  |    36 | 1104.335 ns |  1658 |
| bpfout.cali    | 440.000 ns  |   1 | ---         | ---   | 206.000 ns  |     1 | ---         | ---   |
+----------------+-------------+-----+-------------+-------+-------------+-------+-------------+-------+
```
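To find the interfaces whose programs are slowest on new connections, you can sort the table by the INGRESS NEW column. The following bash sketch assumes the table layout above. It skips interfaces that show `---` and prints the average time in nanoseconds followed by the interface name. The function name `slowest_ingress_new` is hypothetical.

```bash
# Hypothetical helper: list interfaces by average INGRESS NEW time, slowest first.
# Reads `calico-node -bpf profiling e2e` output on stdin; optional arg = how many.
slowest_ingress_new() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NF >= 10 && trim($3) ~ / ns$/ {
      v = trim($3); sub(/ ns$/, "", v); print v, trim($2)
    }
  ' | LC_ALL=C sort -rn | head -n "${1:-5}"
}

# Example (profiling must first be enabled, e.g.
#   kubectl patch felixconfiguration default --type merge -p '{"spec":{"bpfProfiling":"Enabled"}}'):
#   kubectl exec -n calico-system <calico-node-name> -- calico-node -bpf profiling e2e \
#     | slowest_ingress_new 3
```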

Debug high CPU usage

If you notice calico-node using high CPU:

- Check whether kube-proxy is still running. If it is, you must either disable kube-proxy or ensure that the Felix configuration setting `bpfKubeProxyIptablesCleanupEnabled` is set to `false`. If the setting is `true` (the default), Felix attempts to remove kube-proxy's iptables rules. While kube-proxy is still running, Felix continually tries to remove the rules, which can cause high CPU usage.
- If your cluster is very large, or your workload involves significant service churn, you can increase the interval at which Felix updates the services data plane by increasing the `bpfKubeProxyMinSyncPeriod` setting. The default is 1 second. The trade-off is that service updates happen more slowly.
- Calico Cloud supports endpoint slices, as kube-proxy does. If your Kubernetes cluster supports endpoint slices and they are enabled, you can enable endpoint slice support in Calico Cloud with the `bpfKubeProxyEndpointSlicesEnabled` configuration flag.
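For the first item, a quick check is whether a kube-proxy DaemonSet has ready pods. The following bash sketch is hypothetical. It assumes kube-proxy runs as the `kube-proxy` DaemonSet in `kube-system`, as it does in kubeadm-based clusters. Clusters that run kube-proxy as static pods or as a managed add-on need a different check. The commented `kubectl patch` lines show the two settings mentioned above. The `5s` value is only an example.

```bash
# Hypothetical check: does the kube-proxy DaemonSet in kube-system have ready pods?
# Assumes a DaemonSet named kube-proxy (as with kubeadm); KUBECTL may override kubectl.
kube_proxy_running() {
  local kubectl=${KUBECTL:-kubectl}
  local ready
  ready=$("$kubectl" get daemonset -n kube-system kube-proxy \
    -o jsonpath='{.status.numberReady}' 2>/dev/null) || return 1
  [ "${ready:-0}" -gt 0 ] 2>/dev/null
}

# If it is running, either disable kube-proxy or stop Felix's iptables cleanup:
#   kubectl patch felixconfiguration default --type merge \
#     -p '{"spec":{"bpfKubeProxyIptablesCleanupEnabled":false}}'
# For large or high-churn clusters, slow down service updates (example value):
#   kubectl patch felixconfiguration default --type merge \
#     -p '{"spec":{"bpfKubeProxyMinSyncPeriod":"5s"}}'
```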
