Prometheus statistics

Felix can be configured to report a number of metrics through Prometheus. See the configuration reference for how to enable metrics reporting.

Metric reference

Felix specific

Felix exports a number of Prometheus metrics; the current set is listed below. Because some metrics are tied to particular implementation choices inside Felix, we can't guarantee that metrics will persist across releases. However, we aim not to make any spurious changes to existing metrics.

felix_active_local_endpoints: Number of active endpoints on this host.
felix_active_local_policies: Number of active policies on this host.
felix_active_local_selectors: Number of active selectors on this host.
felix_active_local_tags: Number of active tags on this host.
felix_bpf_conntrack_cleaned: Number of entries cleaned during a conntrack table sweep.
felix_bpf_conntrack_cleaned_total: Total number of entries cleaned during conntrack table sweeps, incremented for each clean individually.
felix_bpf_conntrack_expired: Number of entries cleaned during a conntrack table sweep due to expiration.
felix_bpf_conntrack_expired_total: Total number of entries cleaned during conntrack table sweeps due to expiration, by reason.
felix_bpf_conntrack_inforeader_blocks: Conntrack InfoReader would-blocks.
felix_bpf_conntrack_stale_nat: Number of entries cleaned during a conntrack table sweep due to stale NAT.
felix_bpf_conntrack_stale_nat_total: Total number of entries cleaned during conntrack table sweeps due to stale NAT.
felix_bpf_conntrack_sweeps: Number of conntrack table sweeps made so far.
felix_bpf_conntrack_used: Number of used entries visited during a conntrack table sweep.
felix_bpf_conntrack_sweep_duration: Conntrack sweep execution time (ns).
felix_bpf_num_ip_sets: Number of BPF IP sets managed in the dataplane.
felix_calc_graph_output_events: Number of events emitted by the calculation graph.
felix_calc_graph_update_time_seconds: Seconds to update calculation graph for each datastore OnUpdate call.
felix_calc_graph_updates_processed: Number of datastore updates processed by the calculation graph.
felix_cluster_num_host_endpoints: Total number of host endpoints cluster-wide.
felix_cluster_num_hosts: Total number of Calico Cloud hosts in the cluster.
felix_cluster_num_policies: Total number of policies in the cluster.
felix_cluster_num_profiles: Total number of profiles in the cluster.
felix_cluster_num_tiers: Total number of Calico Cloud tiers in the cluster.
felix_cluster_num_workload_endpoints: Total number of workload endpoints cluster-wide.
felix_egress_gateway_remote_polls{status="total"}: Total number of remote egress gateway pods that Felix is polling for health/connectivity. Only egress gateways with a named "health" port will be polled.
felix_egress_gateway_remote_polls{status="up"}: Total number of remote egress gateway pods that have successful probes.
felix_egress_gateway_remote_polls{status="probe-failed"}: Total number of remote egress gateway pods that have failed probes.
felix_exec_time_micros: Summary of time taken to fork/exec child processes.
felix_int_dataplane_addr_msg_batch_size: Number of interface address messages processed in each batch. Higher values indicate we're doing more batching to try to keep up.
felix_int_dataplane_apply_time_seconds: Time in seconds that it took to apply a dataplane update.
felix_int_dataplane_failures: Number of times dataplane updates failed and will be retried.
felix_int_dataplane_iface_msg_batch_size: Number of interface state messages processed in each batch. Higher values indicate we're doing more batching to try to keep up.
felix_int_dataplane_messages: Number of dataplane messages by type.
felix_int_dataplane_msg_batch_size: Number of messages processed in each batch. Higher values indicate we're doing more batching to try to keep up.
felix_ipsec_bindings_total: Total number of IPsec bindings.
felix_ipsec_errors: Number of IPsec command failures.
felix_ipset_calls: Number of ipset commands executed.
felix_ipset_errors: Number of ipset command failures.
felix_ipset_lines_executed: Number of ipset operations executed.
felix_ipsets_calico: Number of active Calico Cloud IP sets.
felix_ipsets_total: Total number of active IP sets.
felix_iptables_chains: Number of active iptables chains.
felix_iptables_lines_executed: Number of iptables rule updates executed.
felix_iptables_lock_acquire_secs: Time taken to acquire the iptables lock.
felix_iptables_lock_retries: Number of times the iptables lock was already held and Felix had to retry to acquire it.
felix_iptables_restore_calls: Number of iptables-restore calls.
felix_iptables_restore_errors: Number of iptables-restore errors.
felix_iptables_rules: Number of active iptables rules.
felix_iptables_save_calls: Number of iptables-save calls.
felix_iptables_save_errors: Number of iptables-save errors.
felix_log_errors: Number of errors encountered while logging.
felix_logs_dropped: Number of logs dropped because the output stream was blocked.
felix_reporter_log_errors: Number of errors encountered while logging to Syslog.
felix_reporter_logs_dropped: Number of logs dropped because the output was blocked in the Syslog reporter.
felix_resync_state: Current datastore state.
felix_resyncs_started: Number of times Felix has started resyncing with the datastore.
felix_route_table_list_seconds: Time taken to list all the interfaces during a resync.
felix_route_table_per_iface_sync_seconds: Time taken to sync each interface.
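
The three felix_egress_gateway_remote_polls entries in the list above are one metric split by a status label. As a rough sketch of how those series might be combined, the Python below reads the three values out of scraped text and computes the fraction of polled gateways that are up. The sample values and the helper names parse_polls and up_fraction are made up for illustration, and the pattern assumes status is the only label on the series, which may not hold in every release.

```python
import re

# Assumption: each series carries only the `status` label, as in the list above.
POLL_RE = re.compile(
    r'^felix_egress_gateway_remote_polls\{status="([^"]+)"\}\s+(\S+)'
)

def parse_polls(text):
    """Return a dict mapping each status label value to its sample value."""
    polls = {}
    for line in text.splitlines():
        match = POLL_RE.match(line.strip())
        if match:
            polls[match.group(1)] = float(match.group(2))
    return polls

def up_fraction(polls):
    """Fraction of polled gateways with successful probes, or None if none are polled."""
    total = polls.get("total", 0.0)
    if not total:
        return None
    return polls.get("up", 0.0) / total

# Hypothetical scrape output; the numbers are illustrative only.
SAMPLE_POLLS = """\
felix_egress_gateway_remote_polls{status="total"} 4
felix_egress_gateway_remote_polls{status="up"} 3
felix_egress_gateway_remote_polls{status="probe-failed"} 1
"""

polls = parse_polls(SAMPLE_POLLS)
fraction = up_fraction(polls)
```

Returning None when nothing is being polled avoids a division by zero and keeps "no gateways" distinct from "all gateways down".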

Prometheus metrics are self-documenting. With metrics turned on, you can use curl to list the metrics along with their help text and type information.

curl -s http://localhost:9091/metrics | head

Example response:

# HELP felix_active_local_endpoints Number of active endpoints on this host.
# TYPE felix_active_local_endpoints gauge
felix_active_local_endpoints 91
# HELP felix_active_local_policies Number of active policies on this host.
# TYPE felix_active_local_policies gauge
felix_active_local_policies 0
# HELP felix_active_local_selectors Number of active selectors on this host.
# TYPE felix_active_local_selectors gauge
felix_active_local_selectors 82
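
The HELP, TYPE, and sample lines in the response above follow a regular pattern, so they can be collected with a few lines of code. The following Python is a minimal sketch rather than a complete parser for the Prometheus text format: parse_metrics is a hypothetical helper name, and the sample text is copied from the example response.

```python
import re
from collections import defaultdict

# A sample line is a metric name, an optional {label="value"} block, then a value.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{.*\})?\s+(\S+)')

def parse_metrics(text):
    """Group HELP text, TYPE, and samples by metric name.

    Simplification: samples are keyed by the exact name on the sample line,
    so the _sum and _count series of a summary get their own entries.
    """
    metrics = defaultdict(lambda: {"help": None, "type": None, "samples": []})
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# HELP "):
            name, _, help_text = line[len("# HELP "):].partition(" ")
            metrics[name]["help"] = help_text
        elif line.startswith("# TYPE "):
            name, _, metric_type = line[len("# TYPE "):].partition(" ")
            metrics[name]["type"] = metric_type
        elif line.startswith("#"):
            continue  # any other comment line
        else:
            match = SAMPLE_RE.match(line)
            if match:
                name, labels, value = match.groups()
                metrics[name]["samples"].append((labels or "", float(value)))
    return dict(metrics)

SAMPLE = """\
# HELP felix_active_local_endpoints Number of active endpoints on this host.
# TYPE felix_active_local_endpoints gauge
felix_active_local_endpoints 91
# HELP felix_active_local_policies Number of active policies on this host.
# TYPE felix_active_local_policies gauge
felix_active_local_policies 0
"""

parsed = parse_metrics(SAMPLE)
```

To run this against a live Felix, the text could be fetched from the same URL as the curl command, for example with urllib.request.urlopen, and passed to parse_metrics.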

CPU / memory metrics

Felix also exports the default set of metrics that Prometheus makes available. Currently, those include:

go_gc_duration_seconds: A summary of the GC invocation durations.
go_goroutines: Number of goroutines that currently exist.
go_info: Go version.
go_memstats_alloc_bytes: Number of bytes allocated and still in use.
go_memstats_alloc_bytes_total: Total number of bytes allocated, even if freed.
go_memstats_buck_hash_sys_bytes: Number of bytes used by the profiling bucket hash table.
go_memstats_frees_total: Total number of frees.
go_memstats_gc_cpu_fraction: The fraction of this program’s available CPU time used by the GC since the program started.
go_memstats_gc_sys_bytes: Number of bytes used for garbage collection system metadata.
go_memstats_heap_alloc_bytes: Number of heap bytes allocated and still in use.
go_memstats_heap_idle_bytes: Number of heap bytes waiting to be used.
go_memstats_heap_inuse_bytes: Number of heap bytes that are in use.
go_memstats_heap_objects: Number of allocated objects.
go_memstats_heap_released_bytes: Number of heap bytes released to OS.
go_memstats_heap_sys_bytes: Number of heap bytes obtained from system.
go_memstats_last_gc_time_seconds: Number of seconds since 1970 of last garbage collection.
go_memstats_lookups_total: Total number of pointer lookups.
go_memstats_mallocs_total: Total number of mallocs.
go_memstats_mcache_inuse_bytes: Number of bytes in use by mcache structures.
go_memstats_mcache_sys_bytes: Number of bytes used for mcache structures obtained from system.
go_memstats_mspan_inuse_bytes: Number of bytes in use by mspan structures.
go_memstats_mspan_sys_bytes: Number of bytes used for mspan structures obtained from system.
go_memstats_next_gc_bytes: Number of heap bytes when next garbage collection will take place.
go_memstats_other_sys_bytes: Number of bytes used for other system allocations.
go_memstats_stack_inuse_bytes: Number of bytes in use by the stack allocator.
go_memstats_stack_sys_bytes: Number of bytes obtained from system for stack allocator.
go_memstats_sys_bytes: Number of bytes obtained from system. Sum of all system allocations.
go_threads: Number of OS threads created.
process_cpu_seconds_total: Total user and system CPU time spent in seconds.
process_max_fds: Maximum number of open file descriptors.
process_open_fds: Number of open file descriptors.
process_resident_memory_bytes: Resident memory size in bytes.
process_start_time_seconds: Start time of the process since unix epoch in seconds.
process_virtual_memory_bytes: Virtual memory size in bytes.
process_virtual_memory_max_bytes: Maximum amount of virtual memory available in bytes.

WireGuard metrics

Felix also exports WireGuard device statistics if a WireGuard device is detected. This can be disabled via Felix configuration.

wireguard_meta: Gauge. Device/interface information for a Felix/Calico node; the values are in this metric's labels.
wireguard_bytes_rcvd: Counter. Current bytes received from a peer identified by a peer public key and endpoint.
wireguard_bytes_sent: Counter. Current bytes sent to a peer identified by a peer public key and endpoint.
wireguard_latest_handshake_seconds: Gauge. Last handshake with a peer, unix timestamp in seconds.
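
Because wireguard_latest_handshake_seconds is a unix timestamp rather than a duration, a consumer usually has to subtract it from the current time to get the age of the last handshake. The Python below is a small sketch of that arithmetic. The helper name handshake_age_seconds is hypothetical, and treating a zero or negative value as "no handshake yet" is an assumption, not documented Felix behavior.

```python
import time

def handshake_age_seconds(latest_handshake_seconds, now=None):
    """Seconds elapsed since the last handshake with a peer.

    latest_handshake_seconds is the value of wireguard_latest_handshake_seconds.
    Assumption: a zero or negative value means no handshake has happened yet,
    in which case None is returned.
    """
    if now is None:
        now = time.time()
    if latest_handshake_seconds <= 0:
        return None
    return now - latest_handshake_seconds

# Illustrative values only: a handshake 120 seconds before "now".
age = handshake_age_seconds(1700000000, now=1700000120)
```

Passing now explicitly makes the function easy to check with fixed inputs; in normal use it can be omitted so the current time is used.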