Flow log data types
Big picture
Calico Cloud sends the following data to Elasticsearch.
The following table details the key/value pairs in the JSON blob, including their Elasticsearch datatype.
Name | Datatype | Description |
---|---|---|
host | keyword | Name of the node that collected the flow log entry. |
start_time | date | Start time of log collection in UNIX timestamp format. |
end_time | date | End time of log collection in UNIX timestamp format. |
action | keyword | - allow : Calico Cloud accepted the flow.- deny : Calico Cloud denied the flow. |
bytes_in | long | Number of incoming bytes since the last export. |
bytes_out | long | Number of outgoing bytes since the last export. |
dest_ip | ip | IP address of the destination pod. A null value indicates aggregation. |
dest_name | keyword | Contains one of the following values: - Name of the destination pod. - Name of the pod that was aggregated or the endpoint is not a pod. Check dest_name_aggr for more information, such as the name of the pod if it was aggregated. |
dest_name_aggr | keyword | Contains one of the following values: - Aggregated name of the destination pod. - pvt : endpoint is not a pod. Its IP address belongs to a private subnet. - pub : endpoint is not a pod. Its IP address does not belong to a private subnet. It is probably an endpoint on the public internet. |
dest_namespace | keyword | Namespace of the destination endpoint. A - means the endpoint is not namespaced. |
dest_port | long | Destination port. Not applicable for ICMP packets. |
dest_service_name | keyword | Name of the destination service. A - means the original destination did not correspond to a known Kubernetes service (e.g. a services ClusterIP). |
dest_service_namespace | keyword | Namespace of the destination service. A - means the original destination did not correspond to a known Kubernetes service (e.g. a services ClusterIP). |
dest_service_port | keyword | Port name of the destination service. A - means :- the original destination did not correspond to a known Kubernetes service (e.g. a services ClusterIP), or - the destination port is aggregated. A * means there are multiple service port names matching the destination port number. |
dest_type | keyword | Destination endpoint type. Possible values: - wep : A workload endpoint, a pod in Kubernetes.- ns : A Networkset. If multiple Networksets match, then the one with the longest prefix match is chosen.- net : A Network. The IP address did not fall into a known endpoint type. |
dest_labels | array of keywords | Labels applied to the destination pod. A hyphen indicates aggregation. |
dest_domains | array of keywords | Find all the destination domain names for use in a DNS policy by examining dest_domains . The field displays information on the top-level domains linked to the destination IP. Applies to flows reported from the source to destinations outside the cluster. If flowLogsDestDomainsByClient is disabled, having dest_domains : ["A"] doesn't guarantee that the flow corresponds to a connection with domain name A. The destination IP may also be linked to other domain names not yet captured by Calico. |
reporter | keyword | - src : flow came from the pod that initiated the connection.- dst : flow came from the pod that received the initial connection. |
num_flows | long | Number of flows aggregated into this entry during this export interval. |
num_flows_completed | long | Number of flows that were completed during the export interval. |
num_flows_started | long | Number of flows that were started during the export interval. |
num_process_names | long | Number of unique process names aggregated into this entry during this export interval. |
num_process_ids | long | Number of unique process ids aggregated into this entry during this export interval. |
num_process_args | long | Number of unique process args aggregated into this entry during this export interval. |
nat_outgoing_ports | array of ints | List of NAT outgoing ports for the packets that were Source NAT'd in the flow |
packets_in | long | Number of incoming packets since the last export. |
packets_out | long | Number of outgoing packets since the last export. |
proto | keyword | Protocol. |
policies | array of keywords | List of policies that interacted with this flow. See Format of the policies field. |
process_name | keyword | The name of the process that initiated or received the connection or connection request. This field will have the executable path if flowLogsCollectProcessPath is enabled. A "-" indicates that the process name is not logged. A "*" indicates that the per flow process limit has exceeded and the process names are now aggregated. |
process_id | keyword | The process ID of the corresponding process (indicated by the process_name field) that initiated or received the connection or connection request. A "-" indicates that the process ID is not logged. A "*" indicates that there are more than one unique process IDs for the corresponding process name. |
process_args | array of strings | The arguments with which the executable was invoked. The size of the list depends on the per flow process args limit. |
source_ip | ip | IP address of the source pod. A null value indicates aggregation. |
source_name | keyword | Contains one of the following values: - Name of the source pod. - Name of the pod that was aggregated or the endpoint is not a pod. Check source_name_aggr for more information, such as the name of the pod if it was aggregated. |
source_name_aggr | keyword | Contains one of the following values: - Aggregated name of the source pod. - pvt : Endpoint is not a pod. Its IP address belongs to a private subnet.- pub : the endpoint is not a pod. Its IP address does not belong to a private subnet. It is probably an endpoint on the public internet. |
source_namespace | keyword | Namespace of the source endpoint. A - means the endpoint is not namespaced. |
source_port | long | Source port. A null value indicates aggregation. |
source_type | keyword | The type of source endpoint. Possible values: - wep : A workload endpoint, a pod in Kubernetes.- ns : A Networkset. If multiple Networksets match, then the one with the longest prefix match is chosen.- net : A Network. The IP address did not fall into a known endpoint type. |
source_labels | array of keywords | Labels applied to the source pod. A hyphen indicates aggregation. |
original_source_ips | array of ips | List of external IP addresses collected from requests made to the cluster through an ingress resource. This field is only available if capturing external IP addresses is configured. |
num_original_source_ips | long | Number of unique external IP addresses collected from requests made to the cluster through an ingress resource. This count includes the IP addresses included in the original_source_ips field. This field is only available if capturing external IP addresses is configured. |
tcp_mean_send_congestion_window | long | Mean tcp send congestion window size. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_min_send_congestion_window | long | Minimum tcp send congestion window size. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_mean_smooth_rtt | long | Mean smooth RTT in micro seconds. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_max_smooth_rtt | long | Maximum smooth RTT in micro seconds. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_mean_min_rtt | long | Mean MinRTT in micro seconds. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_max_min_rtt | long | Maximum MinRTT in micro seconds. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_mean_mss | long | Mean TCP MSS. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_min_mss | long | Minimum TCP MSS. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_total_retransmissions | long | Total retransmitted packets. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_lost_packets | long | Total lost packets. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_unrecovered_to | long | Total unrecovered timeouts. This field is only available if flowLogsEnableTcpStats is enabled |
Format of the policies field
The policies
field contains a comma-delimited list of policy rules that matched the flow. Each entry in the
list has the following format:
<index>|<tier name>|<policy name>|<action>|<rule index>
Where,
-
<index>
numbers the order in which the rules were hit, starting with0
.tipSort the entries of the list by the
<index>
to see the order that rules were hit. The entries are displayed in random order due to the way they are stored in the datastore. -
<tier name>
is the name of the policy tier containing the policy, or__PROFILE__
for a rule derived from aProfile
resource (this is the internal datatype used to represent a Kubernetes namespace and its associated "default allow" rule). -
<policy name>
is the name of the policy/profile; its format depends on the type of policy:<tier>.<name>
for Calico CloudGlobalNetworkPolicy
.<namespace>/knp.default.<name>
for KubernetesNetworkPolicy
.<namespace>/<tier>.<name>
for Calico CloudNetworkPolicy
.
Staged policy names are prefixed with "staged:".
-
<action>
is the action performed by the rule; one ofallow
,deny
,pass
. -
<rule index>
if non-negative, is the index of the rule that was matched within the policy, starting with 0. Otherwise, a special value:-1
means the reporting endpoint was selected by the policy but no rule matched. The traffic hit the default action for the tier. In this case, the<policy name>
is selected arbitrarily from the set of policies within the tier that apply to the endpoint.-2
means "unknown". The rule index was not recorded.
Flow log example, with no aggregation
A flow log with aggregation level 0, no aggregation
, might look like:
{
"start_time": 1597166083,
"end_time": 1597166383,
"source_ip": "192.168.47.9",
"source_name": "access-6b687c8dcb-zn5s2",
"source_name_aggr": "access-6b687c8dcb-*",
"source_namespace": "policy-demo",
"source_port": 42106,
"source_type": "wep",
"source_labels": {
"labels": [
"pod-template-hash=6b687c8dcb",
"app=access"
]
},
"dest_ip": "192.168.138.79",
"dest_name": "nginx-86c57db685-h6792",
"dest_name_aggr": "nginx-86c57db685-*",
"dest_namespace": "policy-demo",
"dest_port": 80,
"dest_type": "wep",
"dest_labels": {
"labels": [
"pod-template-hash=86c57db685",
"app=nginx"
]
},
"proto": "tcp",
"action": "allow",
"reporter": "dst",
"policies": {
"all_policies": [
"0|default|policy-demo/default.access-nginx|allow"
]
},
"bytes_in": 388,
"bytes_out": 1113,
"num_flows": 1,
"num_flows_started": 1,
"num_flows_completed": 1,
"packets_in": 6,
"packets_out": 5,
"http_requests_allowed_in": 0,
"http_requests_denied_in": 0,
"original_source_ips": null,
"num_original_source_ips": 0,
"host": "bz-n8kf-kadm-node-1",
"@timestamp": 1597166383000
}
The log shows an incoming connection reported by the destination node, allowed by a policy on port 80. The start_time
and end_time
describe the aggregation period (5 min.) During this interval, one flow ("num_flow": 1
) was recorded. At higher aggregation levels, flows from
endpoints performing the same operation and originating from the same Deployment/ReplicaSet are grouped into a single log. In this example, the
common source endpoints that are prefixed with access-6b687c8dcb-
. Parameters like source_ip
may be dropped and set to null
, depending on
the aggregation level. As aggregation levels increase, more flows will be grouped together based on your data. For more details on aggregation
levels, see configure flow log aggregation.