Flow log data types
Big picture​
Calico Cloud sends the following data to Elasticsearch.
The following table details the key/value pairs in the JSON blob, including their Elasticsearch datatype.
| Name | Datatype | Description |
|---|---|---|
host | keyword | Name of the node that collected the flow log entry. |
start_time | date | Start time of log collection in UNIX timestamp format. |
end_time | date | End time of log collection in UNIX timestamp format. |
action | keyword | - allow: Calico Cloud accepted the flow.- deny: Calico Cloud denied the flow. |
bytes_in | long | Number of incoming bytes since the last export. |
bytes_out | long | Number of outgoing bytes since the last export. |
dest_ip | ip | IP address of the destination pod. A null value indicates aggregation. |
dest_name | keyword | Contains one of the following values: - Name of the destination pod. - Name of the pod that was aggregated or the endpoint is not a pod. Check dest_name_aggr for more information, such as the name of the pod if it was aggregated. |
dest_name_aggr | keyword | Contains one of the following values: - Aggregated name of the destination pod. - pvt: endpoint is not a pod. Its IP address belongs to a private subnet. - pub: endpoint is not a pod. Its IP address does not belong to a private subnet. It is probably an endpoint on the public internet. |
dest_namespace | keyword | Namespace of the destination endpoint. A - means the endpoint is not namespaced. |
dest_port | long | Destination port. Not applicable for ICMP packets. |
dest_service_name | keyword | Name of the destination service. A - means the original destination did not correspond to a known Kubernetes service (e.g. a services ClusterIP). |
dest_service_namespace | keyword | Namespace of the destination service. A - means the original destination did not correspond to a known Kubernetes service (e.g. a services ClusterIP). |
dest_service_port | keyword | Port name of the destination service. A - means :- the original destination did not correspond to a known Kubernetes service (e.g. a services ClusterIP), or - the destination port is aggregated. A * means there are multiple service port names matching the destination port number. |
dest_type | keyword | Destination endpoint type. Possible values: - wep: A workload endpoint, a pod in Kubernetes.- ns: A Networkset. If multiple Networksets match, then the one with the longest prefix match is chosen.- net: A Network. The IP address did not fall into a known endpoint type. |
dest_labels | array of keywords | Labels applied to the destination pod. A hyphen indicates aggregation. |
dest_domains | array of keywords | Find all the destination domain names for use in a DNS policy by examining dest_domains. The field displays information on the top-level domains linked to the destination IP. Applies to flows reported from the source to destinations outside the cluster. If flowLogsDestDomainsByClient is disabled, having dest_domains: ["A"] doesn't guarantee that the flow corresponds to a connection with domain name A. The destination IP may also be linked to other domain names not yet captured by Calico. |
reporter | keyword | - src: flow came from the pod that initiated the connection.- dst: flow came from the pod that received the initial connection.- fwd: flow was forwarded through the node without being the source or destination. |
num_flows | long | Number of flows aggregated into this entry during this export interval. |
num_flows_completed | long | Number of flows that were completed during the export interval. |
num_flows_started | long | Number of flows that were started during the export interval. |
num_process_names | long | Number of unique process names aggregated into this entry during this export interval. |
num_process_ids | long | Number of unique process ids aggregated into this entry during this export interval. |
num_process_args | long | Number of unique process args aggregated into this entry during this export interval. |
nat_outgoing_ports | array of ints | List of NAT outgoing ports for the packets that were Source NAT'd in the flow |
packets_in | long | Number of incoming packets since the last export. |
packets_out | long | Number of outgoing packets since the last export. |
proto | keyword | Protocol. |
policies | array of arrays | List of policies that interacted with this flow. See Format of the policies field. |
process_name | keyword | The name of the process that initiated or received the connection or connection request. This field will have the executable path if flowLogsCollectProcessPath is enabled. A "-" indicates that the process name is not logged. A "*" indicates that the per flow process limit has exceeded and the process names are now aggregated. |
process_id | keyword | The process ID of the corresponding process (indicated by the process_name field) that initiated or received the connection or connection request. A "-" indicates that the process ID is not logged. A "*" indicates that there are more than one unique process IDs for the corresponding process name. |
process_args | array of strings | The arguments with which the executable was invoked. The size of the list depends on the per flow process args limit. |
source_ip | ip | IP address of the source pod. A null value indicates aggregation. |
source_name | keyword | Contains one of the following values: - Name of the source pod. - Name of the pod that was aggregated or the endpoint is not a pod. Check source_name_aggr for more information, such as the name of the pod if it was aggregated. |
source_name_aggr | keyword | Contains one of the following values: - Aggregated name of the source pod. - pvt: Endpoint is not a pod. Its IP address belongs to a private subnet.- pub: the endpoint is not a pod. Its IP address does not belong to a private subnet. It is probably an endpoint on the public internet. |
source_namespace | keyword | Namespace of the source endpoint. A - means the endpoint is not namespaced. |
source_port | long | Source port. A null value indicates aggregation. |
source_type | keyword | The type of source endpoint. Possible values: - wep: A workload endpoint, a pod in Kubernetes.- ns: A Networkset. If multiple Networksets match, then the one with the longest prefix match is chosen.- net: A Network. The IP address did not fall into a known endpoint type. |
source_labels | array of keywords | Labels applied to the source pod. A hyphen indicates aggregation. |
original_source_ips | array of ips | List of external IP addresses collected from requests made to the cluster through an ingress resource. This field is only available if capturing external IP addresses is configured. |
num_original_source_ips | long | Number of unique external IP addresses collected from requests made to the cluster through an ingress resource. This count includes the IP addresses included in the original_source_ips field. This field is only available if capturing external IP addresses is configured. |
tcp_mean_send_congestion_window | long | Mean tcp send congestion window size. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_min_send_congestion_window | long | Minimum tcp send congestion window size. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_mean_smooth_rtt | long | Mean smooth RTT in micro seconds. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_max_smooth_rtt | long | Maximum smooth RTT in micro seconds. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_mean_min_rtt | long | Mean MinRTT in micro seconds. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_max_min_rtt | long | Maximum MinRTT in micro seconds. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_mean_mss | long | Mean TCP MSS. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_min_mss | long | Minimum TCP MSS. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_total_retransmissions | long | Total retransmitted packets. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_lost_packets | long | Total lost packets. This field is only available if flowLogsEnableTcpStats is enabled |
tcp_unrecovered_to | long | Total unrecovered timeouts. This field is only available if flowLogsEnableTcpStats is enabled |
Format of the policies field​
The policies field contains four sub-fields, all_policies, enforced_policies, pending_policies, and transit_policies.
| Name | Datatype | Description |
|---|---|---|
all_policies | array of keywords | A comma-delimited list of all enforced and staged network policies that applied an action to the flow up until the first enforced policy to apply a verdict. |
enforced_policies | array of keywords | A comma-delimited list of all enforced network policies that applied an action to the flow up until the first enforced policy to apply a verdict. |
pending_policies | array of keywords | A comma-delimited list of all enforced and staged network policies that applied an action to the flow up until the first enforced or staged policy that applied a verdict. |
transit_policies | array of keywords | A comma-delimited list of all host endpoint policies with applyOnForward: true that applied an action to the flow while in transit through the node (for forwarded traffic between network interfaces). Note that policies with preDNAT: true will also appear here when they apply to ingress traffic before DNAT processing. |
Each entry in the list has the following format:
<index>|<tier name>|<policy name>|<action>|<rule index>
Where,
-
<index>numbers the order in which the rules were hit, starting with0.tipSort the entries of the list by the
<index>to see the order that rules were hit. The entries are displayed in random order due to the way they are stored in the datastore. -
<tier name>is the name of the policy tier containing the policy, or__PROFILE__for a rule derived from aProfileresource (this is the internal datatype used to represent a Kubernetes namespace and its associated "default allow" rule). -
<policy name>is the name of the policy/profile; its format depends on the type of policy:<tier>.<name>for Calico CloudGlobalNetworkPolicy.<namespace>/knp.default.<name>for KubernetesNetworkPolicy.<namespace>/<tier>.<name>for Calico CloudNetworkPolicy.
Staged policy names are prefixed with "staged:".
-
<action>is the action performed by the rule; one ofallow,deny,pass. -
<rule index>if non-negative, is the index of the rule that was matched within the policy, starting with 0. Otherwise, a special value:-1means the reporting endpoint was selected by the policy but no rule matched. The traffic hit the default action for the tier. In this case, the<policy name>is selected arbitrarily from the set of policies within the tier that apply to the endpoint.-2means "unknown". The rule index was not recorded.
Flow log example, with no aggregation​
A flow log with aggregation level 0, no aggregation, might look like:
{
"start_time": 1597166083,
"end_time": 1597166383,
"source_ip": "192.168.47.9",
"source_name": "access-6b687c8dcb-zn5s2",
"source_name_aggr": "access-6b687c8dcb-*",
"source_namespace": "policy-demo",
"source_port": 42106,
"source_type": "wep",
"source_labels": {
"labels": [
"pod-template-hash=6b687c8dcb",
"app=access"
]
},
"dest_ip": "192.168.138.79",
"dest_name": "nginx-86c57db685-h6792",
"dest_name_aggr": "nginx-86c57db685-*",
"dest_namespace": "policy-demo",
"dest_port": 80,
"dest_type": "wep",
"dest_labels": {
"labels": [
"pod-template-hash=86c57db685",
"app=nginx"
]
},
"proto": "tcp",
"action": "allow",
"reporter": "dst",
"policies": {
"all_policies": [
"0|tier1|ns1/tier1.policy1|pass|3",
"1|tier2|ns2/staged:tier1.policy1|deny|1",
"2|default|policy-demo/default.access-nginx|allow|2"
],
"enforced_policies": [
"0|tier1|ns1/tier1.policy1|pass|3",
"1|default|policy-demo/default.access-nginx|allow|2"
],
"pending_policies": [
"0|tier1|ns1/tier1.policy1|pass|3",
"1|tier2|ns2/staged:tier1.policy1|deny|1",
]
},
"bytes_in": 388,
"bytes_out": 1113,
"num_flows": 1,
"num_flows_started": 1,
"num_flows_completed": 1,
"packets_in": 6,
"packets_out": 5,
"http_requests_allowed_in": 0,
"http_requests_denied_in": 0,
"original_source_ips": null,
"num_original_source_ips": 0,
"host": "bz-n8kf-kadm-node-1",
"@timestamp": 1597166383000
}
The log shows an incoming connection reported by the destination node, allowed by a policy on port 80. The start_time and end_time
describe the aggregation period (5 min.) During this interval, one flow ("num_flow": 1) was recorded. At higher aggregation levels, flows from
endpoints performing the same operation and originating from the same Deployment/ReplicaSet are grouped into a single log. In this example, the
common source endpoints that are prefixed with access-6b687c8dcb-. Parameters like source_ip may be dropped and set to null, depending on
the aggregation level. As aggregation levels increase, more flows will be grouped together based on your data. For more details on aggregation
levels, see configure flow log aggregation.